
How to normalize (encode) and how to restore the original value (decode)

Posted at 2020-08-17, last updated at 2020-08-17

0. Overview

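Normalization is arguably the most widely used preprocessing technique for datasets in machine learning. Applying it lets numbers with different dimensions or scales be handled on a common scale, which is why it appears in so many places. Normalization amounts to making a quantity dimensionless, for example by dividing it by a representative value, so that quantities can be compared with one another. This article shares how to do that, and how to convert the normalized value back.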

1. MIN-MAX normalization (0 to 1)

This section describes the most commonly used normalization method, which expresses a value contained in a dataset as a number in the range 0 to 1.

1.1. encode

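Let $x$ be the value to normalize, contained in a dataset $\mathbb{D}$, and let $x_{min}$ and $x_{max}$ be the minimum and maximum values used as the source range. The formula is as follows.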

x_{norm} = \dfrac{x - x_{min}}{x_{max} - x_{min} }

The Python code is as follows.

def min_max_encode(x, source_min, source_max):
    # Map x from [source_min, source_max] to [0, 1]; source_max must differ from source_min.
    return (x - source_min) / (source_max - source_min)
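
As a minimal sketch of how the encoder might be applied to a whole dataset, the snippet below takes the minimum and maximum from the data itself. The function is restated so the snippet runs on its own, and the sample values in `data` are hypothetical, chosen only for illustration.

```python
def min_max_encode(x, source_min, source_max):
    # Same formula as in this section, restated so the snippet is self-contained.
    return (x - source_min) / (source_max - source_min)

# Hypothetical sample data (an assumption for this example).
data = [10.0, 15.0, 20.0, 30.0]

# Use the dataset's own minimum and maximum as the source range.
x_min, x_max = min(data), max(data)

encoded = [min_max_encode(x, x_min, x_max) for x in data]
print(encoded)  # [0.0, 0.25, 0.5, 1.0]
```

The smallest value maps to 0 and the largest to 1. If every value in the dataset is identical, `x_max - x_min` is zero and the division fails, so that case needs separate handling.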

1.2. decode

Decoding is done as follows. Here, $x_{norm}$ is the value produced by the encoding step.

x = x_{norm}x_{max} - x_{norm}x_{min} + x_{min}

Derivation:

x_{norm} = \dfrac{x - x_{min}}{x_{max} - x_{min} }\\
x_{norm}x_{max} - x_{norm}x_{min} = x - x_{min} \\
x_{norm}x_{max} - x_{norm}x_{min} + x_{min} = x

The Python code is as follows.

def min_max_decode(x_norm, source_min, source_max):
    # Map x_norm from [0, 1] back to [source_min, source_max].
    return (x_norm * source_max) - (x_norm * source_min) + source_min
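
One way to check that decoding really inverts encoding is a round trip over a few values. The sketch below restates both functions so that it runs on its own; the sample values are hypothetical, and `math.isclose` is used because floating-point rounding can leave tiny differences in general.

```python
import math

def min_max_encode(x, source_min, source_max):
    # [source_min, source_max] -> [0, 1]
    return (x - source_min) / (source_max - source_min)

def min_max_decode(x_norm, source_min, source_max):
    # [0, 1] -> [source_min, source_max]
    return (x_norm * source_max) - (x_norm * source_min) + source_min

# Hypothetical sample data (an assumption for this example).
data = [10.0, 15.0, 20.0, 30.0]
x_min, x_max = min(data), max(data)

# Encode every value, then decode it again with the same min and max.
restored = [
    min_max_decode(min_max_encode(x, x_min, x_max), x_min, x_max)
    for x in data
]

# Compare with a tolerance rather than exact equality.
round_trip_ok = all(math.isclose(a, b) for a, b in zip(data, restored))
print(round_trip_ok)  # True
```

The decoder only recovers the original value when it receives the same `source_min` and `source_max` that were used for encoding, so those two numbers have to be stored alongside the normalized data.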

2. Normalization for activation functions (-1 to +1)

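When an activation function such as hard-tanh is used in deep learning, the input values are sometimes also converted to the range -1 to +1.
This section covers normalization for that case as well.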

2.1. encode

Let $x$ be the value to normalize, contained in a dataset $\mathbb{D}$. The formula is as follows.

x_{norm} = \dfrac{x - x_{min}}{x_{max} - x_{min} } \cdot 2 - 1

The Python code is as follows.

def min_max_encode(x, source_min, source_max):
    # Map x from [source_min, source_max] to [-1, +1]; source_max must differ from source_min.
    return (((x - source_min) / (source_max - source_min)) * 2.0) - 1.0
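
The formula above is the 0 to 1 result stretched by 2 and shifted by -1. As a rough illustration, the sketch below applies it to a small list. The name `encode_symmetric` and the sample values are hypothetical, introduced here only to keep this variant distinct in a self-contained snippet.

```python
def encode_symmetric(x, source_min, source_max):
    # Hypothetical name for the -1..+1 encoder of this section.
    # First scale to [0, 1], then stretch and shift to [-1, +1].
    unit = (x - source_min) / (source_max - source_min)
    return unit * 2.0 - 1.0

# Hypothetical sample data (an assumption for this example).
data = [10.0, 15.0, 20.0, 30.0]
x_min, x_max = min(data), max(data)

encoded_sym = [encode_symmetric(x, x_min, x_max) for x in data]
print(encoded_sym)  # [-1.0, -0.5, 0.0, 1.0]
```

The minimum maps to -1, the maximum to +1, and the midpoint of the range (20.0 here) to 0.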

2.2. decode

Decoding is done as follows. Here, $x_{norm}$ is the value produced by the encoding step.

x = \dfrac{x_{norm}x_{max} - x_{norm}x_{min} + x_{max} + x_{min}}{2}

Derivation:

x_{norm} = \dfrac{x - x_{min}}{x_{max} - x_{min} } \cdot 2 - 1 \\
\dfrac{x_{norm} + 1}{2} = \dfrac{x - x_{min}}{x_{max} - x_{min} }\\
\dfrac{(x_{norm} + 1)(x_{max} - x_{min})}{2}=(x - x_{min})\\
\dfrac{x_{norm}x_{max} - x_{norm}x_{min} + x_{max} - x_{min}}{2}+x_{min}=x\\
\dfrac{x_{norm}x_{max} - x_{norm}x_{min} + x_{max} + x_{min}}{2} = x

The Python code is as follows.

def min_max_decode(x_norm, source_min, source_max):
    # Map x_norm from [-1, +1] back to [source_min, source_max].
    return ((x_norm * source_max) - (x_norm * source_min) + source_max + source_min) * 0.5
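
The same round-trip check can be sketched for the -1 to +1 variant. The names `encode_symmetric` and `decode_symmetric` and the sample values are hypothetical; the formulas are the ones from sections 2.1 and 2.2, restated so the snippet runs on its own.

```python
import math

def encode_symmetric(x, source_min, source_max):
    # [source_min, source_max] -> [-1, +1]
    return (((x - source_min) / (source_max - source_min)) * 2.0) - 1.0

def decode_symmetric(x_norm, source_min, source_max):
    # [-1, +1] -> [source_min, source_max]
    return ((x_norm * source_max) - (x_norm * source_min) + source_max + source_min) * 0.5

# Hypothetical sample data (an assumption for this example).
data = [10.0, 15.0, 20.0, 30.0]
x_min, x_max = min(data), max(data)

# Encode to [-1, +1], then decode with the same min and max.
restored_sym = [
    decode_symmetric(encode_symmetric(x, x_min, x_max), x_min, x_max)
    for x in data
]

# Compare with a tolerance to allow for floating-point rounding.
sym_round_trip_ok = all(math.isclose(a, b) for a, b in zip(data, restored_sym))
print(sym_round_trip_ok)  # True
```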
