Scalar Encoders

Before an input value can be processed by Spatial Pooling, it must be converted into a sequence of bits by an Encoder.

ScalarEncoder

Encodes a numeric value into bits.

class nupic.encoders.scalar.ScalarEncoder(w, minval, maxval, periodic=False, n=0, radius=0, resolution=0, name=None, verbosity=0, clipInput=False, forced=False)

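w: The number of bits that are set (the width of the active run) in the encoded output.
minval: The minimum input value.
maxval: The maximum input value.
periodic: If True, the active bits wrap around the ends of the output (e.g. [100011]), and an input greater than or equal to maxval raises an error. Use this for inputs that take periodic values.
n: The total number of bits in the encoded output.
radius: Guarantees that two inputs at least this far apart have encodings with no overlapping bits.
resolution: The granularity at which encodings are distinguishable. For example, with a value of 1, inputs that differ by 1 get distinguishable encodings, while inputs that differ by only 0.5 may not.
forced: By default, the parameter check raises an error unless w is at least 21. Setting this to True skips that check.

Note: Only one of n, radius, and resolution can be specified.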
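
To make the wrap-around behaviour of periodic concrete, here is a minimal sketch in plain Python with no NuPIC dependency. The helper name wrap_around_bits and its start argument are hypothetical and are not part of the NuPIC API; the sketch only shows how a run of w active bits continues from the end of the array back to the beginning.

```python
def wrap_around_bits(n, w, start):
    """Return n bits with a run of w ones beginning at index start.

    Hypothetical helper for illustration only: indices past the end
    of the array wrap around to the beginning.
    """
    bits = [0] * n
    for k in range(w):
        bits[(start + k) % n] = 1
    return bits


# A run of 3 bits starting at index 4 of 6 wraps onto index 0.
print(wrap_around_bits(6, 3, 4))  # [1, 0, 0, 0, 1, 1]
```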

Usage example

from nupic.encoders.scalar import ScalarEncoder

scalarEncoder = ScalarEncoder(w = 3, 
                              minval = 1,
                              maxval = 10, 
                              periodic = False,
                              #n = 12, 
                              #radius = 3,
                              resolution = 1,
                              forced = True)

# resolution is 1, so 1 and 2 are distinguishable
print '  1', scalarEncoder.encode(1)
print '  2', scalarEncoder.encode(2)
# Distinguishable inputs also have different bucket indices
print 'bucket index(   1):',  scalarEncoder.getBucketIndices(1)
print 'bucket index(   2):',  scalarEncoder.getBucketIndices(2)
# 1 and 4 no longer share any bits
print '  4', scalarEncoder.encode(4)
# resolution is 1, so 5.5 and 6 cannot be distinguished
print '5.5', scalarEncoder.encode(5.5)
print '  6', scalarEncoder.encode(6)
# Indistinguishable inputs have the same bucket index
print 'bucket index(5.5):',  scalarEncoder.getBucketIndices(5.5)
print 'bucket index(   6):',  scalarEncoder.getBucketIndices(6)
# Outside the input range, so this raises an error
try:
  print '11', scalarEncoder.encode(11)
except Exception as e:
  print e

This example uses w=3, minval=1, maxval=10, and resolution=1.
The number of bits (n) is therefore 12, and two inputs that are at least 3 apart (radius) are encoded with no overlapping bits.
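
These values can be checked with a little arithmetic. The sketch below assumes, for a non-periodic encoder, that radius = w * resolution and that n is one bucket per resolution step plus w - 1 extra bits. These relationships are inferred from this example rather than quoted from the NuPIC source, and derive_n_and_radius is a hypothetical helper.

```python
import math


def derive_n_and_radius(w, minval, maxval, resolution):
    """Assumed relationships for a non-periodic scalar encoder."""
    radius = w * resolution
    # One bucket per resolution step, plus w - 1 extra bits so the
    # run of w bits for the last bucket still fits in the output.
    buckets = (maxval - minval) / float(resolution) + 1
    n = int(math.ceil(buckets + w - 1))
    return n, radius


print(derive_n_and_radius(w=3, minval=1, maxval=10, resolution=1))  # (12, 3)
```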

Output

   1 [1 1 1 0 0 0 0 0 0 0 0 0]
   2 [0 1 1 1 0 0 0 0 0 0 0 0]
bucket index(   1): [0]
bucket index(   2): [1]
   4 [0 0 0 1 1 1 0 0 0 0 0 0]
 5.5 [0 0 0 0 0 1 1 1 0 0 0 0]
   6 [0 0 0 0 0 1 1 1 0 0 0 0]
bucket index(5.5): [5]
bucket index(   6): [5]
  11 input (11) greater than range (1 - 10)
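
The bit patterns above can be reproduced with a small stand-alone sketch. This is not NuPIC's implementation: toy_bucket_index and toy_encode are hypothetical names, and the rounding rule (nearest bucket, with halves rounding up) is an assumption chosen to match the results shown for 5.5 and 6.

```python
def toy_bucket_index(x, minval, maxval, resolution):
    """Index of the bucket x falls into (assumed rounding rule)."""
    if x < minval or x > maxval:
        raise ValueError('input (%s) outside range (%s - %s)'
                         % (x, minval, maxval))
    return int((x - minval + resolution / 2.0) / resolution)


def toy_encode(x, w, minval, maxval, resolution):
    """Set w consecutive bits starting at the bucket index."""
    n = w + int(round((maxval - minval) / float(resolution)))
    start = toy_bucket_index(x, minval, maxval, resolution)
    return [1 if start <= i < start + w else 0 for i in range(n)]


for value in (1, 2, 4, 5.5, 6):
    bits = toy_encode(value, w=3, minval=1, maxval=10, resolution=1)
    print('%s %s' % (value, bits))
```

Because 5.5 and 6 land in the same bucket under this rule, they produce the same bits, which is the behaviour the example demonstrates.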

AdaptiveScalarEncoder

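The processing is basically the same as ScalarEncoder, except that minval and maxval can be assigned dynamically. If an instance is created without minval and maxval, they are set from the 1st and 99th percentiles of a window of 100 input values.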

class nupic.encoders.adaptive_scalar.AdaptiveScalarEncoder(w, minval=None, maxval=None, periodic=False, n=0, radius=0, resolution=0, name=None, verbosity=0, clipInput=True, forced=False)

The parameters are the same as for ScalarEncoder. However, an instance must be created by specifying n. (ScalarEncoder accepted any one of n, radius, or resolution.)

Usage example

from nupic.encoders.adaptive_scalar import AdaptiveScalarEncoder

# Create two instances to confirm that the parameters change based on past inputs
adaptiveScalarEncoder1 = AdaptiveScalarEncoder(w = 3,
                                               n = 12,
                                               forced = True)

adaptiveScalarEncoder2 = AdaptiveScalarEncoder(w = 3,
                                               n = 12,
                                               forced = True)

# The results differ depending on the order of the inputs
print '  1', adaptiveScalarEncoder1.encode(1)
print '1.5', adaptiveScalarEncoder1.encode(1.5)
print ' 10', adaptiveScalarEncoder1.encode(10)
print ' 11', adaptiveScalarEncoder1.encode(11)
print '100', adaptiveScalarEncoder1.encode(100)
print ' 50', adaptiveScalarEncoder1.encode(50)

print '100', adaptiveScalarEncoder2.encode(100)
print '  1', adaptiveScalarEncoder2.encode(1)
print '1.5', adaptiveScalarEncoder2.encode(1.5)
print ' 10', adaptiveScalarEncoder2.encode(10)
print ' 11', adaptiveScalarEncoder2.encode(11)
print '100', adaptiveScalarEncoder2.encode(100)

The output shows that the positions of the active bits are adjusted according to the inputs seen so far.

Output

  1 [1 1 1 0 0 0 0 0 0 0 0 0]
1.5 [0 0 0 0 0 1 1 1 0 0 0 0]
 10 [0 0 0 0 0 0 0 0 0 1 1 1]
 11 [0 0 0 0 0 0 0 0 0 1 1 1]
100 [0 0 0 0 0 0 0 0 0 1 1 1]
 50 [0 0 0 0 1 1 1 0 0 0 0 0]

100 [1 1 1 0 0 0 0 0 0 0 0 0]
  1 [1 1 1 0 0 0 0 0 0 0 0 0]
1.5 [1 1 1 0 0 0 0 0 0 0 0 0]
 10 [0 1 1 1 0 0 0 0 0 0 0 0]
 11 [0 1 1 1 0 0 0 0 0 0 0 0]
100 [0 0 0 0 0 0 0 0 0 1 1 1]
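
Why the order of inputs matters can be seen in a stand-alone sketch. It deliberately replaces the percentile-based adaptation described above with a much simpler rule, so it should not be read as how AdaptiveScalarEncoder works internally: the range starts at [x, x + 1] for the first input x and afterwards only grows to include each new input. ToyAdaptiveRange and its method are hypothetical names.

```python
import math


class ToyAdaptiveRange(object):
    """Tracks a growing [minval, maxval] range (simplified assumption)."""

    def __init__(self, w, n):
        self.w = w
        self.n = n
        self.minval = None
        self.maxval = None

    def first_on_bit(self, x):
        """Update the range with x, then return the first active bit."""
        if self.minval is None:
            # Assumed start-up rule: a unit-wide range at the first input.
            self.minval, self.maxval = x, x + 1
        self.minval = min(self.minval, x)
        self.maxval = max(self.maxval, x)
        span = float(self.maxval - self.minval)
        # Scale x onto the n - w + 1 possible start positions.
        scaled = (x - self.minval) * (self.n - self.w) / span
        return int(math.floor(scaled + 0.5))


a = ToyAdaptiveRange(w=3, n=12)
b = ToyAdaptiveRange(w=3, n=12)
print([a.first_on_bit(x) for x in (1, 1.5, 10, 11, 100, 50)])
print([b.first_on_bit(x) for x in (100, 1, 1.5, 10, 11, 100)])
```

With the same input orders as in the example, this toy rule gives start positions 0, 5, 9, 9, 9, 4 and 0, 0, 0, 1, 1, 9, which coincide with the first active bit on each line of the output. That agreement does not imply that NuPIC uses the same rule.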

RandomDistributedScalarEncoder

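Converts a numeric value into a random distributed representation. Unlike ScalarEncoder, the range between the minimum and maximum input does not have to be fixed in advance and can change dynamically. The only parameter required to create an instance is resolution.
A characteristic of the random distributed representation is that buckets with nearby indices share many bits. In other words, the smaller the difference between two inputs, the more their bits overlap, and the larger the difference, the less they overlap. The pseudocode below expresses this relationship. For bucket indices i and j, if the absolute value of their difference is less than w (the number of active bits), the encodings overlap in exactly w - |i - j| bits; if it is w or more, the overlap is at most maxOverlap.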

If abs(i-j) < w then:
  overlap(i,j) = w - abs(i-j)
else:
  overlap(i,j) <= maxOverlap

Note that the bit representations do not change from the time an instance is created until it is deleted.
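
The overlap rule can be written as a small Python function. expected_overlap is a hypothetical helper, and max_overlap is passed in as an argument because its value is not stated here; the value 2 used in the calls is only an illustrative assumption. For distant buckets the rule gives only an upper bound, so the function returns a (lowest, highest) pair.

```python
def expected_overlap(i, j, w, max_overlap):
    """Return (lowest, highest) overlap allowed for buckets i and j."""
    distance = abs(i - j)
    if distance < w:
        exact = w - distance
        return exact, exact
    return 0, max_overlap


# max_overlap=2 is an assumed value, used only for illustration.
print(expected_overlap(500, 504, w=21, max_overlap=2))  # (17, 17)
print(expected_overlap(500, 522, w=21, max_overlap=2))  # (0, 2)
```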

class nupic.encoders.random_distributed_scalar.RandomDistributedScalarEncoder(resolution, w=21, n=400, name=None, offset=None, seed=42, verbosity=0)

offset: The reference value that determines the range of inputs covered by the center bucket (bucket index 500). For example, with resolution=7 and offset=1, the center bucket covers offset - resolution/2 < x < offset + resolution/2, that is, -2.5 < x < 4.5.
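
The mapping from an input to a bucket index implied by this description can be sketched as follows. toy_rdse_bucket_index is a hypothetical function; the total of 1000 buckets, the center index 500, and the clamping at both ends are assumptions taken from the behaviour described in this article, not from the NuPIC source.

```python
import math


def toy_rdse_bucket_index(x, offset, resolution, max_buckets=1000):
    """Bucket index for x, clamped to 0 .. max_buckets - 1 (assumed)."""
    center = max_buckets // 2
    # Round to the nearest whole number of resolution steps from offset.
    steps = int(math.floor((x - offset) / float(resolution) + 0.5))
    return min(max(center + steps, 0), max_buckets - 1)


for value in (1, 500, 501, -499, -500):
    index = toy_rdse_bucket_index(value, offset=1, resolution=1)
    print('%5d -> %d' % (value, index))
```

Inputs beyond either end are clamped to the first or last bucket, which is why they stop being distinguishable from each other.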

Usage example

from nupic.encoders.random_distributed_scalar import RandomDistributedScalarEncoder
import numpy as np

# Create an instance with resolution (input granularity) 1 and offset (the input value at the center bucket) 1
# When w and n are not specified, they default to 21 and 400 respectively
randomDistributedScalarEncoder1 = RandomDistributedScalarEncoder(resolution = 1,
                                                                 offset = 1)

# Input 1 falls in the center bucket, so its bucket index is 500. (Bucket indices range from 0 to 999, regardless of n and w.)
print 'bucket index(     1):', randomDistributedScalarEncoder1.getBucketIndices(1)

# The largest input with a distinguishable representation is 500 (= (999 - 500) * 1 + 1)
# (max bucket index - middle bucket index) * resolution + offset
# Larger inputs do not raise an error, but their representations are indistinguishable
print 'bucket index( 500):', randomDistributedScalarEncoder1.getBucketIndices(500)
print 'bucket index( 501):', randomDistributedScalarEncoder1.getBucketIndices(501)

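# The smallest input with a distinguishable representation is -499 (= (0 - 500) * 1 + 1)
# Smaller inputs do not raise an error, but their representations are indistinguishable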
print 'bucket index(-499):', randomDistributedScalarEncoder1.getBucketIndices(-499)
print 'bucket index(-500):', randomDistributedScalarEncoder1.getBucketIndices(-500)

# Positions of the active bits after encoding
print '       1 =>', np.nonzero(randomDistributedScalarEncoder1.encode(1))[0]
print '  500 =>', np.nonzero(randomDistributedScalarEncoder1.encode(500))[0]
print ' -500 =>', np.nonzero(randomDistributedScalarEncoder1.encode(-500))[0]

# Find the bit positions shared by inputs 1 and 5 (bucket indices 500 and 504): the |i - j| < w case
overlap_1_5 = np.nonzero(randomDistributedScalarEncoder1.encode(1) * randomDistributedScalarEncoder1.encode(5))
# Number of overlapping bits (21 - |500 - 504| = 17)
print 'overlap 1 and   5 => ', len(overlap_1_5[0])

# Find the bit positions shared by inputs 1 and 23 (bucket indices 500 and 522): the |i - j| >= w case
overlap_1_23 = np.nonzero(randomDistributedScalarEncoder1.encode(1) * randomDistributedScalarEncoder1.encode(23))
# Number of overlapping bits
print 'overlap 1 and 23 => ', len(overlap_1_23[0])


# If offset is not specified at creation, the first input (2000) is adopted as the offset
randomDistributedScalarEncoder2 = RandomDistributedScalarEncoder(resolution = 100)
print 'bucket index(2000):', randomDistributedScalarEncoder2.getBucketIndices(2000)
print 'offset:', randomDistributedScalarEncoder2._offset

Output

bucket index(   1): [500]
bucket index( 500): [999]
bucket index( 501): [999]
bucket index(-499): [0]
bucket index(-500): [0]
   1 => [ 41  66  80  94 100 119 120 149 171 218 235 265 282 300 312 314 316 331 333 346 361]
 500 => [  7  20  38  61  87 108 111 128 163 174 197 213 215 243 267 268 294 304 316 320 380]
-500 => [  4  28  29  32  36  47  49  59  93 112 127 132 135 187 199 205 208 222 284 317 384]
overlap 1 and   5 =>  17
overlap 1 and  23 =>  0
bucket index(2000): [500]
offset: 2000
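
The overlap between two encodings can also be counted directly from the lists of active-bit positions. The sketch below copies the three lists from the output above and intersects them as sets; count_overlap is a hypothetical helper, not a NuPIC function.

```python
def count_overlap(bits_a, bits_b):
    """Number of active-bit positions shared by two encodings."""
    return len(set(bits_a) & set(bits_b))


# Active-bit positions copied from the output above.
enc_1 = [41, 66, 80, 94, 100, 119, 120, 149, 171, 218, 235,
         265, 282, 300, 312, 314, 316, 331, 333, 346, 361]
enc_500 = [7, 20, 38, 61, 87, 108, 111, 128, 163, 174, 197,
           213, 215, 243, 267, 268, 294, 304, 316, 320, 380]
enc_minus_500 = [4, 28, 29, 32, 36, 47, 49, 59, 93, 112, 127,
                 132, 135, 187, 199, 205, 208, 222, 284, 317, 384]

print(count_overlap(enc_1, enc_500))        # 1 (position 316)
print(count_overlap(enc_1, enc_minus_500))  # 0
```

Inputs 1 and 500 are far more than w buckets apart, yet they share one bit. This is consistent with the overlap(i,j) <= maxOverlap case of the pseudocode, which bounds the overlap rather than guaranteeing zero.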

Category Encoders

SDRCategoryEncoder

Encodes strings (categories) held in a list. Every category that is not defined in the list is encoded to one and the same representation.

class nupic.encoders.sdr_category.SDRCategoryEncoder(n, w, categoryList=None, name='category', verbosity=0, encoderSeed=1, forced=False)

categoryList: The list of categories to encode. If omitted, categories are added automatically as they are encountered.

Usage example

from nupic.encoders.sdr_category import SDRCategoryEncoder

categories = ['ruby', 'python', 'java']

categoryEncoder1 = SDRCategoryEncoder(n = 10,
                                      w = 3,
                                      categoryList = categories,
                                      forced = True)

# Strings defined in the list get distinguishable representations
print 'ruby   :', categoryEncoder1.encode('ruby')
print 'python :', categoryEncoder1.encode('python')
print 'java   :', categoryEncoder1.encode('java')

# Strings not defined in the list all get the same representation and cannot be distinguished
print 'perl   :', categoryEncoder1.encode('perl')
print '       :', categoryEncoder1.encode('')

# If no list is specified at creation, categories are added automatically as they are encountered
categoryEncoder2 = SDRCategoryEncoder(n = 10,
                                      w = 3,
                                      forced = True)

print 'ruby   :', categoryEncoder2.encode('ruby')
print 'python :', categoryEncoder2.encode('python')
print 'java   :', categoryEncoder2.encode('java')
print 'perl   :', categoryEncoder2.encode('perl')
print '       :', categoryEncoder2.encode('')

Output

ruby   : [1 1 0 1 0 0 0 0 0 0]
python : [0 1 0 0 0 0 0 1 1 0]
java   : [0 0 0 1 0 0 0 1 0 1]
perl   : [0 1 0 0 1 0 1 0 0 0]
       : [0 1 0 0 1 0 1 0 0 0]

ruby   : [1 1 0 1 0 0 0 0 0 0]
python : [0 1 0 0 0 0 0 1 1 0]
java   : [0 0 0 1 0 0 0 1 0 1]
perl   : [0 0 0 0 1 1 0 0 0 1]
       : [1 0 0 1 0 0 1 0 0 0]
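
The difference between the two runs can be mimicked with a short stand-alone sketch. ToyCategoryEncoder is hypothetical and is not NuPIC's algorithm: it simply assigns each category a distinct random pattern of w bits out of n and, when a fixed list is given, maps anything outside the list to one shared unknown pattern.

```python
import random


class ToyCategoryEncoder(object):
    """Maps categories to random w-of-n bit patterns (illustration only)."""

    UNKNOWN = object()  # sentinel key for the shared "unknown" pattern

    def __init__(self, n, w, category_list=None, seed=1):
        self.n, self.w = n, w
        self.rng = random.Random(seed)
        self.fixed = category_list is not None
        self.patterns = {}
        self._add(self.UNKNOWN)
        for category in category_list or []:
            self._add(category)

    def _add(self, key):
        # Draw until the pattern differs from every existing one.
        while True:
            pattern = tuple(sorted(self.rng.sample(range(self.n), self.w)))
            if pattern not in self.patterns.values():
                self.patterns[key] = pattern
                return

    def encode(self, category):
        if category not in self.patterns:
            if self.fixed:
                category = self.UNKNOWN
            else:
                self._add(category)  # learn new categories on the fly
        active = self.patterns[category]
        return [1 if i in active else 0 for i in range(self.n)]


fixed = ToyCategoryEncoder(n=10, w=3, category_list=['ruby', 'python', 'java'])
print(fixed.encode('perl') == fixed.encode(''))        # True
learning = ToyCategoryEncoder(n=10, w=3)
print(learning.encode('perl') == learning.encode(''))  # False
```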
