
TensorFlow tips - Porting Lasagne's NINLayer

Posted at 2017-06-30

Introduction

(Apologies in advance: this is a very niche topic.)

While porting some GAN (Generative Adversarial Networks) code written in Theano + Lasagne to TensorFlow, I ran into "lasagne.layers.NINLayer". The Lasagne documentation describes it as follows:

Network-in-network layer. Like DenseLayer, but broadcasting across all trailing dimensions beyond the 2nd. This results in a convolution operation with filter size 1 on all trailing dimensions. Any number of trailing dimensions is supported, so NINLayer can be used to implement 1D, 2D, 3D, ... convolutions.

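In other words, it is a layer similar to a densely connected layer that "broadcasts" across the dimensions beyond the 2nd. The name "Network-in-network layer" put me on guard at first, but it turned out to be very easy to write in TensorFlow, so I would like to share it.

(Programming environment:
- Python: 3.5.2
- Theano: 0.9.0
- Lasagne: 0.2.dev1
- TensorFlow: 1.2.0)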

Behavior in Theano + Lasagne

The documentation alone did not make the behavior clear to me, so I ran a test with a small amount of data.

import numpy as np
import theano
import theano.tensor as T

import lasagne
from lasagne.layers import InputLayer, NINLayer

# variables
x = T.tensor4('x')
y = T.matrix('y')

# shared variable
w_np= np.array([[1., 2., 3], [2., 2., 2.]], dtype=np.float32)
ws = theano.shared(w_np, name='w')

# layers
l_in = InputLayer((1, 2, 5, 5))  # (batch, channels, height, width): 2 channels, 5 x 5 image
l1 = NINLayer(l_in, num_units=3, W=ws, b=None,
              nonlinearity=None)
l_in.input_var = x
y = lasagne.layers.get_output(l1)

# theano function
mylayer = theano.function(
    inputs=[x],
    outputs=y,
    allow_input_downcast=True
)

x_np = np.ones([1, 2, 5, 5]) * 0.1
y_np = mylayer(x_np)
y_np = np.asarray(y_np)

print('x_np.shape = ', x_np.shape)
print('x_np = \n', x_np)
print(' ')
print('y_np.shape = ', y_np.shape)
print('y_np = \n', y_np)

The output of the computation is shown below.

x_np.shape =  (1, 2, 5, 5)
x_np = 
 [[[[ 0.1  0.1  0.1  0.1  0.1]
   [ 0.1  0.1  0.1  0.1  0.1]
   [ 0.1  0.1  0.1  0.1  0.1]
   [ 0.1  0.1  0.1  0.1  0.1]
   [ 0.1  0.1  0.1  0.1  0.1]]

  [[ 0.1  0.1  0.1  0.1  0.1]
   [ 0.1  0.1  0.1  0.1  0.1]
   [ 0.1  0.1  0.1  0.1  0.1]
   [ 0.1  0.1  0.1  0.1  0.1]
   [ 0.1  0.1  0.1  0.1  0.1]]]]
 
y_np.shape =  (1, 3, 5, 5)
y_np = 
 [[[[ 0.30000001  0.30000001  0.30000001  0.30000001  0.30000001]
   [ 0.30000001  0.30000001  0.30000001  0.30000001  0.30000001]
   [ 0.30000001  0.30000001  0.30000001  0.30000001  0.30000001]
   [ 0.30000001  0.30000001  0.30000001  0.30000001  0.30000001]
   [ 0.30000001  0.30000001  0.30000001  0.30000001  0.30000001]]

  [[ 0.40000001  0.40000001  0.40000001  0.40000001  0.40000001]
   [ 0.40000001  0.40000001  0.40000001  0.40000001  0.40000001]
   [ 0.40000001  0.40000001  0.40000001  0.40000001  0.40000001]
   [ 0.40000001  0.40000001  0.40000001  0.40000001  0.40000001]
   [ 0.40000001  0.40000001  0.40000001  0.40000001  0.40000001]]

  [[ 0.5         0.5         0.5         0.5         0.5       ]
   [ 0.5         0.5         0.5         0.5         0.5       ]
   [ 0.5         0.5         0.5         0.5         0.5       ]
   [ 0.5         0.5         0.5         0.5         0.5       ]
   [ 0.5         0.5         0.5         0.5         0.5       ]]]]

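Looking at the input and output data above should make the layer's function clear. Each pixel of the 2-D image is taken as input and multiplied by the weight along the channel (depth-wise) direction; in the example above the weight is w = [[1., 2., 3.], [2., 2., 2.]]. Because the input has 2 channels and the weight has shape [2 x 3], the output has 3 channels. For instance, the first output channel is 0.1 * 1 + 0.1 * 2 = 0.3. The input in the code above is all 0.1, but in practice it is image data, so arbitrary values come in.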
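The per-pixel computation described above can be written down in plain NumPy as a sanity check. This is only a sketch: the use of np.einsum is my own choice for illustration and is not taken from the Lasagne source, and the variable names simply mirror the Theano test.

```python
import numpy as np

# Same weight as in the Theano test: 2 input channels -> 3 output channels.
w = np.array([[1., 2., 3.], [2., 2., 2.]], dtype=np.float32)

# Channel-first input (batch, channel, height, width), all 0.1 as in the test.
x = np.full((1, 2, 5, 5), 0.1, dtype=np.float32)

# For every pixel: y[b, o, h, w] = sum over c of x[b, c, h, w] * w[c, o]
y = np.einsum('bchw,co->bohw', x, w)

print(y.shape)          # (1, 3, 5, 5)
print(y[0, :, 0, 0])    # the three output channels at one pixel
```

The values at any single pixel should agree with the three channel planes printed by the Theano run (about 0.3, 0.4 and 0.5).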

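Code ported to TensorFlow

Theano handles image data as "channel-first" (strictly speaking the channel comes second, because the batch dimension comes first), whereas TensorFlow uses "channel-last". I ported the code with this difference in mind.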

# NIN (network in network) like function
def lasagne_nin_like(x, w):
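    '''
      args.:
        x: input,  shape [None, pixel, pixel, input_channel]
        w: weight, shape [input_channel, output_channel]
      returns:
        output,    shape [None, pixel, pixel, output_channel]
    '''
    input_ch = tf.shape(x)[-1]      # equals 2 in this example
    output_ch = tf.shape(w)[-1]     # equals 3 in this example
    net = tf.reshape(x, [-1, input_ch])
    net = tf.matmul(net, w)
    # the spatial size (5 x 5) is hard-coded for this example
    y = tf.reshape(net, [-1, 5, 5, output_ch])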

    return y

y = lasagne_nin_like(x, W)

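All this does is tf.reshape() the input into a flat shape that keeps only the channel dimension, perform the matrix multiplication, and then reshape the result back to the original layout. The port was easier than I expected.

The complete TensorFlow code is as follows.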
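The same flatten, multiply, restore sequence can be tried without a TensorFlow session. The sketch below is a NumPy stand-in, not the TensorFlow code itself; the helper name nin_like_channel_last is hypothetical, and unlike the function above it takes the spatial size from the input instead of hard-coding 5 x 5.

```python
import numpy as np

def nin_like_channel_last(x, w):
    """Apply w to the last (channel) axis of x; every other axis acts as batch."""
    in_ch, out_ch = w.shape
    flat = x.reshape(-1, in_ch)                    # (N * H * W, in_ch)
    out = flat @ w                                 # (N * H * W, out_ch)
    return out.reshape(x.shape[:-1] + (out_ch,))   # back to (N, H, W, out_ch)

w = np.array([[1., 2., 3.], [2., 2., 2.]])

# Non-constant, non-square input so that layout mistakes would show up.
x = np.arange(2 * 4 * 6 * 2, dtype=np.float64).reshape(2, 4, 6, 2)

y = nin_like_channel_last(x, w)
print(y.shape)
# Cross-check against an explicit per-pixel contraction over the channel axis.
print(np.array_equal(y, np.einsum('bhwc,co->bhwo', x, w)))
```

Because the channel is the last axis, the reshape to (-1, in_ch) keeps each pixel's channels together in one row, which is what makes the single matrix product valid.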

import numpy as np
import tensorflow as tf


# tensorflow placeholders
x = tf.placeholder(tf.float32, [None, 5, 5, 2])


# shared variable
w_np= np.array([[1., 2., 3], [2., 2., 2.]])
W = tf.Variable(w_np, dtype=tf.float32)

# NIN (network in network) like function
def lasagne_nin_like(x, w):
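    '''
      args.:
        x: input,  shape [None, pixel, pixel, input_channel]
        w: weight, shape [input_channel, output_channel]
      returns:
        output,    shape [None, pixel, pixel, output_channel]
    '''
    input_ch = tf.shape(x)[-1]      # equals 2 in this example
    output_ch = tf.shape(w)[-1]     # equals 3 in this example
    net = tf.reshape(x, [-1, input_ch])
    net = tf.matmul(net, w)
    # the spatial size (5 x 5) is hard-coded for this example
    y = tf.reshape(net, [-1, 5, 5, output_ch])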

    return y

y = lasagne_nin_like(x, W)

# tensorflow session
init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(init)


x_np = np.ones([1, 5, 5, 2], dtype=np.float32) * 0.1
y_np = sess.run(y, feed_dict={x: x_np})

print('x_np.shape = ', x_np.shape)

# Transpose the 4-D tensor (channel-last -> channel-first) so that the
# output can be compared with Theano's result.
y_np = np.transpose(y_np, (0, 3, 1, 2))

print('y_np.shape = ', y_np.shape)
print('y_np = ', y_np)
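The axis orders used for the layout conversion are easy to get wrong, so here is a small NumPy-only sketch of both directions. The array contents are arbitrary test values chosen for illustration.

```python
import numpy as np

# Channel-first tensor as used by Theano/Lasagne: (batch, channel, height, width)
x_nchw = np.arange(1 * 2 * 5 * 5, dtype=np.float32).reshape(1, 2, 5, 5)

# Channel-last layout used by the TensorFlow code: (batch, height, width, channel)
x_nhwc = np.transpose(x_nchw, (0, 2, 3, 1))

# Back to channel-first, the same permutation as in the listing above.
x_back = np.transpose(x_nhwc, (0, 3, 1, 2))

print(x_nhwc.shape)
print(np.array_equal(x_back, x_nchw))
```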

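Thoughts

This was exactly a case of "it is easier to do something than to worry about it". My guess is that the port was easy because TensorFlow follows the "channel-last" convention. Conversely, in Theano + Lasagne the data cannot be flattened (per channel) in a single line, and that inconvenience may be why lasagne.layers.NINLayer was provided.

(The title, "Porting NINLayer", was rather grandiose...)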
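To illustrate the point about flattening, the sketch below compares a naive reshape on channel-first data with a version that first moves the channel axis to the end. It is written in NumPy as a stand-in and makes no claim about how Lasagne implements NINLayer internally; the input values are arbitrary.

```python
import numpy as np

w = np.array([[1., 2., 3.], [2., 2., 2.]])

# Channel-first input with distinct values in every position.
x_nchw = np.arange(1 * 2 * 5 * 5, dtype=np.float64).reshape(1, 2, 5, 5)

# Reference: contract the channel axis (axis 1) with the rows of w.
ref = np.einsum('bchw,co->bohw', x_nchw, w)

# Naive flatten on channel-first data: each row pairs two neighboring
# pixels of the same channel instead of the two channels of one pixel.
naive = (x_nchw.reshape(-1, 2) @ w).reshape(1, 3, 5, 5)

# Working route: move channels last, flatten, multiply, restore the layout.
x_nhwc = np.moveaxis(x_nchw, 1, -1)
fixed = (x_nhwc.reshape(-1, 2) @ w).reshape(1, 5, 5, 3)
fixed = np.moveaxis(fixed, -1, 1)

print(np.array_equal(fixed, ref))
print(np.array_equal(naive, ref))
```

With channel-first data the extra axis moves before and after the matrix product are unavoidable, which is consistent with the guess that a dedicated layer was convenient there.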

Reference Web Sites

Lasagne documentation
http://lasagne.readthedocs.io/en/latest/index.html
TensorFlow documentation
https://www.tensorflow.org/
