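Introduction
Unlike the Raspberry Pi, the Jetson Nano has a GPU, so I naturally wanted to try machine learning on it. The samples that ship with it are too advanced as a starting point, though, so I begin by building a simple AND gate.
Installing TensorFlow
There is an official TensorFlow build for the Jetson Nano, so I install that.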
sudo apt-get install python3-pip libhdf5-serial-dev hdf5-tools
pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v42 tensorflow-gpu==1.13.1+nv19.4 --user
The installation can be verified with the following command.
python3 -c 'import tensorflow; print(tensorflow.__version__)'
At the time of writing, version 1.13.1 was installed.
Related packages installed alongside it include TensorBoard and Estimator. I have never used TensorRT, so I plan to look into it separately.
$ pip3 freeze | grep tensor
tensorboard==1.13.1
tensorflow-estimator==1.13.0
tensorflow-gpu==1.13.1+nv19.4
tensorrt==5.0.6.3
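The version check only shows that the package imports. As a minimal sketch, the helper below also asks TensorFlow whether it can see a GPU. It assumes `tf.test.is_gpu_available()`, which exists in TensorFlow 1.x; the function name `gpu_status` is my own and not part of any library.

```python
def gpu_status():
    """Return True/False for GPU visibility, or None if TensorFlow cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return None
    # Assumption: tf.test.is_gpu_available() is present (TensorFlow 1.x API).
    return bool(tf.test.is_gpu_available())


print("GPU visible to TensorFlow:", gpu_status())
```

On a correctly installed Jetson Nano this should report True, but I have not tied the sketch to any particular JetPack release.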
Writing the source code
The official TensorFlow build reportedly ships with Keras, so I used Keras. Using any editor, create the following and.py.
and.py
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Model
model = Sequential()
model.add(Dense(1, input_shape=(2, ), activation='sigmoid'))
model.compile(loss='mse', optimizer='adam', metrics=['acc'])
model.summary()
# Training
x_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1]]) # N x 2
y_train = np.array([0, 0, 0, 1]).reshape(-1, 1) # N x 1
model.fit(x_train, y_train, epochs=3000, verbose=True)
# Evaluation
x_test = x_train
y_test = y_train
score = model.evaluate(x_test, y_test, verbose=False)
print('Test score:', score[0])
print('Test accuracy:', score[1])
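To make it clearer what this one-layer model computes, here is a NumPy-only sketch of the same forward pass: a single sigmoid neuron with two weights and one bias, three parameters in total. The weight and bias values are hand-picked for illustration and are an assumption; they are not the values that and.py learns.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, the activation used by the Dense layer in and.py."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical, hand-picked parameters (NOT the values learned by and.py).
w = np.array([[10.0], [10.0]])  # kernel, shape 2 x 1
b = np.array([-15.0])           # bias, shape 1

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # N x 2

y = sigmoid(x @ w + b)                   # N x 1 outputs in (0, 1)
labels = (y > 0.5).astype(int).ravel()   # threshold at 0.5

print("parameters:", w.size + b.size)
print("outputs:", y.ravel())
print("labels:", labels)
```

Only the input [1, 1] pushes the weighted sum above zero, so only that row crosses the 0.5 threshold, which is exactly the AND truth table.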
Running it
Run the program with python3.
$ python3 and.py
WARNING:tensorflow:From /home/yamamo-to/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/yamamo-to/.local/lib/python3.6/site-packages/tensorflow/python/keras/utils/losses_utils.py:170: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 1) 3
=================================================================
Total params: 3
Trainable params: 3
Non-trainable params: 0
_________________________________________________________________
WARNING:tensorflow:From /home/yamamo-to/.local/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-05-20 01:36:57.560684: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2019-05-20 01:36:57.561667: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x15dbc480 executing computations on platform Host. Devices:
2019-05-20 01:36:57.561737: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): <undefined>, <undefined>
2019-05-20 01:36:57.627701: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:965] ARM64 does not support NUMA - returning NUMA node zero
2019-05-20 01:36:57.627993: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x15e9b700 executing computations on platform CUDA. Devices:
2019-05-20 01:36:57.628052: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
2019-05-20 01:36:57.628401: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216
pciBusID: 0000:00:00.0
totalMemory: 3.87GiB freeMemory: 598.47MiB
2019-05-20 01:36:57.628476: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-05-20 01:36:58.625436: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-20 01:36:58.625521: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-05-20 01:36:58.625559: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-05-20 01:36:58.625749: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 130 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
Epoch 1/3000
2019-05-20 01:36:59.260357: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10.0 locally
4/4 [==============================] - 1s 204ms/sample - loss: 0.2856 - acc: 0.5000
Epoch 2/3000
4/4 [==============================] - 0s 2ms/sample - loss: 0.2853 - acc: 0.5000
Epoch 3/3000
...
Epoch 2999/3000
4/4 [==============================] - 0s 2ms/sample - loss: 0.0840 - acc: 1.0000
Epoch 3000/3000
4/4 [==============================] - 0s 2ms/sample - loss: 0.0840 - acc: 1.0000
Test score: 0.08399419486522675
Test accuracy: 1.0
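The log ends with a loss of about 0.084 alongside an accuracy of 1.0. One plausible reading, assuming the `acc` metric rounds each sigmoid output at 0.5, is that every output sits on the correct side of 0.5 without being close to 0 or 1. The sketch below reproduces that situation with made-up predictions; the values in `y_pred` are hypothetical and are not taken from the run above.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error, the loss used in and.py."""
    return float(np.mean((y_true - y_pred) ** 2))


def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold.

    Assumption: this mirrors how the 'acc' metric behaves for a
    single sigmoid output.
    """
    return float(np.mean((y_pred > threshold).astype(int) == y_true))


y_true = np.array([0, 0, 0, 1])
y_pred = np.array([0.1, 0.3, 0.3, 0.6])  # hypothetical model outputs

print("mse:", mse(y_true, y_pred))
print("accuracy:", binary_accuracy(y_true, y_pred))
```

With these made-up numbers the accuracy is perfect while the squared error stays well above zero, which is the same pattern as in the training log.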
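Miscellaneous
Power consumption rose from 3.8 W before the run to 5.8 to 6.0 W during it, an increase of about 2 W. The execution time, measured with the time command, was as follows.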
$ time python3 and.py
...
real 0m47.031s
user 1m2.516s
sys 0m9.184s
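As a rough back-of-the-envelope figure, the power and time readings can be combined into an energy estimate. The sketch below assumes the power draw stayed constant at the measured level for the whole wall-clock time, which is a simplification since startup and training probably draw different amounts.

```python
# Measured values from the text above.
idle_w = 3.8                           # power before the run, in watts
load_w_low, load_w_high = 5.8, 6.0     # power during the run, in watts
elapsed_s = 47.031                     # "real" time reported by the time command

# Assumption: power is constant over the entire run.
extra_low_j = (load_w_low - idle_w) * elapsed_s
extra_high_j = (load_w_high - idle_w) * elapsed_s
total_low_j = load_w_low * elapsed_s
total_high_j = load_w_high * elapsed_s

print(f"extra energy: {extra_low_j:.0f} to {extra_high_j:.0f} J")
print(f"total energy: {total_low_j:.0f} to {total_high_j:.0f} J")
```

Under that assumption the run costs on the order of 100 J over idle.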