
A temperature prediction model running on the Edge TPU (Coral)

Last updated at 2021-03-23, posted at 2020-11-21

Introduction

I built a TensorFlow Lite temperature prediction model that runs on an Edge TPU (Coral) attached to a Raspberry Pi 4. The versions used are TensorFlow 2.3.1, Edge TPU compiler 15.0.340273435, and Edge TPU runtime 2.14.1.

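Overview of the steps

1. Build and train a 1D-CNN-based model with TensorFlow's Keras on Google Colab

2. After training, quantize the model and convert it to a TensorFlow Lite model

3. Compile it with the Edge TPU compiler so that it runs on the Edge TPU

4. On the Raspberry Pi 4, predict with test data and predict the Raspberry Pi 4's CPU temperature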

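1. Building and training the model

Preparing the dataset

Using the Japan Meteorological Agency (気象庁) website, which lets you search past weather data, I collected about 20 years of daily mean temperatures for a single location.

Building the model

Using the Keras functional API, I built a 1D-CNN-based model that takes the mean temperatures of 30 consecutive days as input and predicts the mean temperature of the next day.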

import tensorflow as tf

# Input: 30 days of daily mean temperature, one feature per day.
inputs = tf.keras.Input(shape=(30, 1))
# Three stacked 1D convolutions ('valid' padding by default): 30 -> 21 -> 17 -> 15 steps.
cnn1 = tf.keras.layers.Conv1D(filters=1, kernel_size=10, strides=1, activation='relu')
cnn2 = tf.keras.layers.Conv1D(filters=1, kernel_size=5, strides=1, activation='relu')
cnn3 = tf.keras.layers.Conv1D(filters=1, kernel_size=3, strides=1, activation='relu')
dense1 = tf.keras.layers.Dense(units=8, activation='relu')
dense2 = tf.keras.layers.Dense(units=1)  # single output: the next day's temperature
x = cnn1(inputs)
x = cnn2(x)
x = cnn3(x)
x = tf.keras.layers.Flatten()(x)
x = dense1(x)
outputs = dense2(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
# Regression task, so track mean absolute error alongside the MSE loss.
model.compile(optimizer="Adam", loss="mean_squared_error",
              metrics=["mean_absolute_error"])
model.summary()
"""

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_14 (InputLayer)        [(None, 30, 1)]           0         
_________________________________________________________________
conv1d_31 (Conv1D)           (None, 21, 1)             11        
_________________________________________________________________
conv1d_32 (Conv1D)           (None, 17, 1)             6         
_________________________________________________________________
conv1d_33 (Conv1D)           (None, 15, 1)             4         
_________________________________________________________________
flatten_5 (Flatten)          (None, 15)                0         
_________________________________________________________________
dense_24 (Dense)             (None, 8)                 128       
_________________________________________________________________
dense_25 (Dense)             (None, 1)                 9         
=================================================================
Total params: 158
Trainable params: 158
Non-trainable params: 0
_________________________________________________________________
"""

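The output shapes and parameter counts in the summary above can be checked by hand. The following is a minimal sketch of that arithmetic, assuming the Keras defaults of 'valid' padding and one bias per filter or unit; the helper names are made up for illustration and are not part of the original code.

```python
def conv1d_output_length(input_length, kernel_size, stride=1):
    """Output length of a Conv1D layer with 'valid' padding."""
    return (input_length - kernel_size) // stride + 1


def conv1d_params(kernel_size, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


def dense_params(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return in_units * out_units + out_units


length = 30  # 30 days of input
total = 0
for kernel_size in (10, 5, 3):
    length = conv1d_output_length(length, kernel_size)
    params = conv1d_params(kernel_size, in_channels=1, filters=1)
    total += params
    print(f"Conv1D(kernel_size={kernel_size}): length={length}, params={params}")

# Flatten keeps the 15 values; then Dense(8) and Dense(1).
total += dense_params(length, 8) + dense_params(8, 1)
print("Total params:", total)
```

The loop reproduces the lengths 21, 17, and 15 and the total of 158 parameters shown in the summary.
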
Training the model

I trained with a batch size of 50 for 30 epochs.

model.fit(x=in_temp,y=out_temp,batch_size=50,epochs=30)
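
The article does not show how in_temp and out_temp are built. The following is one possible sketch, assuming the daily mean temperatures are available as a 1-D sequence in date order. The function name make_windows and the stand-in data are hypothetical.

```python
import numpy as np


def make_windows(series, window=30):
    """Split a 1-D series into (window -> next value) training pairs."""
    series = np.asarray(series, dtype=np.float32)
    n = len(series) - window
    if n <= 0:
        raise ValueError("series must be longer than the window")
    # Each input row holds `window` consecutive days.
    inputs = np.stack([series[i:i + window] for i in range(n)])
    # The target for row i is the day that follows its window.
    targets = series[window:]
    return inputs.reshape(n, window, 1), targets.reshape(n, 1)


# Stand-in data: 40 "days" with values 0..39 instead of real temperatures.
daily_mean = np.arange(40, dtype=np.float32)
in_temp, out_temp = make_windows(daily_mean)
print(in_temp.shape, out_temp.shape)
```

With 40 values and a 30-day window this yields 10 pairs, and the input shape (10, 30, 1) matches the model's (None, 30, 1) input.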

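2. Quantizing the model

To run on the Edge TPU, the model must be quantized to 8-bit integers. Quantization requires a fixed input size, so I first save the model with signatures specified. Here the input size is 1×30×1.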

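# Wrap the model in a tf.function so a concrete function with a fixed input shape can be traced.
opt = tf.function(lambda x: model(x))
BATCH_SIZE = 1
STEPS = 30
INPUT_SIZE = 1
concrete_func = opt.get_concrete_function(
    tf.TensorSpec([BATCH_SIZE, STEPS, INPUT_SIZE], model.inputs[0].dtype, name="inputs")
)
# Save as a SavedModel whose signature has the fixed 1x30x1 input.
model.save('/content/weather', save_format="tf", signatures=concrete_func)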

Next, quantize the model. See the TensorFlow documentation for details.
in_temp is the NumPy ndarray that was used as input when training the model.

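# Use the first training window as calibration data, shaped (1, 30, 1).
conv_data = in_temp[0]
conv_data = conv_data.reshape(1,30,1)

# Generator the converter calls to observe typical input values.
# len(conv_data) is 1 here, so a single 30-day window is yielded.
def representative_dataset_gen():
  for i in range(len(conv_data)):
    yield [conv_data[i]]

converter_edgetpu = tf.lite.TFLiteConverter.from_saved_model("/content/weather")
converter_edgetpu.optimizations = [tf.lite.Optimize.DEFAULT]
converter_edgetpu.representative_dataset = representative_dataset_gen
# Restrict the model to int8 builtin ops (full integer quantization).
converter_edgetpu.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# Use uint8 for the model's input and output tensors as well.
converter_edgetpu.inference_input_type = tf.uint8
converter_edgetpu.inference_output_type = tf.uint8
converter_edgetpu.experimental_new_converter = True
tflite = converter_edgetpu.convert()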
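
The representative dataset above consists of a single window. Calibration generally works from the range of values it observes, so feeding more windows may give better quantization parameters. The following is a hedged sketch of such a generator, not the author's method. The factory name make_representative_dataset, the max_samples value, and the random stand-in data are assumptions for illustration.

```python
import numpy as np


def make_representative_dataset(samples, max_samples=200):
    """Return a generator function yielding float32 batches of shape (1, 30, 1)."""
    samples = np.asarray(samples, dtype=np.float32)

    def representative_dataset_gen():
        for sample in samples[:max_samples]:
            # Keep the batch dimension so each item matches the fixed 1x30x1 input.
            yield [sample[np.newaxis, ...]]

    return representative_dataset_gen


# Stand-in for in_temp: 500 random windows between -5 and 35 degrees.
in_temp_stand_in = np.random.default_rng(0).uniform(-5, 35, size=(500, 30, 1))
gen = make_representative_dataset(in_temp_stand_in)
batches = list(gen())
print(len(batches), batches[0][0].shape, batches[0][0].dtype)
```

With the real training array, the returned function could be assigned to converter_edgetpu.representative_dataset in place of the single-window generator.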

Finally, save the model.

with open("cnn_weather_lite_quantized.tflite", "wb") as f:
    f.write(tflite)

3. Compiling with the Edge TPU compiler

Installing the Edge TPU compiler

Install the Edge TPU compiler on Google Colab.
See the documentation for installation details.

As the documentation states, the compiler requires an x86-64 CPU architecture. The Raspberry Pi 4B's CPU architecture is ARMv8, so it presumably cannot be installed directly on the Raspberry Pi 4. (Raspberry Pi CPU architectures: https://nw-electric.way-nifty.com/blog/2020/02/post-6c09ad.html)

!curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
!echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
!sudo apt -y update
!sudo apt-get install edgetpu-compiler

Compiling with the Edge TPU compiler

Reference: https://coral.ai/docs/edgetpu/compiler/#usage

!edgetpu_compiler /content/cnn_weather_lite_quantized.tflite

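4. Prediction

Prediction on test data

I again prepared 30 days of daily mean temperatures from the Japan Meteorological Agency website.
I then ran the code below (the Edge TPU version) on the Raspberry Pi to predict the next day's temperature. I also measured the execution time and compared it with the CPU runs.
The inference code is based on the "Load and run a model in Python" section of the TensorFlow site. That page is machine-translated into Japanese, so if the translation reads oddly I recommend reading it in English.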

import numpy as np
import pandas as pd
import tflite_runtime.interpreter as tflite
import time

def main(args):
    # Load the compiled model and delegate supported ops to the Edge TPU.
    interpreter = tflite.Interpreter('/home/pi/cnn_weather/cnn_weather_lite_quantized_edgetpu.tflite',
    experimental_delegates=[tflite.load_delegate('libedgetpu.so.1')])
    
    # Read the 30 daily mean temperatures from the "平均気温" (mean temperature) column.
    # The values are cast directly to uint8 and reshaped to the model's 1x30x1 input.
    data = pd.read_csv('/home/pi/cnn_weather/test.csv')
    test_data = np.asarray(data.loc[:len(data),"平均気温"],dtype=np.uint8)
    test_data = test_data.reshape(1,30,1)
    
    # The timed section covers tensor allocation, setting the input, inference, and reading the output.
    start = time.perf_counter()
    
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    input_shape = input_details[0]['shape']
    interpreter.set_tensor(input_details[0]['index'],test_data)
    
    interpreter.invoke()
    
    # The raw uint8 output tensor is read and printed as-is.
    output_data = interpreter.get_tensor(output_details[0]['index'])
    
    end = time.perf_counter()
    
    print("The next day's temperature is " + str(output_data[0,0]) + " degrees Celsius.")
    print("It took " + str((end-start)*1000) + " ms.")
    
    return 0

if __name__ == '__main__':
    import sys
    sys.exit(main(sys.argv))

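The results are shown below.
The top run is on the Edge TPU, the middle run is the non-quantized model on the CPU, and the bottom run is the quantized model on the CPU.
Quantization makes inference faster (about 1.4 ms versus 3.7 ms on the CPU), but the Edge TPU run (about 2.2 ms) is actually slower than the quantized model on the CPU.
With execution times of around 2 ms, the overhead of calling the Edge TPU may outweigh any speedup.
Note also that accuracy drops considerably after quantization. (The test data covers late September to late October, starting at around 22℃ and ending at around 18℃, so predictions of 24℃ and 26℃ seem well off the mark.)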

pi@raspberrypi:~$ python3 /home/pi/cnn_weather/edgetpu_time.py
The next day's temperature is 24 degrees Celsius.
It took 2.218683000137389 ms.

pi@raspberrypi:~$ python3 /home/pi/cnn_weather/cpu_time.py
The next day's temperature is 17.671713 degrees Celsius.
It took 3.6856149999948684 ms.

pi@raspberrypi:~$ python3 /home/pi/cnn_weather/cpu_quantized_time.py
The next day's temperature is 26 degrees Celsius.
It took 1.4244879994294024 ms.
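
One possible contributor to the off-target values from the quantized models, which is not verified against the author's model, is that the script casts real temperatures straight to uint8 and prints the raw uint8 output. In a fully integer-quantized TensorFlow Lite model, each input and output tensor has a scale and a zero point (exposed as input_details[0]['quantization'] and output_details[0]['quantization']), and a real value corresponds to (q - zero_point) * scale. The sketch below shows that mapping with NumPy only. The scale and zero_point values are made-up placeholders, not the ones in the author's model.

```python
import numpy as np


def quantize(real_values, scale, zero_point, dtype=np.uint8):
    """Map real values to the integer domain: q = round(r / scale) + zero_point."""
    info = np.iinfo(dtype)
    q = np.round(np.asarray(real_values, dtype=np.float32) / scale) + zero_point
    return np.clip(q, info.min, info.max).astype(dtype)


def dequantize(quantized_values, scale, zero_point):
    """Map integers back to real values: r = (q - zero_point) * scale."""
    return (np.asarray(quantized_values, dtype=np.float32) - zero_point) * scale


# Hypothetical quantization parameters, chosen only for illustration.
scale, zero_point = 0.25, 40

temps = np.array([22.0, 20.5, 18.0])
q = quantize(temps, scale, zero_point)
restored = dequantize(q, scale, zero_point)
print(q, restored)
```

In the inference script, the same idea would mean quantizing the 30 input temperatures with the input tensor's parameters before set_tensor, and dequantizing the output with the output tensor's parameters before printing.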

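CPU temperature prediction

Since the model runs on a small device like the Raspberry Pi, I wanted to try something like real-time prediction. I would have liked to predict from the outside air temperature, but I had no sensor and building one looked like a lot of work, so for now I predicted the CPU temperature instead.
Every second I read the CPU temperature with cat /sys/class/thermal/thermal_zone0/temp, and once 30 seconds of readings had been collected I predicted the temperature one second later.
Note that cat /sys/class/thermal/thermal_zone0/temp returns the CPU temperature multiplied by 1000, so I divided the output by 1000 before using it.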
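
As an alternative to spawning cat, the same file can be read directly from Python. This is a minimal sketch of the read-and-divide step described above. The function names are made up for illustration, and the file only exists on systems that expose this thermal zone.

```python
from pathlib import Path

THERMAL_ZONE_PATH = Path("/sys/class/thermal/thermal_zone0/temp")


def parse_millidegrees(raw_text):
    """Convert the file's content (millidegrees Celsius) to degrees Celsius."""
    return int(raw_text.strip()) / 1000


def read_cpu_temp(path=THERMAL_ZONE_PATH):
    """Read the current CPU temperature in degrees Celsius."""
    return parse_millidegrees(Path(path).read_text())


# Parsing a sample reading; the trailing newline is what the file typically contains.
print(parse_millidegrees("63783\n"))

# Only read the real sensor if the file is present on this machine.
if THERMAL_ZONE_PATH.exists():
    print(read_cpu_temp())
```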

Below is the code that was run on the Raspberry Pi. (Only the Edge TPU version is shown.)

import numpy as np
import tflite_runtime.interpreter as tflite
import time
import subprocess

def main(args):
    start = time.perf_counter()
    interpreter = tflite.Interpreter('/home/pi/cnn_weather/cnn_weather_lite_quantized_edgetpu.tflite',
    experimental_delegates=[tflite.load_delegate('libedgetpu.so.1')])
    
    data = list()
    for i in range(30):
        # Start timing before the read so each loop iteration takes about 1 s in total.
        get_start = time.perf_counter()
        res = subprocess.run(['cat', '/sys/class/thermal/thermal_zone0/temp'],
        stdout=subprocess.PIPE)
        result = res.stdout.decode('utf-8')
        result = int(result)/1000
        data.append(result)
        print(result,end='℃ ')
        if (i+1)%10 == 0:
            print()
        get_end = time.perf_counter()
        get_time = get_end-get_start
        
        if get_time < 1:
            time.sleep(1-get_time)
        else:
            print("Took " + str(get_time) + " s to get " +  str(i) + "'s temp.")
    
    pre_start = time.perf_counter()
    np_data = np.asarray(data,dtype=np.uint8).reshape(1,30,1)
    
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    input_shape = input_details[0]['shape']
    interpreter.set_tensor(input_details[0]['index'],np_data)
    interpreter.invoke()
    
    pred = interpreter.get_tensor(output_details[0]['index'])
    
    pre_end = time.perf_counter()
    pre_time = pre_end - pre_start
    if pre_time < 1:
        print("The cpu's temp will be " + str(pred[0,0]) + "℃ in " + 
        str(1-pre_time) + " s.")
        
        time.sleep(1-pre_time)
        res = subprocess.run(['cat', '/sys/class/thermal/thermal_zone0/temp'],
        stdout=subprocess.PIPE)
        result = res.stdout.decode('utf-8')
        result = int(result)/1000
        print("The cpu's temp is " + str(result) + "℃.")
    else:
        # pre_time >= 1 here, so the predicted moment is pre_time-1 seconds in the past.
        print("The cpu's temp must have been " + str(pred[0,0]) + "℃ " + 
        str(pre_time-1) + " s ago.")
    
    end = time.perf_counter()
    print("Took " + str(end-start) + " s to run this code.")
    
    return 0

if __name__ == '__main__':
    import sys
    sys.exit(main(sys.argv))

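The results are shown below.
The top run uses the Edge TPU, the middle run uses the non-quantized model, and the bottom run uses the quantized model.
The listed temperatures are the CPU temperature readings used for the prediction. Again, the quantized models produce wildly wrong outputs above 100℃.
As before, the Edge TPU run takes longer (about 34.2 s versus about 31.4 s). This time the measurement covers not only inference but also the creation of the interpreter (tflite.Interpreter), so that part may also take longer when the Edge TPU is used.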

pi@raspberrypi:~$ python3 /home/pi/cnn_weather/predict_edgetpu.py
63.783℃ 64.757℃ 63.783℃ 63.783℃ 63.296℃ 62.809℃ 63.296℃ 63.296℃ 62.809℃ 62.809℃ 
62.809℃ 63.296℃ 62.322℃ 62.809℃ 63.783℃ 62.809℃ 63.783℃ 63.783℃ 62.322℃ 62.809℃ 
62.322℃ 63.783℃ 62.809℃ 62.322℃ 62.322℃ 62.322℃ 62.322℃ 62.322℃ 62.322℃ 63.296℃ 
The cpu's temp will be 105℃ in 0.9969898569997895 s.
The cpu's temp is 61.835℃.
Took 34.21252226499928 s to run this code.

pi@raspberrypi:~$ python3 /home/pi/cnn_weather/predict_cpu.py
63.783℃ 63.783℃ 63.296℃ 62.809℃ 63.783℃ 63.296℃ 62.809℃ 63.296℃ 62.809℃ 62.322℃ 
62.322℃ 62.322℃ 62.809℃ 62.322℃ 61.835℃ 62.322℃ 62.322℃ 61.348℃ 62.322℃ 62.322℃ 
63.296℃ 61.835℃ 62.322℃ 61.835℃ 61.348℃ 61.348℃ 61.835℃ 62.322℃ 62.809℃ 62.322℃ 
The cpu's temp will be 62.17556℃ in 0.9969654129999981 s.
The cpu's temp is 62.322℃.
Took 31.404364756001087 s to run this code.

pi@raspberrypi:~$ python3 /home/pi/cnn_weather/predict_cpu_quantized.py
63.296℃ 63.296℃ 62.809℃ 62.322℃ 62.322℃ 61.835℃ 61.835℃ 62.322℃ 61.835℃ 62.322℃ 
62.809℃ 62.322℃ 62.809℃ 62.322℃ 60.861℃ 62.322℃ 61.835℃ 61.835℃ 62.322℃ 61.835℃ 
61.835℃ 61.835℃ 61.348℃ 62.322℃ 60.861℃ 61.348℃ 62.322℃ 61.348℃ 61.835℃ 61.348℃ 
The cpu's temp will be 101℃ in 0.9984136980001495 s.
The cpu's temp is 61.835℃.
Took 31.43542323499969 s to run this code.
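
Both timing comparisons include one-off setup work (tensor allocation in the first script, and interpreter creation plus delegate loading in the second), and each measures a single inference. To compare inference alone, one option is to time only the repeated call after a warm-up run. The sketch below shows that pattern with a dummy workload so it runs anywhere. The names time_call and dummy_inference are hypothetical, and on the Raspberry Pi interpreter.invoke would take the place of the dummy function.

```python
import time


def time_call(fn, warmup=1, repeats=10):
    """Return (mean_ms, best_ms) for fn, excluding the warm-up calls."""
    for _ in range(warmup):
        fn()  # absorbs one-off costs such as lazy initialization
    durations_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations_ms.append((time.perf_counter() - start) * 1000)
    return sum(durations_ms) / len(durations_ms), min(durations_ms)


def dummy_inference():
    """Stand-in workload; replace with interpreter.invoke on the device."""
    return sum(i * i for i in range(10000))


mean_ms, best_ms = time_call(dummy_inference)
print(f"mean {mean_ms:.3f} ms, best {best_ms:.3f} ms")
```

Reporting the mean or best time over several runs also reduces the effect of run-to-run noise, which matters when the values being compared are only a few milliseconds.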

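Conclusion

I built and ran a TensorFlow Lite temperature prediction model that runs on an Edge TPU (Coral) attached to a Raspberry Pi 4.
Quantization degraded accuracy severely (the model became almost meaningless), and execution on the Edge TPU took longer than on the CPU.
The immediate goal this time was to build a model that performs inference on the Edge TPU, rather than to build an accurate model, so I will stop here for now.
All the data, code, and TensorFlow Lite models used here are available on GitHub.

Thank you for reading this far.
This is my first article, so please feel free to comment if anything catches your attention or needs correcting.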

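Aside

Before using a 1D CNN I tried to do the same thing with an LSTM, but I got stuck on an error like the one described here, so I switched to a 1D CNN. Judging from the linked issue, quantizing RNN-type models such as LSTM does not currently seem possible. (The linked issue uses TensorFlow 2.2.0; this article uses 2.3.1.)

The error message is not identical, but a discussion of what appears to be a related error also suggested that RNN-type models such as LSTM may not support quantization.
I was, however, able to convert the non-quantized model to TensorFlow Lite and run it on the Raspberry Pi.

(This article was written as part of a lab internship: https://kojima-r.github.io/kojima/)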
