
Real-time machine-learning effects on audio signals with Raspberry Pi and Jetson Nano (training environment setup)

Posted at 2020-03-16

Motivation

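Inspired by the article "Modeling a guitar amp with machine learning" (機械学習でギターアンプをモデリングする), I want to add audio input/output to a Raspberry Pi and a Jetson Nano and run the model's inference on them in real time.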

That article trains the model and runs inference on a Windows PC using Keras, TensorFlow, and CUDA.
I looked into how this could be brought over to a Raspberry Pi or a Jetson Nano.

Target setup

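As shown in the figure below, the incoming audio signal is run through the model, and the audio signal with the effect applied is sent to the output.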
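The input → inference → output chain can be sketched as a block-based streaming loop. This is a minimal sketch under assumptions that are not from the original: the audio driver delivers fixed-size blocks, and the trained network is replaced by a hypothetical model_fn that needs a few past input samples (context) to compute each output sample. A toy two-tap moving average stands in for the model so the example runs without any ML library.

```python
from collections import deque


class BlockProcessor:
    """Runs a model over a stream of fixed-size audio blocks.

    The model sees `context` past input samples plus the current block, so
    splitting the stream into blocks gives the same result as processing
    the whole signal at once.
    """

    def __init__(self, model_fn, block_size, context):
        self.model_fn = model_fn  # hypothetical stand-in for the trained network
        self.block_size = block_size
        # Past input samples carried between blocks; start from silence.
        self.history = deque([0.0] * context, maxlen=context)

    def process(self, block):
        if len(block) != self.block_size:
            raise ValueError("expected %d samples, got %d" % (self.block_size, len(block)))
        window = list(self.history) + list(block)
        out = self.model_fn(window)
        if len(out) != self.block_size:
            raise ValueError("model must return one output sample per block sample")
        self.history.extend(block)  # deque keeps only the newest `context` samples
        return out


def moving_average(window):
    """Toy 'effect' with one sample of context: y[n] = (x[n] + x[n-1]) / 2."""
    return [(window[i] + window[i - 1]) / 2 for i in range(1, len(window))]


if __name__ == "__main__":
    proc = BlockProcessor(moving_average, block_size=3, context=1)
    signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(proc.process(signal[:3]) + proc.process(signal[3:]))  # [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
```

The block size trades latency against per-call overhead: at 48000 Hz, a 256-sample block corresponds to about 5.3 ms of audio.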

Evaluating embedded inference engines

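■TensorFlow C: TensorFlow's API for embedded applications; C++ implementations are also possible
■TensorRT: converts a model trained with TensorFlow or Keras from h5 → Frozen Graph → UFF and runs fast inference on embedded devices. Works well with the Jetson Nano
■nnabla: Sony's machine learning library. A trained model can be exchanged seamlessly between Python, C++, and C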

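I could not find out whether TensorFlow C supports LSTM, and I found TensorRT's file conversion hard to understand.
This time I prioritized being able to train on a powerful desktop PC and carry the trained model straight over to the Raspberry Pi and Jetson Nano, so I chose nnabla.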

Setting up the environment

First, set up the training environment.
Windows 10 PC, 8 GB RAM
NVIDIA GeForce RTX 2070
Python 3.7.1 installed

Basically, just follow https://nnabla.org/ja/download/.

On the command line, all it takes is typing
pip install nnabla. Running nnabla_cli afterwards prints the following:

>nnabla_cli
2020-03-14 17:40:45,986 [nnabla][INFO]: Initializing CPU extension...
NNabla command line interface (Version:1.5.0, Build:200121063740)
usage: nnabla_cli [-h] [-m]
                  {train,infer,forward,encode_param,decode_param,profile,conv_dataset,compare_with_cpu,create_image_classification_dataset,create_object_detection_dataset,upload,create_tar,dump,nnb_template,convert,function_info,optimize,plot_series,plot_timer,draw_graph,version}
                  ...

Command line interface for NNabla(Version:1.5.0, Build:200121063740)

positional arguments:
  {train,infer,forward,encode_param,decode_param,profile,conv_dataset,compare_with_cpu,create_image_classification_dataset,create_object_detection_dataset,upload,create_tar,dump,nnb_template,convert,function_info,optimize,plot_series,plot_timer,draw_graph,version}
    train               Training with NNP.
    infer               Do inference with NNP and binary data file input.
    forward             Do evaluation with NNP and test dataset.
    encode_param        Encode plain text to parameter format.
    decode_param        Decode parameter to plain text.
    profile             Profiling performance with NNP.
    conv_dataset        Convert CSV dataset to cache.
    compare_with_cpu    Compare performance between two nntxt.
    create_image_classification_dataset
                        Create dataset from image files.
    create_object_detection_dataset
                        Create dataset from image and label files.
    upload              Upload dataset to Neural Network Console.
    create_tar          Create tar file for Neural Network Console.
    dump                Dump network with supported format.
    nnb_template        Generate NNB config file template.
    convert             File format converter.
    function_info       Output function info.
    optimize            Optimize pb model.
    plot_series         Plot *.series.txt files.
    plot_timer          Plot *.timer.txt files.
    draw_graph          Draw a graph in a NNP or nntxt file with graphviz.
    version             Print version and build number.

optional arguments:
  -h, --help            show this help message and exit
  -m, --mpi             exec with mpi.

As shown above, the installation succeeded.

Enabling the GPU

Since CUDA 10.0 is installed, just type

pip install -U nnabla_ext_cuda100

and that is all.

The environment is now ready.
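nnabla computes on the CPU unless another context is selected, so training scripts usually pick the CUDA/cuDNN context explicitly with get_extension_context from nnabla.ext_utils. The following is a small sketch that does this and falls back to the CPU when the extension cannot be found. The module name nnabla_ext.cudnn is assumed to be what the CUDA package provides, and the check only looks for the module; it does not verify that the CUDA libraries themselves load.

```python
import importlib.util


def module_available(name):
    """True if `name` can be found for import; a missing parent package counts as absent."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises ModuleNotFoundError (an ImportError) when the parent of a
        # dotted name is missing, and ValueError for some unusual sys.modules entries
        return False


def pick_extension():
    """'cudnn' when nnabla's CUDA extension module is installed, otherwise 'cpu'."""
    return "cudnn" if module_available("nnabla_ext.cudnn") else "cpu"


def configure_nnabla():
    """Set nnabla's default context; return the extension name, or None without nnabla."""
    if not module_available("nnabla"):
        return None
    import nnabla as nn
    from nnabla.ext_utils import get_extension_context

    ext = pick_extension()
    nn.set_default_context(get_extension_context(ext))
    return ext


if __name__ == "__main__":
    print("nnabla context:", configure_nnabla())
```

Calling configure_nnabla() once at the top of the training script keeps the same script usable on a machine without a GPU.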

Preparing the training files

In "Modeling a guitar amp with machine learning", the data is prepared as follows:
(train_x1.wav, train_y1.wav)
(train_x2.wav, train_y2.wav)
(train_x3.wav, train_y3.wav)
(train_x4.wav, train_y4.wav)

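train_x is the dry sound before the effect is applied (the explanatory variable),
and train_y is the sound after the effect is applied (the target variable).
This time, I create these training pairs with the free DAW Cakewalk by BandLab.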
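The pairing rule above (train_xN.wav goes with train_yN.wav) can be checked with a short helper before training. This is a sketch only: find_training_pairs is a hypothetical name, it assumes all files sit in one directory, and it treats a dry file without its wet partner as an error.

```python
import re
from pathlib import Path

# Naming scheme from the article: train_xN.wav (dry) / train_yN.wav (wet)
DRY_RE = re.compile(r"^train_x(\d+)\.wav$")


def find_training_pairs(directory):
    """Return [(dry_path, wet_path), ...] sorted by N; raise if a wet file is missing."""
    directory = Path(directory)
    found = []
    for dry in directory.iterdir():
        match = DRY_RE.match(dry.name)
        if match is None:
            continue  # not a dry training file
        wet = directory / ("train_y%s.wav" % match.group(1))
        if not wet.is_file():
            raise FileNotFoundError("no wet file %s for %s" % (wet.name, dry.name))
        found.append((int(match.group(1)), dry, wet))
    found.sort(key=lambda item: item[0])  # numeric order: train_x10 comes after train_x2
    return [(dry, wet) for _, dry, wet in found]


if __name__ == "__main__":
    for dry, wet in find_training_pairs("."):
        print(dry.name, "->", wet.name)
```

Sorting by the parsed number rather than by file name keeps train_x10 from landing between train_x1 and train_x2.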

The effect applied this time is Voxengo Tube Amp.

To make the change easy to hear, I use the pronounced settings shown above.

In Cakewalk, use audio export to write the dry and wet sounds out as split mono files.

Save them with a sample rate of 48000 Hz and a bit depth of 32.

Saved this way, the files are written in 32-bit floating-point format, so use Audacity to convert them to 32-bit integer format and save them again.
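The float-to-integer conversion done in Audacity amounts to scaling each sample into the signed 32-bit range. As a rough sketch, assuming full scale 1.0 maps to 2^31 - 1 and out-of-range values are clipped (Audacity's exact rounding and dithering may differ), float samples already held in memory could be written as 48000 Hz, 32-bit integer mono WAV with the standard wave module. float_to_int32 and write_int32_wav are illustrative names, and reading the float WAV file itself is not covered here.

```python
import struct
import wave

INT32_MAX = 2 ** 31 - 1
INT32_MIN = -(2 ** 31)


def float_to_int32(x):
    """Map a float sample (nominally in [-1.0, 1.0]) to int32, clipping anything louder."""
    v = int(round(x * INT32_MAX))
    return max(INT32_MIN, min(INT32_MAX, v))


def write_int32_wav(path, samples, rate=48000):
    """Write `samples` as mono, 32-bit little-endian integer PCM at `rate` Hz."""
    ints = [float_to_int32(s) for s in samples]
    frames = struct.pack("<%di" % len(ints), *ints)
    with wave.open(str(path), "wb") as w:
        w.setnchannels(1)   # split mono: one channel per file
        w.setsampwidth(4)   # 4 bytes = 32-bit samples
        w.setframerate(rate)
        w.writeframes(frames)
```

Clipping matters because an amp-style effect can push the wet signal past full scale, and integer formats cannot represent values beyond it.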

The test data is created in the same way as above:
(val_x.wav, val_y.wav)
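Before training, it can help to confirm that every training and test file really is 48000 Hz, mono, 32-bit integer, and that the dry and wet files of each pair have the same length, since the pairs have to line up sample for sample. This sketch uses only the standard wave module; EXPECTED mirrors the settings described above, the equal-length check is an assumption about how the pairs will be consumed, and files that wave cannot open are reported instead of raising.

```python
import wave
from pathlib import Path

# Format described above: 48000 Hz, split mono, 32-bit integer PCM (4 bytes per sample)
EXPECTED = {"nchannels": 1, "sampwidth": 4, "framerate": 48000}


def read_header(path):
    """Return (header dict, frame count); raises wave.Error/EOFError on unreadable files."""
    with wave.open(str(path), "rb") as w:
        header = {
            "nchannels": w.getnchannels(),
            "sampwidth": w.getsampwidth(),
            "framerate": w.getframerate(),
        }
        return header, w.getnframes()


def check_pair(x_path, y_path):
    """Return a list of human-readable problems for one dry/wet pair (empty list = OK)."""
    problems = []
    frame_counts = []
    for path in (x_path, y_path):
        name = Path(path).name
        try:
            header, nframes = read_header(path)
        except (wave.Error, EOFError) as exc:
            problems.append("%s: cannot be read by the wave module (%s)" % (name, exc))
            continue
        frame_counts.append(nframes)
        for key, want in EXPECTED.items():
            if header[key] != want:
                problems.append("%s: %s is %s, expected %s" % (name, key, header[key], want))
    if len(frame_counts) == 2 and frame_counts[0] != frame_counts[1]:
        problems.append("frame counts differ: %d vs %d" % tuple(frame_counts))
    return problems


if __name__ == "__main__":
    for x, y in [("train_x1.wav", "train_y1.wav"), ("val_x.wav", "val_y.wav")]:
        if Path(x).exists() and Path(y).exists():
            print(x, y, check_pair(x, y) or "OK")
```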
