
Training Deep Voice 3 on AWS

Posted at 2019-03-31 (last updated at 2019-03-31)

These are my notes from running speech synthesis on AWS, written from a beginner's point of view.

About speech synthesis

Deep Voice 3
Wei Ping, Kainan Peng, Andrew Gibiansky, et al., “Deep Voice 3: 2000-Speaker Neural Text-to-Speech”, arXiv:1710.07654, Oct. 2017.

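Deep Voice 3 is a speech synthesis (text-to-speech) system. Its main selling points are how faithfully it reproduces a voice and how quickly it trains.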
https://www.atmarkit.co.jp/ait/articles/1712/22/news020.html

I used r9y9's implementation here.
I did not use these features this time, but it reportedly supports speaker adaptation and Japanese as well.
https://github.com/r9y9/deepvoice3_pytorch

The dataset I used is Keith Ito's LJSpeech-1.1.
https://keithito.com/LJ-Speech-Dataset/

Environment

AWS (Deep Learning AMI)
I chose Ubuntu as the OS.
The volume should be sized to fit the dataset; this time I increased it by 35 GiB to 110 GiB.
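
To see whether the volume still has room before pulling the dataset onto it, a small check like the following can help. This is only a sketch: the helper names `free_gib` and `has_room` are my own, and the 10 GiB threshold is an illustrative number, not a measured requirement for LJSpeech-1.1.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB


def free_gib(path="."):
    """Return the free space on the filesystem containing `path`, in GiB."""
    return shutil.disk_usage(path).free / GIB


def has_room(path=".", required_gib=10.0):
    """Return True if at least `required_gib` GiB are free at `path`."""
    return free_gib(path) >= required_gib


if __name__ == "__main__":
    # 10 GiB is an assumed, illustrative threshold.
    print(f"free: {free_gib('.'):.1f} GiB, enough: {has_room('.', 10.0)}")
```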

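I followed the tutorial below for everything up to connecting to the instance.
Depending on the instance type, you may need to request a limit increase first.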
https://aws.amazon.com/jp/getting-started/tutorials/get-started-dlami/

Steps after connecting to the instance

Updating packages

$ sudo apt-get update 
$ sudo apt-get upgrade 
$ pip install --upgrade pip

If you want to change the CUDA version, see here.

Preparing for training

$ git clone https://github.com/r9y9/deepvoice3_pytorch && cd deepvoice3_pytorch 
$ pip install -e ".[bin]" 
$ python 
>>> import nltk
>>> nltk.download('cmudict')

Checking that PyTorch can see the GPU

$ python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0))"
True
Tesla K80
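
The one-liner above raises an error if no GPU is present, because `get_device_name(0)` has nothing to report. A slightly more forgiving version might look like this. It is a sketch: `describe_gpus` is a name I made up, and it uses only the `torch.cuda` calls shown above plus `torch.cuda.device_count()`.

```python
def describe_gpus():
    """Return the names of the CUDA devices visible to PyTorch (empty if none)."""
    try:
        import torch
    except ImportError:
        return []  # PyTorch is not installed in this environment
    if not torch.cuda.is_available():
        return []
    return [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]


if __name__ == "__main__":
    names = describe_gpus()
    if names:
        for index, name in enumerate(names):
            print(f"cuda:{index} {name}")
    else:
        print("No CUDA device is visible to PyTorch.")
```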

Getting the dataset (via S3)

Beforehand, create a user in the AWS Management Console with a policy that grants the required permissions (ListBucket, GetObject).
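
For reference, a policy granting just those two permissions could be generated as follows. This is a sketch, not the exact policy I attached: `read_only_policy` is a hypothetical helper, and `examplebucket` is the placeholder bucket name used in this article. Note that `s3:ListBucket` applies to the bucket ARN, while `s3:GetObject` applies to the objects under it.

```python
import json


def read_only_policy(bucket):
    """Build an IAM policy document that allows listing and downloading from one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket itself.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Downloading is granted on the objects inside the bucket.
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(read_only_policy("examplebucket"), indent=2))
```

The printed JSON can be pasted into the policy editor in the console.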

$ aws configure

This assumes the dataset has already been uploaded to S3.

$ aws s3 cp s3://examplebucket/LJSpeech-1.1  ./LJSpeech-1.1  --recursive 
	  
$ python preprocess.py --preset=presets/deepvoice3_ljspeech.json ljspeech ./LJSpeech-1.1 ./data/ljspeech
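
If preprocessing complains about missing files, the copy from S3 may have been incomplete. LJSpeech-1.1 ships a `metadata.csv` (fields separated by `|`, clip ID first) and a `wavs/` directory, so the two can be cross-checked. The sketch below assumes that layout; `missing_wavs` is a helper name of my own.

```python
import csv
from pathlib import Path


def missing_wavs(dataset_dir):
    """Return the clip IDs listed in metadata.csv that have no file under wavs/."""
    root = Path(dataset_dir)
    missing = []
    with open(root / "metadata.csv", encoding="utf-8", newline="") as f:
        # Fields are separated by "|" and quote characters are literal text,
        # so quoting is disabled.
        for row in csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE):
            if row and not (root / "wavs" / f"{row[0]}.wav").is_file():
                missing.append(row[0])
    return missing


if __name__ == "__main__":
    dataset = Path("./LJSpeech-1.1")
    if dataset.is_dir():
        print(f"{len(missing_wavs(dataset))} clips are missing")
```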

Training

Start training in the background so that it keeps running after the SSH connection is closed.

$ mkdir ~/output_log  
$ nohup python train.py --preset=presets/deepvoice3_ljspeech.json --data-root=./data/ljspeech  > ~/output_log/out.log &
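
After reconnecting, I wanted a quick way to look at the end of the log. `tail -n` does the job from the shell; the Python sketch below does the same thing and is easy to extend. `tail` here is my own helper, and the log path is the one used in the command above. Output may show up with some delay, since Python buffers stdout when it is redirected to a file.

```python
from collections import deque
from pathlib import Path


def tail(path, n=10):
    """Return the last `n` lines of a text file without holding the whole file in memory."""
    with open(path, encoding="utf-8", errors="replace") as f:
        # A deque with maxlen keeps only the most recent `n` lines while iterating.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]


if __name__ == "__main__":
    log = Path.home() / "output_log" / "out.log"
    if log.is_file():
        print("\n".join(tail(log, 5)))
```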

Checking that the GPU is being used as expected

$ watch -n 2 nvidia-smi
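
To record utilization instead of watching it, `nvidia-smi` can also emit CSV through its `--query-gpu` and `--format` options. The sketch below parses that output. The function names and dictionary keys are my own, and the parser assumes every field is a plain integer, which may not hold on GPUs that report a field as unsupported.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(text):
    """Parse nvidia-smi CSV output (one line per GPU) into a list of dicts."""
    stats = []
    for line in text.strip().splitlines():
        # int() tolerates the space that follows each comma.
        util, used, total = (int(field) for field in line.split(","))
        stats.append(
            {"util_percent": util, "mem_used_mib": used, "mem_total_mib": total}
        )
    return stats


def read_gpu_stats():
    """Run nvidia-smi once and return the parsed stats (needs an NVIDIA driver)."""
    out = subprocess.run(QUERY, check=True, capture_output=True, text=True).stdout
    return parse_gpu_stats(out)
```

On the instance, calling `read_gpu_stats()` returns one dictionary per GPU. If utilization stays at 0 while training is running, the job is probably not using the GPU.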
