ESPnet-TTS
ESPnet is an end-to-end speech processing toolkit, focused mainly on end-to-end speech recognition and end-to-end text-to-speech (TTS).
Testing on Colab
https://colab.research.google.com/github/espnet/notebook/blob/master/tts_realtime_demo.ipynb#scrollTo=C1a5CgX1AHXJ
You can run the demo from here.
Two demos are provided: an English demo and a Japanese demo.
English demo
Download pretrained models
You can select one from three models. Please only run the selected model cells.
install
[Tacotron2] [Transformer] [FastSpeech]
Setup
Synthesis
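You just run these steps in order.
You can choose from three models: [Tacotron2], [Transformer], and [FastSpeech].
In the Synthesis step, enter any text and it is converted to speech.
As a test, I generated speech with Tacotron2 from the following text: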
This is a computer
https://soundcloud.com/jg1-wwk/e2e-tts-en-lang-test-this-is-a-computer
You can listen to it here.
Japanese demo
Install Japanese dependencies
(a) Tacotron 2
(b) Transformer (this one can be selected as well)
Setup
Synthesis
Again, I just ran the steps in order.
The test audio is here:
https://soundcloud.com/jg1-wwk/e2e-tts-demo-jp-lang-test
"計算機最上川" KEISANKIMOGAMIGAWA
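Running on actual hardware (on-premises)
Colab runs on cloud computing resources, so it comes with some limitations.
Here, I try running inference on an on-premises machine.
Python environment: Miniconda + Python 3.6
CUDA version: 10.2
I added the following paths to ~/.bashrc (or a similar startup file):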
export CUDAROOT=/usr/local/cuda
export PATH=$CUDAROOT/bin:$PATH
export LD_LIBRARY_PATH=$CUDAROOT/lib64:$LD_LIBRARY_PATH
export CFLAGS="-I$CUDAROOT/include $CFLAGS"
export CUDA_HOME=$CUDAROOT
export CUDA_PATH=$CUDAROOT
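After reloading the shell (for example with `source ~/.bashrc`), it may be worth confirming that the `nvcc` under `$CUDAROOT` really is the CUDA 10.2 toolkit. The sketch below assumes `nvcc --version` prints a line such as `Cuda compilation tools, release 10.2, V10.2.89`; `cuda_release` and `check_cuda_release` are hypothetical helper names, not part of ESPnet or CUDA.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of ESPnet or CUDA) for checking the CUDA toolkit version.

# Read `nvcc --version` output on stdin and print only the release number, e.g. "10.2".
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# Return 0 if the nvcc under $CUDAROOT (default /usr/local/cuda) reports the expected release.
check_cuda_release() {
  local expected="$1"
  local nvcc="${CUDAROOT:-/usr/local/cuda}/bin/nvcc"
  if [ ! -x "$nvcc" ]; then
    echo "nvcc not found at $nvcc" >&2
    return 1
  fi
  local actual
  actual="$("$nvcc" --version | cuda_release)"
  if [ "$actual" != "$expected" ]; then
    echo "expected CUDA $expected, found ${actual:-unknown}" >&2
    return 1
  fi
}
```

Usage: `check_cuda_release 10.2 && echo OK`. The toolkit's `nvcc` is checked rather than `nvidia-smi` because `nvidia-smi` reports the highest CUDA version the driver supports, which can differ from the installed toolkit.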
Next, git clone and setting up the text for inference:
sudo apt-get install libsndfile1-dev
sudo apt-get install libprotobuf9v5 protobuf-compiler libprotobuf-dev
conda create -n espnet python=3.7
conda activate espnet
git clone git@github.com:espnet/espnet.git
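Before building the tools, it may help to confirm that the environment created above is the one actually active. Note that Method 1 below passes a Python path under `envs/esp`, while the environment created here is named `espnet`; if the two differ, packages could be installed into a different environment than the one activated. A minimal sketch, relying on the `CONDA_DEFAULT_ENV` and `CONDA_PREFIX` variables that `conda activate` sets; `check_conda_env` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical check: confirm the expected conda environment is active
# and report its prefix (the directory its python lives under).

check_conda_env() {
  local expected="$1"
  local active="${CONDA_DEFAULT_ENV:-}"
  if [ "$active" != "$expected" ]; then
    echo "active conda env is '${active:-none}', expected '$expected'" >&2
    return 1
  fi
  echo "conda env '$expected' active (prefix: ${CONDA_PREFIX:-unknown})"
}
```

Usage: `check_conda_env espnet` right after `conda activate espnet`.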
Installing Kaldi
Method 1
cd tools
make KALDI=/home/rocm/miniconda3/envs/esp/bin/python
Method 2 (CPU only)
cd tools
make CUPY_VERSION='' -j 10
Installation check
make check_install
I can't get rid of the errors, so the environment setup does not succeed:
ERROR: tensorboardx 1.9 has requirement protobuf>=3.8.0, but you'll have protobuf 3.0.0 which is incompatible.
Installing collected packages: PyYAML, filelock, typing, typing-extensions, chainer, configargparse, editdistance, funcsigs, more-itertools, zipp, importlib-metadata, inflect, nltk, distance, g2p-en, h5py, jaconv, kaldiio, scipy, librosa, matplotlib, pandas, attrs, pyrsistent, jsonschema, stempeg, pyaml, musdb, museval, bottleneck, nara-wpe, pysptk, sklearn, fastdtw, nnmnkwii, pillow, pystoi, pytorch-wpe, sentencepiece, tensorboardX, torch-complex, unidecode, espnet
Found existing installation: PyYAML 3.12
ERROR: Cannot uninstall 'PyYAML'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
Makefile:70: recipe for target 'espnet.done' failed
make: *** [espnet.done] Error 1
It looks like both a PyYAML reinstallation error and a tensorboardX version error are occurring.
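The tensorboardX message says protobuf >= 3.8.0 is required while 3.0.0 is installed. A minimal sketch for checking that requirement against a given Python interpreter, assuming GNU `sort -V` for version ordering; `version_ge`, `protobuf_version`, and `check_protobuf` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking a version requirement such as protobuf>=3.8.0.

# Return 0 if version $1 >= version $2 (GNU sort's version ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the protobuf version seen by the given Python interpreter.
protobuf_version() {
  local py="${1:-python}"
  "$py" -c 'import google.protobuf; print(google.protobuf.__version__)'
}

# Report whether the requirement from the tensorboardX error is met.
check_protobuf() {
  local required="3.8.0" installed
  installed="$(protobuf_version "${1:-python}")" || return 1
  if version_ge "$installed" "$required"; then
    echo "protobuf $installed OK (>= $required)"
  else
    echo "protobuf $installed is older than $required" >&2
    return 1
  fi
}
```

For example, run `check_protobuf python` inside the activated environment. If it fails, upgrading with `pip install "protobuf>=3.8.0"` is one option. For the PyYAML error, a commonly used workaround is `pip install --ignore-installed PyYAML`, which installs a fresh copy instead of trying to uninstall the distutils-installed one; neither command has been verified on this setup.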
Inference test
(not yet completed)
cd ..
cd ./egs/ljspeech/tts1/
echo "THIS IS A DEMONSTRATION OF TEXT TO SPEECH." > example.txt
../../../utils/synth_wav.sh --vocoder_models ljspeech.wavenet.mol.v1 example.txt
Since make had more or less completed, I ran a test, with the following result:
$ ../../../utils/synth_wav.sh --models ljspeech.fastspeech.v1 example.txt
--2019-12-02 18:42:25-- https://drive.google.com/uc?export=download&id=17RUNFLP4SSTbGA01xWRJo7RkR876xM0i
Resolving drive.google.com (drive.google.com)... 2404:6800:4004:800::200e, 216.58.197.142
Connecting to drive.google.com (drive.google.com)|2404:6800:4004:800::200e|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'decode/download/ljspeech.fastspeech.v1/kDGmpO..tar.gz'
decode/download/ljspeech.fastspeech.v1/ [ <=> ] 3.21K --.-KB/s in 0s
2019-12-02 18:42:26 (32.3 MB/s) - 'decode/download/ljspeech.fastspeech.v1/kDGmpO..tar.gz' saved [3292]
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 3292 0 3292 0 0 9119 0 --:--:-- --:--:-- --:--:-- 9093
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 388 0 388 0 0 1158 0 --:--:-- --:--:-- --:--:-- 1158
100 92.1M 0 92.1M 0 0 35.5M 0 --:--:-- 0:00:02 --:--:-- 55.0M
conf/tuning/train_fastspeech.v1.yaml
conf/decode.yaml
data/train_no_dev/cmvn.ark
exp/train_no_dev_pytorch_train_fastspeech.v1/results/model.last1.avg.best
exp/train_no_dev_pytorch_train_fastspeech.v1/results/model.json
data/lang_1char/train_no_dev_units.txt
Sucessfully downloaded zip file from https://drive.google.com/open?id=17RUNFLP4SSTbGA01xWRJo7RkR876xM0i
stage 0: Data preparation
/home/rocm/espnet/egs/ljspeech/tts1/../../../utils/data2json.sh --trans_type char decode/example/data decode/download/ljspeech.fastspeech.v1/data/lang_1char/train_no_dev_units.txt
Traceback (most recent call last):
File "/home/rocm/espnet/egs/ljspeech/tts1/../../../utils/merge_scp2json.py", line 15, in <module>
from espnet.utils.cli_utils import get_commandline_args
ModuleNotFoundError: No module named 'espnet'
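In the log above, the first request to Google Drive returned a 3,292-byte `text/html` response instead of an archive, which is why `gzip` reports `not in gzip format`. The log then shows a second transfer of 92.1M followed by the list of extracted model files, so this error apparently did not stop the download. When fetching models by hand, a check like the following can catch an HTML page saved under a `.tar.gz` name before it reaches `tar`. A minimal sketch; `is_gzip` and `safe_extract` are hypothetical helpers that rely only on gzip files starting with the bytes `1f 8b`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: verify a download is really gzip data before extracting it.

# Return 0 if file $1 starts with the gzip magic bytes 1f 8b.
is_gzip() {
  local f="$1"
  [ -f "$f" ] || return 1
  [ "$(head -c 2 "$f" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]
}

# Extract a .tar.gz into $2 (default: current dir) only if it is real gzip;
# otherwise print the first bytes (often an HTML page) and fail.
safe_extract() {
  local archive="$1" dest="${2:-.}"
  if is_gzip "$archive"; then
    tar -xzf "$archive" -C "$dest"
  else
    echo "$archive is not gzip; first bytes:" >&2
    head -c 200 "$archive" >&2
    echo >&2
    return 1
  fi
}
```

Usage: `safe_extract downloaded.tar.gz some_dir`; on failure, the printed bytes usually show whether the server returned a web page.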
So far, it has not run to completion.
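The failure that actually stops the run is `ModuleNotFoundError: No module named 'espnet'`: the Python that executes `utils/merge_scp2json.py` cannot import the `espnet` package. That fits the earlier `make` output, where pip aborted while installing the collected packages (including `espnet`) because of the PyYAML error, and the `espnet.done` target failed. A minimal diagnostic sketch; `check_import` is a hypothetical helper that prints which interpreter is found and tries to import a module with it.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: show which Python is used and whether it can
# import a given module (default: espnet).
# Exit codes: 0 = import OK, 1 = import failed, 2 = interpreter not found.

check_import() {
  local module="${1:-espnet}" py="${2:-python3}"
  if ! command -v "$py" >/dev/null 2>&1; then
    echo "interpreter not found: $py" >&2
    return 2
  fi
  echo "python: $(command -v "$py")"
  # Pass the module name as an argument instead of splicing it into the code.
  if "$py" -c 'import importlib, sys; importlib.import_module(sys.argv[1])' "$module"; then
    echo "import ${module}: OK"
  else
    echo "import ${module}: FAILED" >&2
    return 1
  fi
}
```

For example, run `check_import espnet python` in the same shell where `synth_wav.sh` was run. If the import fails, installing the cloned repository into the active environment with `pip install -e .` from the espnet repository root is a common remedy, although it has not been tested here.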
todo
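I have tried quite a few things but still can't get TTS to run end to end, so I may need to change my approach.
Get TTS working in the on-premises environment (it looks like I'll need to try something like the Dockerfile).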