espnet: notes on running speech processing (TTS) (work in progress)

Memo

Posted at 2019-12-02

ESPNET-TTS

ESPnet is an end-to-end speech processing toolkit that focuses mainly on end-to-end speech recognition and end-to-end text-to-speech.

Testing on Colab

https://colab.research.google.com/github/espnet/notebook/blob/master/tts_realtime_demo.ipynb#scrollTo=C1a5CgX1AHXJ
The demo can be run from this notebook.

Two demos are provided: an English demo and a Japanese demo.

English demo

Download pretrained models
You can select one from three models. Please only run the selected model cells.

install
[Tacotron2] [Transformer] [FastSpeech]
Setup
Synthesis
Just run these steps in order.
You can choose from three models: [Tacotron2], [Transformer], and [FastSpeech].
Enter any text in the Synthesis step and it is converted to speech.

As a test, I generated audio with Tacotron2.

This is a computer
https://soundcloud.com/jg1-wwk/e2e-tts-en-lang-test-this-is-a-computer

You can listen to the result at the link above.

Japanese demo

Install Japanese dependencies
(a) Tacotron 2
(b) Transformer (this one can also be selected)
Setup
Synthesis

Here too, I only had to run the steps in order.

The test audio is here.

https://soundcloud.com/jg1-wwk/e2e-tts-demo-jp-lang-test
"計算機最上川" KEISANKIMOGAMIGAWA

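Testing on a real on-premises machine

Colab is cloud computing, so it comes with some restrictions.
Here I try running inference on an on-premises machine.

The Python environment is Miniconda + Python 3.6
CUDA version 10.2

I added the following paths to ~/.bashrc.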

export CUDAROOT=/usr/local/cuda
export PATH=$CUDAROOT/bin:$PATH
export LD_LIBRARY_PATH=$CUDAROOT/lib64:$LD_LIBRARY_PATH
export CFLAGS="-I$CUDAROOT/include $CFLAGS"
export CUDA_HOME=$CUDAROOT
export CUDA_PATH=$CUDAROOT
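The exports above are meant to agree with each other: CUDA_HOME and CUDA_PATH mirror CUDAROOT, and PATH and LD_LIBRARY_PATH contain its bin and lib64 directories. A minimal sketch of a consistency check follows, assuming a Linux shell where ":" separates path entries. The function name check_cuda_env is hypothetical and not part of ESPnet; it only inspects the variables and does not verify that CUDA itself is installed.

```python
import os


def check_cuda_env(env):
    """Return a list of human-readable problems; an empty list means OK."""
    root = env.get("CUDAROOT", "")
    if not root:
        return ["CUDAROOT is not set"]
    problems = []
    # CUDA_HOME and CUDA_PATH are expected to mirror CUDAROOT.
    for name in ("CUDA_HOME", "CUDA_PATH"):
        if env.get(name) != root:
            problems.append(f"{name} does not match CUDAROOT")
    # PATH and LD_LIBRARY_PATH should contain the CUDA subdirectories.
    # ":" is the path separator on Linux (assumption: bash on Linux).
    expected = {"PATH": root + "/bin", "LD_LIBRARY_PATH": root + "/lib64"}
    for name, entry in expected.items():
        if entry not in env.get(name, "").split(":"):
            problems.append(f"{name} does not contain {entry}")
    return problems


if __name__ == "__main__":
    # Check the environment of the current shell session.
    for line in check_cuda_env(os.environ) or ["CUDA variables look consistent"]:
        print(line)
```

Running this in a new terminal after editing ~/.bashrc shows whether the exports were actually picked up by the session.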

Install the system packages, create the conda environment, and git clone the repository.

sudo apt-get install libsndfile1-dev
sudo apt-get install libprotobuf9v5 protobuf-compiler libprotobuf-dev
conda create -n espnet python=3.7
conda activate espnet
git clone git@github.com:espnet/espnet.git

Installing Kaldi

Method 1

cd tools
make KALDI=/home/rocm/miniconda3/envs/esp/bin/python

Method 2: CPU only

cd tools
make CUPY_VERSION='' -j 10

Install check

make check_install

The errors would not go away, so the environment setup did not succeed.

ERROR: tensorboardx 1.9 has requirement protobuf>=3.8.0, but you'll have protobuf 3.0.0 which is incompatible.
Installing collected packages: PyYAML, filelock, typing, typing-extensions, chainer, configargparse, editdistance, funcsigs, more-itertools, zipp, importlib-metadata, inflect, nltk, distance, g2p-en, h5py, jaconv, kaldiio, scipy, librosa, matplotlib, pandas, attrs, pyrsistent, jsonschema, stempeg, pyaml, musdb, museval, bottleneck, nara-wpe, pysptk, sklearn, fastdtw, nnmnkwii, pillow, pystoi, pytorch-wpe, sentencepiece, tensorboardX, torch-complex, unidecode, espnet
  Found existing installation: PyYAML 3.12
ERROR: Cannot uninstall 'PyYAML'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
Makefile:70: recipe for target 'espnet.done' failed
make: *** [espnet.done] Error 1

Two errors seem to occur together: PyYAML cannot be reinstalled, and tensorboardX reports a version conflict.
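The first error is a plain minimum-version check: tensorboardx 1.9 asks for protobuf>=3.8.0, while protobuf 3.0.0 is what would be installed. The comparison can be sketched as below. This is only an illustration of the rule, not how pip resolves versions internally; the helper names are hypothetical and only plain dotted integer versions are handled.

```python
def parse_version(text):
    """Turn '3.8.0' into (3, 8, 0); only plain dotted integers are handled."""
    return tuple(int(part) for part in text.split("."))


def satisfies_minimum(installed, minimum):
    """True if the installed version is at least the required minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '3.8' compares equal to '3.8.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Values taken from the pip error message above.
print(satisfies_minimum("3.0.0", "3.8.0"))  # False
```

The second error is of a different kind: pip refuses to uninstall a distutils-installed PyYAML 3.12. A workaround often suggested for that message is pip install --ignore-installed PyYAML, although it was not tried in this setup.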

Inference test

(not yet run to completion)

cd ..
cd ./egs/ljspeech/tts1/
echo "THIS IS A DEMONSTRATION OF TEXT TO SPEECH." > example.txt
../../../utils/synth_wav.sh --vocoder_models ljspeech.wavenet.mol.v1 example.txt

make did finish after a fashion, so I tried a test run, with the following result:

$ ../../../utils/synth_wav.sh --models ljspeech.fastspeech.v1 example.txt
--2019-12-02 18:42:25--  https://drive.google.com/uc?export=download&id=17RUNFLP4SSTbGA01xWRJo7RkR876xM0i
Resolving drive.google.com (drive.google.com)... 2404:6800:4004:800::200e, 216.58.197.142
Connecting to drive.google.com (drive.google.com)|2404:6800:4004:800::200e|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'decode/download/ljspeech.fastspeech.v1/kDGmpO..tar.gz'

decode/download/ljspeech.fastspeech.v1/     [ <=>                                                                          ]   3.21K  --.-KB/s    in 0s

2019-12-02 18:42:26 (32.3 MB/s) - 'decode/download/ljspeech.fastspeech.v1/kDGmpO..tar.gz' saved [3292]


gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3292    0  3292    0     0   9119      0 --:--:-- --:--:-- --:--:--  9093
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   388    0   388    0     0   1158      0 --:--:-- --:--:-- --:--:--  1158
100 92.1M    0 92.1M    0     0  35.5M      0 --:--:--  0:00:02 --:--:-- 55.0M
conf/tuning/train_fastspeech.v1.yaml
conf/decode.yaml
data/train_no_dev/cmvn.ark
exp/train_no_dev_pytorch_train_fastspeech.v1/results/model.last1.avg.best
exp/train_no_dev_pytorch_train_fastspeech.v1/results/model.json
data/lang_1char/train_no_dev_units.txt
Sucessfully downloaded zip file from https://drive.google.com/open?id=17RUNFLP4SSTbGA01xWRJo7RkR876xM0i
stage 0: Data preparation
/home/rocm/espnet/egs/ljspeech/tts1/../../../utils/data2json.sh --trans_type char decode/example/data decode/download/ljspeech.fastspeech.v1/data/lang_1char/train_no_dev_units.txt
Traceback (most recent call last):
  File "/home/rocm/espnet/egs/ljspeech/tts1/../../../utils/merge_scp2json.py", line 15, in <module>

    from espnet.utils.cli_utils import get_commandline_args
ModuleNotFoundError: No module named 'espnet'
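The log above appears to contain two separate failures. The first download saved a 3292-byte text/html response under a .tar.gz name, which would explain "gzip: stdin: not in gzip format" (the second attempt did fetch the 92.1M archive and unpack it). The final traceback says that the espnet package cannot be imported by the Python that ran the script. Both conditions can be checked with the standard library alone. The sketch below uses hypothetical helper names and is not part of ESPnet.

```python
import importlib.util

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_like_gzip(path):
    """True if the file starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def module_available(name):
    """True if `import name` would find a top-level module or package."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "espnet" is the package named in the traceback above.
    print("espnet importable:", module_available("espnet"))
```

If module_available("espnet") returns False inside the activated conda environment, the package was probably installed into a different Python than the one on PATH, or the install step never completed.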

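At present it still does not run to completion.

todo

I have tried quite a lot of things, but TTS still does not run to completion, so I may have to change the approach.
Get TTS working in the on-premises environment. (I will probably need to try something like the Dockerfile.)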
