Environment
- Ubuntu 18.04 (Docker 19.03.12)
- Python 3.7
- CUDA 10.0
The following error occurred at Stage 4 (Network Training) of ESPnet:
dictionary: data/lang_1char/train_nodup_sp_units.txt
stage 4: Network Training
run.pl: job failed, log is in exp/train_nodup_sp_pytorch_train_rnn/train.log
Contents of exp/train_nodup_sp_pytorch_train_rnn/train.log:
# asr_train.py --config conf/tuning/train_rnn.yaml --ngpu 1 --backend pytorch --outdir exp/train_nodup_sp_pytorch_train_rnn/results --tensorboard-dir tensorboard/train_nodup_sp_pytorch_train_rnn --debugmode 1 --dict data/lang_1char/train_nodup_sp_units.txt --debugdir exp/train_nodup_sp_pytorch_train_rnn --minibatches 0 --seed 1 --verbose 0 --resume --train-json dump/train_nodup_sp/deltafalse/data.json --valid-json dump/train_dev/deltafalse/data.json
# Started at Mon Jul 20 06:07:11 UTC 2020
#
/espnet/tools/venv/lib/python3.7/site-packages/librosa/util/decorators.py:9: NumbaDeprecationWarning: An import was requested from a module that has moved location.
Import requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
from numba.decorators import jit as optional_jit
/espnet/tools/venv/lib/python3.7/site-packages/librosa/util/decorators.py:9: NumbaDeprecationWarning: An import was requested from a module that has moved location.
Import of 'jit' requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
from numba.decorators import jit as optional_jit
2020-07-20 06:07:12,362 (asr_train:555) WARNING: Skip DEBUG/INFO messages
2020-07-20 06:07:12,614 (nets_utils:417) WARNING: Subsampling is not performed for vgg*. It is performed in max pooling layers at CNN.
Exception in main training loop: module 'warpctc_pytorch' has no attribute 'gpu_ctc'
Traceback (most recent call last):
File "/espnet/tools/venv/lib/python3.7/site-packages/chainer/training/trainer.py", line 316, in run
update()
File "/espnet/espnet/asr/pytorch_backend/asr.py", line 242, in update
self.update_core()
File "/espnet/espnet/asr/pytorch_backend/asr.py", line 206, in update_core
data_parallel(self.model, x, range(self.ngpu)).mean() / self.accum_grad
File "/espnet/tools/venv/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 185, in data_parallel
return module(*inputs[0], **module_kwargs[0])
File "/espnet/tools/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/espnet/espnet/nets/pytorch_backend/e2e_asr.py", line 363, in forward
self.loss_ctc = self.ctc(hs_pad, hlens, ys_pad)
File "/espnet/tools/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/espnet/espnet/nets/pytorch_backend/ctc.py", line 112, in forward
self.loss = to_device(self, self.loss_fn(ys_hat, ys_true, hlens, olens)).to(
File "/espnet/espnet/nets/pytorch_backend/ctc.py", line 62, in loss_fn
return self.ctc_loss(th_pred, th_target, th_ilen, th_olen)
File "/espnet/tools/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/espnet/tools/venv/lib/python3.7/site-packages/warpctc_pytorch/__init__.py", line 93, in forward
self.length_average, self.blank, self.reduce)
File "/espnet/tools/venv/lib/python3.7/site-packages/warpctc_pytorch/__init__.py", line 23, in forward
loss_func = warp_ctc.gpu_ctc if is_cuda else warp_ctc.cpu_ctc
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
File "/espnet/egs/csj/asr1/../../../espnet/bin/asr_train.py", line 628, in <module>
main(sys.argv[1:])
File "/espnet/egs/csj/asr1/../../../espnet/bin/asr_train.py", line 614, in main
train(args)
File "/espnet/espnet/asr/pytorch_backend/asr.py", line 830, in train
trainer.run()
File "/espnet/tools/venv/lib/python3.7/site-packages/chainer/training/trainer.py", line 349, in run
six.reraise(*exc_info)
File "/espnet/tools/venv/lib/python3.7/site-packages/six.py", line 703, in reraise
raise value
File "/espnet/tools/venv/lib/python3.7/site-packages/chainer/training/trainer.py", line 316, in run
update()
File "/espnet/espnet/asr/pytorch_backend/asr.py", line 242, in update
self.update_core()
File "/espnet/espnet/asr/pytorch_backend/asr.py", line 206, in update_core
data_parallel(self.model, x, range(self.ngpu)).mean() / self.accum_grad
File "/espnet/tools/venv/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 185, in data_parallel
return module(*inputs[0], **module_kwargs[0])
File "/espnet/tools/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/espnet/espnet/nets/pytorch_backend/e2e_asr.py", line 363, in forward
self.loss_ctc = self.ctc(hs_pad, hlens, ys_pad)
File "/espnet/tools/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/espnet/espnet/nets/pytorch_backend/ctc.py", line 112, in forward
self.loss = to_device(self, self.loss_fn(ys_hat, ys_true, hlens, olens)).to(
File "/espnet/espnet/nets/pytorch_backend/ctc.py", line 62, in loss_fn
return self.ctc_loss(th_pred, th_target, th_ilen, th_olen)
File "/espnet/tools/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/espnet/tools/venv/lib/python3.7/site-packages/warpctc_pytorch/__init__.py", line 93, in forward
self.length_average, self.blank, self.reduce)
File "/espnet/tools/venv/lib/python3.7/site-packages/warpctc_pytorch/__init__.py", line 23, in forward
loss_func = warp_ctc.gpu_ctc if is_cuda else warp_ctc.cpu_ctc
AttributeError: module 'warpctc_pytorch' has no attribute 'gpu_ctc'
# Accounting: time=27 threads=1
# Ended (code 1) at Mon Jul 20 06:07:38 UTC 2020, elapsed time 27 seconds
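In short, the error comes from warpctc_pytorch. The installed module has no gpu_ctc attribute, so the CTC loss cannot be computed on the GPU.
Solution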
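One quick way to confirm this diagnosis is to check which entry points the installed warpctc_pytorch actually exposes. Below is a minimal sketch that imports a module by name and reports whether cpu_ctc and gpu_ctc are present. These are the two names chosen between in warpctc_pytorch/__init__.py in the traceback. Run it with the venv's Python.

Some parts are assumptions. check_attrs is a hypothetical helper, not part of ESPnet or warp-ctc. The idea that a build compiled without GPU support lacks gpu_ctc is also an assumption, not something the log proves.

```python
import importlib
from typing import Dict, Iterable


def check_attrs(module_name: str,
                attrs: Iterable[str] = ("cpu_ctc", "gpu_ctc")) -> Dict[str, bool]:
    """Hypothetical helper: report which of `attrs` the module exposes.

    Returns an empty dict if the module cannot be imported at all.
    """
    try:
        module = importlib.import_module(module_name)
    except (ImportError, OSError):
        # ImportError covers a missing package or a broken compiled extension;
        # OSError is caught in case a shared library fails to load.
        return {}
    return {name: hasattr(module, name) for name in attrs}


if __name__ == "__main__":
    result = check_attrs("warpctc_pytorch")
    if not result:
        print("warpctc_pytorch could not be imported")
    else:
        for name, present in result.items():
            print(f"{name}: {'found' if present else 'MISSING'}")
        # Assumption: a build without GPU support would show gpu_ctc as MISSING.
```

Running the same check again after reinstalling shows whether gpu_ctc is now available.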
1. ESPnet runs inside a venv, so first activate that venv.
source /espnet/tools/venv/bin/activate
2. Reinstall warp-ctc
git clone https://github.com/espnet/warp-ctc -b pytorch-1.1
cd warp-ctc && mkdir build && cd build && cmake .. && make -j
cd ../pytorch_binding && python setup.py install
(or python3 setup.py install, depending on your environment)
- If the following error appears, carry out 3-A and then rerun
python setup.py install
running install
running bdist_egg
running egg_info
creating warpctc_pytorch10_cuda100.egg-info
writing warpctc_pytorch10_cuda100.egg-info/PKG-INFO
writing dependency_links to warpctc_pytorch10_cuda100.egg-info/dependency_links.txt
writing top-level names to warpctc_pytorch10_cuda100.egg-info/top_level.txt
writing manifest file 'warpctc_pytorch10_cuda100.egg-info/SOURCES.txt'
reading manifest file 'warpctc_pytorch10_cuda100.egg-info/SOURCES.txt'
writing manifest file 'warpctc_pytorch10_cuda100.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/warpctc_pytorch
copying warpctc_pytorch/__init__.py -> build/lib.linux-x86_64-3.7/warpctc_pytorch
creating build/lib.linux-x86_64-3.7/warpctc_pytorch/lib
copying warpctc_pytorch/lib/libwarpctc.so -> build/lib.linux-x86_64-3.7/warpctc_pytorch/lib
running build_ext
building 'warpctc_pytorch._warp_ctc' extension
creating build/temp.linux-x86_64-3.7
creating build/temp.linux-x86_64-3.7/src
gcc -pthread -B /espnet/tools/venv/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/warp-ctc/include -I/espnet/tools/venv/lib/python3.7/site-packages/torch/lib/include -I/espnet/tools/venv/lib/python3.7/site-packages/torch/lib/include/torch/csrc/api/include -I/espnet/tools/venv/lib/python3.7/site-packages/torch/lib/include/TH -I/espnet/tools/venv/lib/python3.7/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/espnet/tools/venv/include/python3.7m -c src/binding.cpp -o build/temp.linux-x86_64-3.7/src/binding.o -std=c++11 -fPIC -DWARPCTC_ENABLE_GPU -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_warp_ctc -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
src/binding.cpp:11:11: fatal error: c10/cuda/CUDAGuard.h: No such file or directory
#include "c10/cuda/CUDAGuard.h"
^~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
error: command 'gcc' failed with exit status 1
- If the following error appears, carry out 3-B and then rerun
python setup.py install
running install
running bdist_egg
running egg_info
writing warpctc_pytorch12_cuda100.egg-info/PKG-INFO
writing dependency_links to warpctc_pytorch12_cuda100.egg-info/dependency_links.txt
writing top-level names to warpctc_pytorch12_cuda100.egg-info/top_level.txt
reading manifest file 'warpctc_pytorch12_cuda100.egg-info/SOURCES.txt'
writing manifest file 'warpctc_pytorch12_cuda100.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
copying warpctc_pytorch/lib/libwarpctc.so -> build/lib.linux-x86_64-3.6/warpctc_pytorch/lib
running build_ext
building 'warpctc_pytorch._warp_ctc' extension
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/warp-ctc/include -I/usr/local/lib/python3.6/dist-packages/torch/include -I/usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.6/dist-packages/torch/include/TH -I/usr/local/lib/python3.6/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.6m -c src/binding.cpp -o build/temp.linux-x86_64-3.6/src/binding.o -std=c++11 -fPIC -DWARPCTC_ENABLE_GPU -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_warp_ctc -D_GLIBCXX_USE_CXX11_ABI=0
In file included from /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/Device.h:3:0,
from /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include/torch/python.h:8,
from /usr/local/lib/python3.6/dist-packages/torch/include/torch/extension.h:6,
from src/binding.cpp:6:
/usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/python_headers.h:9:10: fatal error: Python.h: No such file or directory
#include <Python.h>
^~~~~~~~~~
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
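3-A. Install a PyTorch build that matches your CUDA version
The missing c10/cuda/CUDAGuard.h header suggests that the PyTorch in the venv is older than the pytorch-1.1 branch of warp-ctc expects. The egg name warpctc_pytorch10_cuda100 appears to encode PyTorch 1.0 and CUDA 10.0. In this environment (CUDA 10.0), installing PyTorch 1.2.0 with torchvision 0.4.0 resolved it: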
pip install torch===1.2.0 torchvision===0.4.0 -f https://download.pytorch.org/whl/torch_stable.html
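Before rebuilding, it can help to confirm two things about the PyTorch inside the venv: that it is new enough, and that it was built with CUDA. Below is a minimal sketch of that check.

The minimum version (1, 1) held in MIN_TORCH is an assumption inferred from the branch name pytorch-1.1, not from warp-ctc's documentation. parse_version and version_at_least are hypothetical helpers.

```python
import re
from typing import Tuple

# Assumption: the pytorch-1.1 branch of espnet/warp-ctc needs PyTorch >= 1.1
# (inferred from the branch name, not from official documentation).
MIN_TORCH = (1, 1)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.2.0+cu100' into (1, 2, 0); pre-release and local suffixes are dropped."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def version_at_least(version: str, minimum: Tuple[int, ...] = MIN_TORCH) -> bool:
    """Tuple comparison, so (1, 0, 1) < (1, 1) <= (1, 1, 0) < (1, 2, 0)."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this interpreter")
    else:
        # torch.version.cuda is None for CPU-only builds.
        print("torch", torch.__version__, "| built with CUDA", torch.version.cuda)
        print("new enough:", version_at_least(torch.__version__))
        print("GPU visible:", torch.cuda.is_available())
```

The `===` in the pip command above is PEP 440 arbitrary equality. It matches the string 1.2.0 exactly. A plain `==1.2.0` would also accept local variants such as 1.2.0+cpu from the torch_stable.html index.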
3-B. Install python3.7-dev (the Python development headers, which provide Python.h)
apt-get install software-properties-common
add-apt-repository ppa:deadsnakes/ppa
apt-get update
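apt-get install python3.7-dev
(replace 3.7 with your Python version)
Note that the second error log above was produced by the system Python 3.6 (/usr/include/python3.6m, /usr/local/lib/python3.6/dist-packages), not by the venv's Python 3.7. This suggests setup.py was run outside the venv. If that is your case too, you have two options:
- run setup.py again after activating the venv as described above, or
- install the -dev package for the Python that actually runs setup.py (python3.6-dev here).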
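The Python.h error depends on which interpreter runs setup.py. A minimal check is to ask that interpreter where it expects its headers to be. The sketch below uses the standard-library sysconfig module; python_header_path and has_python_headers are hypothetical helpers.

Run it with the same python command you use for setup.py install. If it reports MISSING, install the -dev package for the version it prints.

```python
import os
import sys
import sysconfig


def python_header_path() -> str:
    """Where this interpreter expects Python.h, e.g. /usr/include/python3.6m/Python.h."""
    return os.path.join(sysconfig.get_paths()["include"], "Python.h")


def has_python_headers() -> bool:
    """True if Python.h exists for the interpreter running this script."""
    return os.path.isfile(python_header_path())


if __name__ == "__main__":
    # sys.executable shows whether this is the venv's Python or the system one.
    print("interpreter:", sys.executable)
    print("version:", "%d.%d" % sys.version_info[:2])
    print("Python.h:", python_header_path(),
          "found" if has_python_headers() else "MISSING")
```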