I wanted to try NLU (Natural Language Understanding), so, following a reference article, I set up an environment with Docker and worked through it until something actually ran.
Setting up the Docker environment
Create a directory for rasa on the host machine.
$ mkdir rasa
$ cd rasa
$ vi Dockerfile
To start with, the Dockerfile specifies only an image that can run Python.
FROM python:3.7-slim-stretch
I want to run this with docker-compose, so create docker-compose.yml.
$ vi docker-compose.yml
version: '2'
services:
  rasa:
    container_name: rasa
    build:
      context: .
    volumes:
      - .:/app
Build it.
$ docker-compose build
Building rasa
Step 1/1 : FROM python:3.7-slim-stretch
---> c9ec5ac0f580
Successfully built c9ec5ac0f580
Successfully tagged rasa-test_rasa:latest
$
Check that Python runs.
$ docker-compose run --rm rasa bash
Creating network "rasa-test_default" with the default driver
root@c02085da1f5f:/# python --version
Python 3.7.7
root@c02085da1f5f:/#
It works.
Installing Rasa
Following the reference, set up the rasa environment.
root@c02085da1f5f:/# pip install rasa
...(lots of packages get installed)
Successfully built sanic-jwt absl-py mattermostwrapper colorclass webexteamssdk SQLAlchemy terminaltables future gast termcolor wrapt PyYAML docopt pyrsistent
Failed to build ujson
Installing collected packages: ujson, certifi, six, pycparser, cffi, cryptography, future, python-telegram-bot, tqdm, tabulate, python-crfsuite, sklearn-crfsuite, dnspython, pymongo, ruamel.yaml, numpy, h5py, keras-applications, opt-einsum, grpcio, protobuf, absl-py, google-pasta, gast, termcolor, scipy, wrapt, keras-preprocessing, werkzeug, cachetools, pyasn1, pyasn1-modules, rsa, google-auth, chardet, urllib3, idna, requests, markdown, oauthlib, requests-oauthlib, google-auth-oauthlib, tensorboard, tensorflow-estimator, astor, tensorflow, python-dateutil, PyYAML, docopt, pykwalify, zipp, importlib-metadata, pyrsistent, attrs, jsonschema, greenlet, gevent, redis, PyJWT, sanic-jwt, cloudpickle, kafka-python, rocketchat-API, joblib, scikit-learn, pysocks, pytz, twilio, cycler, pyparsing, kiwisolver, matplotlib, mattermostwrapper, multidict, psycopg2-binary, colorclass, aiofiles, rfc3986, hstspreload, hpack, hyperframe, h2, h11, sniffio, httpx, httptools, uvloop, websockets, sanic, requests-toolbelt, webexteamssdk, httplib2, oauth2client, python-engineio, tzlocal, apscheduler, async-generator, wcwidth, prompt-toolkit, questionary, fbmessenger, sanic-plugins-framework, sanic-cors, humanfriendly, coloredlogs, colorhash, jsonpickle, SQLAlchemy, async-timeout, yarl, aiohttp, pydot, packaging, rasa-sdk, python-socketio, tensorflow-hub, decorator, tensorflow-probability, typeguard, tensorflow-addons, terminaltables, networkx, jmespath, docutils, botocore, s3transfer, boto3, slackclient, pika, rasa
Running setup.py install for ujson ... error
ERROR: Command errored out with exit status 1:
command: /usr/local/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-ingvl92n/ujson/setup.py'"'"'; __file__='"'"'/tmp/pip-install-ingvl92n/ujson/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-46isinhm/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.7m/ujson
cwd: /tmp/pip-install-ingvl92n/ujson/
Complete output (12 lines):
Warning: 'classifiers' should be a list, got type 'filter'
running install
running build
running build_ext
building 'ujson' extension
creating build
creating build/temp.linux-x86_64-3.7
creating build/temp.linux-x86_64-3.7/python
creating build/temp.linux-x86_64-3.7/lib
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I./python -I./lib -I/usr/local/include/python3.7m -c ./python/ujson.c -o build/temp.linux-x86_64-3.7/./python/ujson.o -D_GNU_SOURCE
unable to execute 'gcc': No such file or directory
error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: /usr/local/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-ingvl92n/ujson/setup.py'"'"'; __file__='"'"'/tmp/pip-install-ingvl92n/ujson/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-46isinhm/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.7m/ujson Check the logs for full command output.
root@c02085da1f5f:/#
An error came up.
unable to execute 'gcc': No such file or directory
So gcc is missing. Install a package that should provide it.
root@c02085da1f5f:/# apt-get update -q -y && apt-get install -q -y build-essential
...(lots of packages get installed)
update-alternatives: using /usr/bin/g++ to provide /usr/bin/c++ (c++) in auto mode
Setting up build-essential (12.3) ...
Processing triggers for libc-bin (2.24-11+deb9u4) ...
root@c02085da1f5f:/#
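The ujson failure above came down to gcc not being on the PATH of the slim image. As a rough sketch, a check like the following could be run before pip install to spot that earlier. The function name and the list of tools are my own assumptions; only gcc appears in the error message.

```python
import shutil


def find_missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # gcc is what the ujson build asked for; g++ and make are assumptions
    # about what other source builds might want.
    missing = find_missing_tools(["gcc", "g++", "make"])
    if missing:
        print("Missing build tools:", ", ".join(missing))
        print("On a Debian-based image: apt-get install -y build-essential")
    else:
        print("Build tools found; pip should be able to compile C extensions.")
```

This only checks that the executables exist. It does not guarantee that a particular package will compile.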
Try installing rasa again.
root@c02085da1f5f:/# pip install rasa
...(lots of packages get installed)
Successfully built ujson
Installing collected packages: six, absl-py, cycler, python-dateutil, kiwisolver, numpy, pyparsing, matplotlib, docopt, PyYAML, pykwalify, protobuf, tensorflow-hub, kafka-python, pytz, multidict, pysocks, urllib3, certifi, idna, chardet, requests, PyJWT, twilio, fbmessenger, ruamel.yaml, yarl, attrs, async-timeout, aiohttp, slackclient, terminaltables, aiofiles, httptools, websockets, ujson, h11, hpack, hyperframe, h2, sniffio, rfc3986, hstspreload, httpx, uvloop, sanic, sanic-plugins-framework, sanic-cors, humanfriendly, coloredlogs, rasa-sdk, tzlocal, apscheduler, packaging, typeguard, tensorflow-addons, greenlet, gevent, pika, scipy, google-pasta, astor, termcolor, wrapt, grpcio, pyasn1, pyasn1-modules, rsa, cachetools, google-auth, werkzeug, markdown, oauthlib, requests-oauthlib, google-auth-oauthlib, tensorboard, h5py, keras-applications, tensorflow-estimator, keras-preprocessing, opt-einsum, gast, tensorflow, cloudpickle, decorator, tensorflow-probability, dnspython, pymongo, python-engineio, joblib, scikit-learn, httplib2, oauth2client, SQLAlchemy, requests-toolbelt, future, webexteamssdk, sanic-jwt, wcwidth, prompt-toolkit, colorhash, colorclass, tqdm, rocketchat-API, jmespath, docutils, botocore, s3transfer, boto3, questionary, zipp, importlib-metadata, pyrsistent, jsonschema, mattermostwrapper, jsonpickle, python-socketio, redis, pydot, networkx, pycparser, cffi, cryptography, python-telegram-bot, psycopg2-binary, async-generator, python-crfsuite, tabulate, sklearn-crfsuite, rasa
Successfully installed PyJWT-1.7.1 PyYAML-5.3.1 SQLAlchemy-1.3.15 absl-py-0.9.0 aiofiles-0.4.0 aiohttp-3.6.2 apscheduler-3.6.3 astor-0.8.1 async-generator-1.10 async-timeout-3.0.1 attrs-19.3.0 boto3-1.12.30 botocore-1.15.30 cachetools-4.0.0 certifi-2019.11.28 cffi-1.14.0 chardet-3.0.4 cloudpickle-1.2.2 colorclass-2.2.0 coloredlogs-10.0 colorhash-1.0.2 cryptography-2.8 cycler-0.10.0 decorator-4.4.2 dnspython-1.16.0 docopt-0.6.2 docutils-0.15.2 fbmessenger-6.0.0 future-0.18.2 gast-0.2.2 gevent-1.4.0 google-auth-1.12.0 google-auth-oauthlib-0.4.1 google-pasta-0.2.0 greenlet-0.4.15 grpcio-1.27.2 h11-0.8.1 h2-3.2.0 h5py-2.10.0 hpack-3.0.0 hstspreload-2020.3.25 httplib2-0.17.0 httptools-0.1.1 httpx-0.9.3 humanfriendly-8.1 hyperframe-5.2.0 idna-2.9 importlib-metadata-1.5.2 jmespath-0.9.5 joblib-0.14.1 jsonpickle-1.3 jsonschema-3.2.0 kafka-python-1.4.7 keras-applications-1.0.8 keras-preprocessing-1.1.0 kiwisolver-1.1.0 markdown-3.2.1 matplotlib-3.1.3 mattermostwrapper-2.2 multidict-4.7.5 networkx-2.4 numpy-1.18.2 oauth2client-4.1.3 oauthlib-3.1.0 opt-einsum-3.2.0 packaging-19.0 pika-1.1.0 prompt-toolkit-2.0.10 protobuf-3.11.3 psycopg2-binary-2.8.4 pyasn1-0.4.8 pyasn1-modules-0.2.8 pycparser-2.20 pydot-1.4.1 pykwalify-1.7.0 pymongo-3.8.0 pyparsing-2.4.6 pyrsistent-0.16.0 pysocks-1.7.1 python-crfsuite-0.9.7 python-dateutil-2.8.1 python-engineio-3.11.2 python-socketio-4.4.0 python-telegram-bot-11.1.0 pytz-2019.3 questionary-1.5.1 rasa-1.9.2 rasa-sdk-1.9.0 redis-3.4.1 requests-2.23.0 requests-oauthlib-1.3.0 requests-toolbelt-0.9.1 rfc3986-1.3.2 rocketchat-API-0.6.36 rsa-4.0 ruamel.yaml-0.15.100 s3transfer-0.3.3 sanic-19.12.2 sanic-cors-0.10.0.post3 sanic-jwt-1.3.2 sanic-plugins-framework-0.9.2 scikit-learn-0.22.2.post1 scipy-1.4.1 six-1.14.0 sklearn-crfsuite-0.3.6 slackclient-2.5.0 sniffio-1.1.0 tabulate-0.8.7 tensorboard-2.1.1 tensorflow-2.1.0 tensorflow-addons-0.8.3 tensorflow-estimator-2.1.0 tensorflow-hub-0.7.0 tensorflow-probability-0.7.0 termcolor-1.1.0 
terminaltables-3.1.0 tqdm-4.31.1 twilio-6.26.3 typeguard-2.7.1 tzlocal-2.0.0 ujson-1.35 urllib3-1.25.8 uvloop-0.14.0 wcwidth-0.1.9 webexteamssdk-1.1.1 websockets-8.1 werkzeug-1.0.0 wrapt-1.12.1 yarl-1.4.2 zipp-3.1.0
root@c02085da1f5f:/#
The install succeeded.
Next comes rasa init.
Before that, move to a working directory. I would rather not have files created directly under /.
root@c02085da1f5f:/# pwd
/
root@c02085da1f5f:/# ls -al
total 88
drwxr-xr-x 1 root root 4096 Mar 27 08:25 .
drwxr-xr-x 1 root root 4096 Mar 27 08:25 ..
-rwxr-xr-x 1 root root 0 Mar 27 08:25 .dockerenv
drwxr-xr-x 4 root root 128 Mar 27 08:22 app
drwxr-xr-x 1 root root 4096 Mar 27 08:33 bin
drwxr-xr-x 2 root root 4096 Feb 1 17:09 boot
drwxr-xr-x 5 root root 360 Mar 27 08:25 dev
drwxr-xr-x 1 root root 4096 Mar 27 08:33 etc
drwxr-xr-x 2 root root 4096 Feb 1 17:09 home
drwxr-xr-x 1 root root 4096 Mar 27 08:33 lib
drwxr-xr-x 2 root root 4096 Feb 24 00:00 lib64
drwxr-xr-x 2 root root 4096 Feb 24 00:00 media
drwxr-xr-x 2 root root 4096 Feb 24 00:00 mnt
drwxr-xr-x 2 root root 4096 Feb 24 00:00 opt
dr-xr-xr-x 228 root root 0 Mar 27 08:25 proc
drwx------ 1 root root 4096 Mar 27 08:27 root
drwxr-xr-x 3 root root 4096 Feb 24 00:00 run
drwxr-xr-x 2 root root 4096 Feb 24 00:00 sbin
drwxr-xr-x 2 root root 4096 Feb 24 00:00 srv
dr-xr-xr-x 13 root root 0 Mar 27 08:25 sys
drwxrwxrwt 1 root root 4096 Mar 27 08:36 tmp
drwxr-xr-x 1 root root 4096 Feb 24 00:00 usr
drwxr-xr-x 1 root root 4096 Feb 24 00:00 var
root@c02085da1f5f:/# cd app/
root@c02085da1f5f:/app# ls -al
total 12
drwxr-xr-x 4 root root 128 Mar 27 08:22 .
drwxr-xr-x 1 root root 4096 Mar 27 08:25 ..
-rw-r--r-- 1 root root 29 Mar 27 08:20 Dockerfile
-rw-r--r-- 1 root root 112 Mar 27 08:22 docker-compose.yml
root@c02085da1f5f:/app#
Work in /app.
Run rasa init.
root@c02085da1f5f:/app# rasa init --no-prompt
Welcome to Rasa! 🤖
To get started quickly, an initial project will be created.
If you need some help, check out the documentation at https://rasa.com/docs/rasa.
Created project directory at '/app'.
Finished creating project structure.
Training an initial model...
Training Core model...
(output omitted)
2020-03-27 08:46:25 INFO rasa.nlu.model - Successfully saved model into '/tmp/tmp4p2k58s6/nlu'
NLU model training completed.
Your Rasa model is trained and saved at '/app/models/20200327-084527.tar.gz'.
If you want to speak to the assistant, run 'rasa shell' at any time inside the project directory.
root@c02085da1f5f:/app#
init appears to have worked.
Take a look inside the directory.
root@c02085da1f5f:/app# ls -al
total 32
drwxr-xr-x 14 root root 448 Mar 27 08:46 .
drwxr-xr-x 1 root root 4096 Mar 27 08:25 ..
-rw-r--r-- 1 root root 29 Mar 27 08:20 Dockerfile
-rw-r--r-- 1 root root 0 Mar 27 08:36 __init__.py
drwxr-xr-x 4 root root 128 Mar 27 08:45 __pycache__
-rw-r--r-- 1 root root 757 Mar 27 08:36 actions.py
-rw-r--r-- 1 root root 622 Mar 27 08:36 config.yml
-rw-r--r-- 1 root root 938 Mar 27 08:36 credentials.yml
drwxr-xr-x 4 root root 128 Mar 27 08:45 data
-rw-r--r-- 1 root root 112 Mar 27 08:22 docker-compose.yml
-rw-r--r-- 1 root root 549 Mar 27 08:36 domain.yml
-rw-r--r-- 1 root root 1456 Mar 27 08:36 endpoints.yml
drwxr-xr-x 3 root root 96 Mar 27 08:46 models
drwxr-xr-x 3 root root 96 Mar 27 08:45 tests
root@c02085da1f5f:/app#
A number of files have been added.
As in the reference article, start by having a conversation.
root@c02085da1f5f:/app# rasa shell
2020-03-27 08:54:07 INFO root - Connecting to channel 'cmdline' which was specified by the '--connector' argument. Any other channels will be ignored. To connect to all given channels, omit the '--connector' argument.
2020-03-27 08:54:07 INFO root - Starting Rasa server on http://localhost:5005
2020-03-27 08:54:08.593814: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
Bot loaded. Type a message and press enter (use '/stop' to exit):
Your input -> hello
Hey! How are you?
Your input -> /stop
2020-03-27 08:54:31 INFO root - Killing Sanic server now.
root@c02085da1f5f:/app#
Nice, it works.
Supporting Japanese
Next, go back to the article and try adding Japanese support.
First, add some data at the very bottom of data/nlu.md.
$ vi data/nlu.md
## intent:restaurant_ja
- 渋谷で美味しいイタリアンない?
- 和食食べたいんだけど、六本木におすすめある?
- 今度麻布行くんだけど、フレンチのお店教えて
The article drops MeCab in favor of spaCy, so skip MeCab and go straight to spaCy.
config.yml contains default settings, so delete everything for now and write the following.
pipeline:
# - name: “tokenizer_whitespace”
- name: “nlp_spacy”
- name: “tokenizer_spacy”
- name: “CRFEntityExtractor”
- name: “ner_crf”
- name: “ner_synonyms”
- name: “intent_featurizer_count_vectors”
- name: “intent_classifier_tensorflow_embedding”
Try training.
root@c02085da1f5f:/app# rasa train nlu
The config file 'config.yml' is missing mandatory parameters: 'language'. Add missing parameters to config file and try again.
root@c02085da1f5f:/app#
It says config.yml has no language setting.
The article uses ja_ginza, so, skipping the details, add ja_ginza at the top of config.yml.
language: ja_ginza
Run the command again.
root@c02085da1f5f:/app# rasa train nlu
Training NLU model...
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/rasa/nlu/registry.py", line 173, in get_component_class
return class_from_module_path(component_name)
File "/usr/local/lib/python3.7/site-packages/rasa/utils/common.py", line 211, in class_from_module_path
raise ImportError(f"Cannot retrieve class from path {module_path}.")
ImportError: Cannot retrieve class from path “nlp_spacy”.
(output omitted)
It is complaining about nlp_spacy.
Right, spaCy is not installed yet.
Install spaCy.
root@c02085da1f5f:/app# pip install spacy
(output omitted)
Successfully installed blis-0.4.1 catalogue-1.0.0 cymem-2.0.3 murmurhash-1.0.2 plac-1.1.3 preshed-3.0.2 spacy-2.2.4 srsly-1.0.2 thinc-7.4.0 tqdm-4.43.0 wasabi-0.6.0
root@c02085da1f5f:/app#
Installed.
GiNZA appears to be a Japanese language processing library, so install that too.
root@c02085da1f5f:/app# pip install "https://github.com/megagonlabs/ginza/releases/download/latest/ginza-latest.tar.gz"
Collecting https://github.com/megagonlabs/ginza/releases/download/latest/ginza-latest.tar.gz
(output omitted)
Successfully built ginza ja-ginza SudachiDict-core
Installing collected packages: ja-ginza, sortedcontainers, Cython, dartsclone, SudachiPy, SudachiDict-core, ginza
Successfully installed Cython-0.29.16 SudachiDict-core-20190927 SudachiPy-0.4.3 dartsclone-0.9.0 ginza-2.2.1 ja-ginza-2.2.0 sortedcontainers-2.1.0
root@c02085da1f5f:/app#
Installed.
Is everything ready now?
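One way to answer that question before running rasa again is to check whether the freshly installed packages are importable. The sketch below uses only the standard library. The module names spacy, ginza, and ja_ginza are assumed from the pip output above, and the helper name is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names assumed from the pip output: spacy, ginza, ja-ginza (ja_ginza).
    missing = missing_modules(["spacy", "ginza", "ja_ginza"])
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("spaCy and GiNZA look importable.")
```

Being importable says nothing about whether the rasa config itself is valid, so this only rules out one class of failure.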
Run the command once more.
root@c02085da1f5f:/app# rasa train nlu
Training NLU model...
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/rasa/nlu/registry.py", line 173, in get_component_class
return class_from_module_path(component_name)
File "/usr/local/lib/python3.7/site-packages/rasa/utils/common.py", line 211, in class_from_module_path
raise ImportError(f"Cannot retrieve class from path {module_path}.")
ImportError: Cannot retrieve class from path “nlp_spacy”.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/rasa", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/site-packages/rasa/__main__.py", line 91, in main
cmdline_arguments.func(cmdline_arguments)
File "/usr/local/lib/python3.7/site-packages/rasa/cli/train.py", line 140, in train_nlu
persist_nlu_training_data=args.persist_nlu_data,
File "/usr/local/lib/python3.7/site-packages/rasa/train.py", line 414, in train_nlu
persist_nlu_training_data,
File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
File "/usr/local/lib/python3.7/site-packages/rasa/train.py", line 445, in _train_nlu_async
persist_nlu_training_data=persist_nlu_training_data,
File "/usr/local/lib/python3.7/site-packages/rasa/train.py", line 474, in _train_nlu_with_validated_data
persist_nlu_training_data=persist_nlu_training_data,
File "/usr/local/lib/python3.7/site-packages/rasa/nlu/train.py", line 74, in train
trainer = Trainer(nlu_config, component_builder)
File "/usr/local/lib/python3.7/site-packages/rasa/nlu/model.py", line 142, in __init__
components.validate_requirements(cfg.component_names)
File "/usr/local/lib/python3.7/site-packages/rasa/nlu/components.py", line 51, in validate_requirements
component_class = registry.get_component_class(component_name)
File "/usr/local/lib/python3.7/site-packages/rasa/nlu/registry.py", line 199, in get_component_class
raise ModuleNotFoundError(exception_message)
ModuleNotFoundError: Cannot find class '“nlp_spacy”' from global namespace. Please check that there is no typo in the class name and that you have imported the class into the global namespace.
root@c02085da1f5f:/app#
That produced another error.
Cannot find class '“nlp_spacy”'
What does that mean?
Searching for
ModuleNotFoundError: Cannot find class '“nlp_spacy”' from global namespace. Please check that there is no typo in the class name and that you have imported the class into the global namespace.
turned up a page as the top result.
It contains something called old_style_names, with entries like
"nlp_spacy": "SpacyNLP",
that map old names to new ones.
config.yml currently says nlp_spacy, so change it to SpacyNLP.
language: ja_ginza
pipeline:
# - name: “tokenizer_whitespace”
- name: “SpacyNLP” # <- changed here
- name: “tokenizer_spacy”
- name: “CRFEntityExtractor”
- name: “ner_crf”
- name: “ner_synonyms”
- name: “intent_featurizer_count_vectors”
- name: “intent_classifier_tensorflow_embedding”
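The idea behind an old_style_names table can be sketched roughly as below. This is not Rasa's actual code. The only mapping entry taken from the page is nlp_spacy, and the function name and the set of known classes are hypothetical.

```python
# Hypothetical excerpt: only the "nlp_spacy" entry comes from the page quoted above.
OLD_STYLE_NAMES = {"nlp_spacy": "SpacyNLP"}


def resolve_component_name(name, known_classes):
    """Map a configured component name to a known class name.

    Current names are returned unchanged, old-style names are translated,
    and anything else raises LookupError.
    """
    if name in known_classes:
        return name
    if name in OLD_STYLE_NAMES:
        return OLD_STYLE_NAMES[name]
    raise LookupError(f"Cannot find class '{name}'")


if __name__ == "__main__":
    known = {"SpacyNLP"}  # assumed set of registered classes, for illustration
    print(resolve_component_name("nlp_spacy", known))
    print(resolve_component_name("SpacyNLP", known))
```

A name that matches neither the registered classes nor the table fails, which is the same shape as the error message seen above.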
Run the command again.
root@c02085da1f5f:/app# rasa train nlu
Training NLU model...
(output omitted)
ModuleNotFoundError: Cannot find class '“SpacyNLP”' from global namespace. Please check that there is no typo in the class name and that you have imported the class into the global namespace.
“SpacyNLP”
??
Ah, the double quotes were turned into curly quotes when I copy-pasted.
Replace them with straight double quotes.
language: ja_ginza
pipeline:
# - name: “tokenizer_whitespace”
- name: "nlp_spacy"
- name: "tokenizer_spacy"
- name: "CRFEntityExtractor"
- name: "ner_crf"
- name: "ner_synonyms"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
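Curly quotes are easy to miss by eye, so a small script can help find and replace them. The following is a minimal sketch using only the standard library. The function names are my own, and it treats the config as plain text rather than parsing it as YAML.

```python
# Curly quote characters and their straight equivalents.
CURLY_TO_STRAIGHT = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}


def find_curly_quotes(text):
    """Return (line_number, column, character) for every curly quote, 1-based."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in CURLY_TO_STRAIGHT:
                hits.append((line_no, col, ch))
    return hits


def straighten_quotes(text):
    """Replace curly quotes with straight ones."""
    return text.translate(str.maketrans(CURLY_TO_STRAIGHT))


if __name__ == "__main__":
    sample = "pipeline:\n- name: \u201cSpacyNLP\u201d\n"
    print(find_curly_quotes(sample))
    print(straighten_quotes(sample))
```

Because it works on raw text, it also rewrites curly quotes inside comments, which is harmless for this config.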
Run the command once more.
root@c02085da1f5f:/app# rasa train nlu
Training NLU model...
(output omitted)
oblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
warnings.warn(msg, category=FutureWarning)
2020-03-27 09:15:20 INFO rasa.nlu.model - Successfully saved model into '/tmp/tmpn4xtb3h7/nlu'
NLU model training completed.
Your Rasa model is trained and saved at '/app/models/nlu-20200327-091520.tar.gz'.
root@c02085da1f5f:/app#
Finally, it runs.
If rasa shell nlu returns an intent, I suppose that counts as success?
root@c02085da1f5f:/app# rasa shell nlu
2020-03-27 09:16:26 INFO rasa.nlu.components - Added 'SpacyNLP' to component cache. Key 'SpacyNLP-ja_ginza'.
/usr/local/lib/python3.7/site-packages/sklearn/externals/joblib/__init__.py:15: FutureWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
warnings.warn(msg, category=FutureWarning)
2020-03-27 09:16:26.359929: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
/usr/local/lib/python3.7/site-packages/rasa/nlu/classifiers/diet_classifier.py:864: FutureWarning: 'EmbeddingIntentClassifier' is deprecated and will be removed in version 2.0. Use 'DIETClassifier' instead.
model=model,
NLU model loaded. Type a message and press enter to parse it.
Next message:
六本木のイタリアン教えて
{
"intent": {
"name": "restaurant_ja",
"confidence": 0.48260971903800964
},
"entities": [],
"intent_ranking": [
{
"name": "restaurant_ja",
"confidence": 0.48260971903800964
},
{
"name": "affirm",
"confidence": 0.1126396581530571
},
{
"name": "greet",
"confidence": 0.09804855287075043
},
{
"name": "goodbye",
"confidence": 0.08541827648878098
},
{
"name": "mood_great",
"confidence": 0.07586738467216492
},
{
"name": "deny",
"confidence": 0.06578702479600906
},
{
"name": "bot_challenge",
"confidence": 0.05436018109321594
},
{
"name": "mood_unhappy",
"confidence": 0.025269268080592155
}
],
"text": "六本木のイタリアン教えて"
}
Next message:
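That looks plausible. The intent score is low, though. Is that normal? (With only three training examples for this intent, perhaps it is.)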
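When a parse result like the one above is consumed by other code, a low score is usually handled with a confidence threshold. Here is a minimal sketch that reads the JSON and rejects intents below a cutoff. The function name and the 0.6 default are assumptions for illustration, not values from Rasa.

```python
import json


def pick_intent(parse_result, threshold=0.6):
    """Return the intent name, or None if it is missing or below the threshold."""
    intent = parse_result.get("intent") or {}
    name = intent.get("name")
    confidence = intent.get("confidence", 0.0)
    if name is None or confidence < threshold:
        return None
    return name


if __name__ == "__main__":
    # Trimmed copy of the output above.
    raw = """
    {
      "intent": {"name": "restaurant_ja", "confidence": 0.48260971903800964},
      "entities": [],
      "text": "六本木のイタリアン教えて"
    }
    """
    result = json.loads(raw)
    print(pick_intent(result))       # prints None
    print(pick_intent(result, 0.4))  # prints restaurant_ja
```

With the assumed 0.6 cutoff, this result would be treated as "not understood", which is one reason to add more training data.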
Next, try named entities.
Rewrite the Japanese section of data/nlu.md as follows.
## intent:restaurant_ja
- [渋谷](location)で美味しい[イタリアン](restaurant_type)ない?
- [和食](restaurant_type)食べたいんだけど、[六本木](location)におすすめある?
- 今度[麻布](location)行くんだけど、[フレンチ](restaurant_type)のお店教えて
Train.
root@c02085da1f5f:/app# rasa train nlu
Training NLU model...
(output omitted)
2020-03-27 13:46:37 INFO rasa.nlu.model - Finished training component.
/usr/local/lib/python3.7/site-packages/sklearn/externals/joblib/__init__.py:15: FutureWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
warnings.warn(msg, category=FutureWarning)
2020-03-27 13:46:37 INFO rasa.nlu.model - Successfully saved model into '/tmp/tmpy7tqyrsb/nlu'
NLU model training completed.
Your Rasa model is trained and saved at '/app/models/nlu-20200327-134637.tar.gz'.
root@c02085da1f5f:/app#
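The [text](label) annotation used in data/nlu.md above marks a span of text and gives it an entity name. To make that concrete, the sketch below converts one annotated line into plain text plus character offsets. It is a simplified illustration with a hypothetical function name, it is not Rasa's own parser, and it ignores any other syntax the format may support.

```python
import re

# Matches [value](label), e.g. [渋谷](location)
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_annotated(example):
    """Split an annotated example into (plain_text, entities)."""
    pieces = []
    entities = []
    cursor = 0  # position in the annotated source string
    length = 0  # length of the plain text built so far
    for match in ANNOTATION.finditer(example):
        before = example[cursor:match.start()]
        pieces.append(before)
        length += len(before)
        value, label = match.group(1), match.group(2)
        entities.append(
            {"start": length, "end": length + len(value),
             "value": value, "entity": label}
        )
        pieces.append(value)
        length += len(value)
        cursor = match.end()
    pieces.append(example[cursor:])
    return "".join(pieces), entities


if __name__ == "__main__":
    text, ents = parse_annotated(
        "[渋谷](location)で美味しい[イタリアン](restaurant_type)ない?"
    )
    print(text)
    for ent in ents:
        print(ent)
```

The offsets count characters in the plain text, not bytes, which matters for Japanese text.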
Then try the shell.
root@c02085da1f5f:/app# rasa shell nlu
2020-03-27 13:47:07 INFO rasa.nlu.components - Added 'SpacyNLP' to component cache. Key 'SpacyNLP-ja_ginza'.
/usr/local/lib/python3.7/site-packages/sklearn/externals/joblib/__init__.py:15: FutureWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
warnings.warn(msg, category=FutureWarning)
2020-03-27 13:47:07.924760: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
/usr/local/lib/python3.7/site-packages/rasa/nlu/classifiers/diet_classifier.py:864: FutureWarning: 'EmbeddingIntentClassifier' is deprecated and will be removed in version 2.0. Use 'DIETClassifier' instead.
model=model,
NLU model loaded. Type a message and press enter to parse it.
Next message:
渋谷で美味しいイタリアンない?
{
"intent": {
"name": "restaurant_ja",
"confidence": 0.9823814034461975
},
"entities": [
{
"start": 0,
"end": 2,
"value": "渋谷",
"entity": "location",
"confidence": 0.7429793877231764,
"extractor": "CRFEntityExtractor"
},
{
"start": 7,
"end": 12,
"value": "イタリアン",
"entity": "restaurant_type",
"confidence": 0.7588623133724827,
"extractor": "CRFEntityExtractor"
},
{
"start": 0,
"end": 2,
"value": "渋谷",
"entity": "location",
"confidence": 0.7429793877231764,
"extractor": "CRFEntityExtractor"
},
{
"start": 7,
"end": 12,
"value": "イタリアン",
"entity": "restaurant_type",
"confidence": 0.7588623133724827,
"extractor": "CRFEntityExtractor"
}
],
"intent_ranking": [
{
"name": "restaurant_ja",
"confidence": 0.9823814034461975
},
{
"name": "mood_great",
"confidence": 0.008284788578748703
},
{
"name": "deny",
"confidence": 0.006644146051257849
},
{
"name": "greet",
"confidence": 0.001571052591316402
},
{
"name": "bot_challenge",
"confidence": 0.0006475948612205684
},
{
"name": "affirm",
"confidence": 0.00034828390926122665
},
{
"name": "mood_unhappy",
"confidence": 8.46567636472173e-05
},
{
"name": "goodbye",
"confidence": 3.805106462095864e-05
}
],
"text": "渋谷で美味しいイタリアンない?"
}
Next message:
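Nice, the output now looks like the one in the article.
Unlike the article, though, the Japanese text is not garbled here.
Each entity is listed twice, presumably because the pipeline contains both CRFEntityExtractor and its old-style alias ner_crf.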
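The entities array in the output above contains each entity twice, and its start and end values are character offsets into text. A small sketch for checking the offsets and dropping exact duplicates could look like this. The helper names are hypothetical, and the sample data is trimmed from the output above.

```python
def unique_entities(entities):
    """Drop entities that repeat the same span, label, and value, keeping order."""
    seen = set()
    result = []
    for ent in entities:
        key = (ent["start"], ent["end"], ent["entity"], ent["value"])
        if key not in seen:
            seen.add(key)
            result.append(ent)
    return result


def offsets_match(text, entities):
    """Check that text[start:end] equals the reported value for every entity."""
    return all(text[e["start"]:e["end"]] == e["value"] for e in entities)


if __name__ == "__main__":
    text = "渋谷で美味しいイタリアンない?"
    entities = [
        {"start": 0, "end": 2, "value": "渋谷", "entity": "location"},
        {"start": 7, "end": 12, "value": "イタリアン", "entity": "restaurant_type"},
        {"start": 0, "end": 2, "value": "渋谷", "entity": "location"},
        {"start": 7, "end": 12, "value": "イタリアン", "entity": "restaurant_type"},
    ]
    print(offsets_match(text, entities))
    print(len(unique_entities(entities)))
```

Deduplicating after the fact is only a workaround. Removing the duplicated extractor from the pipeline is probably the cleaner fix.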
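The environment is built and I have seen it run, so that is it for this time.
Next, I will think about how to increase the training data.
Addendum
To capture the work so far, I listed the libraries installed with pip in requirements.txt and added steps to the Dockerfile so that they are installed automatically on build.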
rasa==1.9.2
spacy==2.2.4
FROM python:3.7-slim-stretch
RUN apt-get update -q -y && \
apt-get install -q -y \
build-essential
ADD ./requirements.txt requirements.txt
RUN pip install -r requirements.txt
RUN pip install "https://github.com/megagonlabs/ginza/releases/download/latest/ginza-latest.tar.gz"
ADD . /app
WORKDIR /app
Then build it
$ docker-compose build
and run it.
$ docker-compose run --rm rasa bash
root@56ff2c2f2f38:/app#
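The pinned versions in requirements.txt match the "Successfully installed" lines pip printed earlier. If copying them by hand feels error-prone, something like the following could extract the pins. It is a sketch that assumes pip's output keeps the name-version format seen above, and the function name is hypothetical.

```python
def pins_from_pip_output(line, wanted):
    """Build 'name==version' pins for `wanted` packages from a pip summary line."""
    prefix = "Successfully installed "
    if not line.startswith(prefix):
        return []
    pins = {}
    for item in line[len(prefix):].split():
        # The version is everything after the last hyphen, e.g. rasa-sdk-1.9.0
        name, _, version = item.rpartition("-")
        if name:
            pins[name.lower()] = f"{name}=={version}"
    return [pins[n.lower()] for n in wanted if n.lower() in pins]


if __name__ == "__main__":
    line = "Successfully installed rasa-1.9.2 rasa-sdk-1.9.0 spacy-2.2.4"
    for pin in pins_from_pip_output(line, ["rasa", "spacy"]):
        print(pin)
```

Running pip freeze inside the container is the more usual way to get pins. This sketch is only meant to show where the two version numbers came from.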
Addendum 2
I published this on GitHub.
Addendum 3
It seems ginza can now be installed via pip (since 3.0), so I added ginza to requirements.txt and removed the corresponding step from the Dockerfile.