Trying out natural language understanding with Rasa NLU

Posted at 2020-03-27

I wanted to try NLU (Natural Language Understanding), so, using a reference article as a guide, I set up an environment with Docker and took it as far as actually running.

Setting up the Docker environment

Create a directory for rasa on the host machine.

$ mkdir rasa
$ cd rasa
$ vi Dockerfile

To start with, the Dockerfile specifies only an image that can run Python.

FROM python:3.7-slim-stretch

I want to run it with docker-compose, so create docker-compose.yml.

$ vi docker-compose.yml
docker-compose.yml
version: '2'
services:
  rasa:
    container_name: rasa
    build:
      context: .
    volumes:
      - .:/app

Build it.

$ docker-compose build 
Building rasa
Step 1/1 : FROM python:3.7-slim-stretch
 ---> c9ec5ac0f580
Successfully built c9ec5ac0f580
Successfully tagged rasa-test_rasa:latest

$

Check that Python runs.

$ docker-compose run --rm rasa bash
Creating network "rasa-test_default" with the default driver
root@c02085da1f5f:/# python --version
Python 3.7.7
root@c02085da1f5f:/#

It works.

Installing Rasa

Following the reference, set up the Rasa environment.

root@c02085da1f5f:/# pip install rasa
...(lots of packages get installed)

Successfully built sanic-jwt absl-py mattermostwrapper colorclass webexteamssdk SQLAlchemy terminaltables future gast termcolor wrapt PyYAML docopt pyrsistent
Failed to build ujson
Installing collected packages: ujson, certifi, six, pycparser, cffi, cryptography, future, python-telegram-bot, tqdm, tabulate, python-crfsuite, sklearn-crfsuite, dnspython, pymongo, ruamel.yaml, numpy, h5py, keras-applications, opt-einsum, grpcio, protobuf, absl-py, google-pasta, gast, termcolor, scipy, wrapt, keras-preprocessing, werkzeug, cachetools, pyasn1, pyasn1-modules, rsa, google-auth, chardet, urllib3, idna, requests, markdown, oauthlib, requests-oauthlib, google-auth-oauthlib, tensorboard, tensorflow-estimator, astor, tensorflow, python-dateutil, PyYAML, docopt, pykwalify, zipp, importlib-metadata, pyrsistent, attrs, jsonschema, greenlet, gevent, redis, PyJWT, sanic-jwt, cloudpickle, kafka-python, rocketchat-API, joblib, scikit-learn, pysocks, pytz, twilio, cycler, pyparsing, kiwisolver, matplotlib, mattermostwrapper, multidict, psycopg2-binary, colorclass, aiofiles, rfc3986, hstspreload, hpack, hyperframe, h2, h11, sniffio, httpx, httptools, uvloop, websockets, sanic, requests-toolbelt, webexteamssdk, httplib2, oauth2client, python-engineio, tzlocal, apscheduler, async-generator, wcwidth, prompt-toolkit, questionary, fbmessenger, sanic-plugins-framework, sanic-cors, humanfriendly, coloredlogs, colorhash, jsonpickle, SQLAlchemy, async-timeout, yarl, aiohttp, pydot, packaging, rasa-sdk, python-socketio, tensorflow-hub, decorator, tensorflow-probability, typeguard, tensorflow-addons, terminaltables, networkx, jmespath, docutils, botocore, s3transfer, boto3, slackclient, pika, rasa
    Running setup.py install for ujson ... error
    ERROR: Command errored out with exit status 1:
     command: /usr/local/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-ingvl92n/ujson/setup.py'"'"'; __file__='"'"'/tmp/pip-install-ingvl92n/ujson/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-46isinhm/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.7m/ujson
         cwd: /tmp/pip-install-ingvl92n/ujson/
    Complete output (12 lines):
    Warning: 'classifiers' should be a list, got type 'filter'
    running install
    running build
    running build_ext
    building 'ujson' extension
    creating build
    creating build/temp.linux-x86_64-3.7
    creating build/temp.linux-x86_64-3.7/python
    creating build/temp.linux-x86_64-3.7/lib
    gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I./python -I./lib -I/usr/local/include/python3.7m -c ./python/ujson.c -o build/temp.linux-x86_64-3.7/./python/ujson.o -D_GNU_SOURCE
    unable to execute 'gcc': No such file or directory
    error: command 'gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/local/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-ingvl92n/ujson/setup.py'"'"'; __file__='"'"'/tmp/pip-install-ingvl92n/ujson/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-46isinhm/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.7m/ujson Check the logs for full command output.
root@c02085da1f5f:/# 

Got an error.

It says unable to execute 'gcc': No such file or directory, so the C compiler is missing from the slim image. Install build-essential, which provides gcc.

root@c02085da1f5f:/# apt-get update -q -y && apt-get install -q -y build-essential
...(lots of packages get installed)

update-alternatives: using /usr/bin/g++ to provide /usr/bin/c++ (c++) in auto mode
Setting up build-essential (12.3) ...
Processing triggers for libc-bin (2.24-11+deb9u4) ...
root@c02085da1f5f:/#
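
Before retrying the install, a quick check like the one below can confirm that the compiler is now on the PATH. This is only a sketch: the function name `find_build_tools` and the list of tools are my own choices, not something from the reference article.

```python
import shutil


def find_build_tools(tools=("gcc", "g++", "make")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_build_tools().items():
        print(f"{tool}: {path or 'NOT FOUND'}")
```

If gcc shows up as NOT FOUND, building ujson from source will most likely fail the same way as above.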

Try installing rasa again.

root@c02085da1f5f:/# pip install rasa
...(lots of packages get installed)

Successfully built ujson
Installing collected packages: six, absl-py, cycler, python-dateutil, kiwisolver, numpy, pyparsing, matplotlib, docopt, PyYAML, pykwalify, protobuf, tensorflow-hub, kafka-python, pytz, multidict, pysocks, urllib3, certifi, idna, chardet, requests, PyJWT, twilio, fbmessenger, ruamel.yaml, yarl, attrs, async-timeout, aiohttp, slackclient, terminaltables, aiofiles, httptools, websockets, ujson, h11, hpack, hyperframe, h2, sniffio, rfc3986, hstspreload, httpx, uvloop, sanic, sanic-plugins-framework, sanic-cors, humanfriendly, coloredlogs, rasa-sdk, tzlocal, apscheduler, packaging, typeguard, tensorflow-addons, greenlet, gevent, pika, scipy, google-pasta, astor, termcolor, wrapt, grpcio, pyasn1, pyasn1-modules, rsa, cachetools, google-auth, werkzeug, markdown, oauthlib, requests-oauthlib, google-auth-oauthlib, tensorboard, h5py, keras-applications, tensorflow-estimator, keras-preprocessing, opt-einsum, gast, tensorflow, cloudpickle, decorator, tensorflow-probability, dnspython, pymongo, python-engineio, joblib, scikit-learn, httplib2, oauth2client, SQLAlchemy, requests-toolbelt, future, webexteamssdk, sanic-jwt, wcwidth, prompt-toolkit, colorhash, colorclass, tqdm, rocketchat-API, jmespath, docutils, botocore, s3transfer, boto3, questionary, zipp, importlib-metadata, pyrsistent, jsonschema, mattermostwrapper, jsonpickle, python-socketio, redis, pydot, networkx, pycparser, cffi, cryptography, python-telegram-bot, psycopg2-binary, async-generator, python-crfsuite, tabulate, sklearn-crfsuite, rasa
Successfully installed PyJWT-1.7.1 PyYAML-5.3.1 SQLAlchemy-1.3.15 absl-py-0.9.0 aiofiles-0.4.0 aiohttp-3.6.2 apscheduler-3.6.3 astor-0.8.1 async-generator-1.10 async-timeout-3.0.1 attrs-19.3.0 boto3-1.12.30 botocore-1.15.30 cachetools-4.0.0 certifi-2019.11.28 cffi-1.14.0 chardet-3.0.4 cloudpickle-1.2.2 colorclass-2.2.0 coloredlogs-10.0 colorhash-1.0.2 cryptography-2.8 cycler-0.10.0 decorator-4.4.2 dnspython-1.16.0 docopt-0.6.2 docutils-0.15.2 fbmessenger-6.0.0 future-0.18.2 gast-0.2.2 gevent-1.4.0 google-auth-1.12.0 google-auth-oauthlib-0.4.1 google-pasta-0.2.0 greenlet-0.4.15 grpcio-1.27.2 h11-0.8.1 h2-3.2.0 h5py-2.10.0 hpack-3.0.0 hstspreload-2020.3.25 httplib2-0.17.0 httptools-0.1.1 httpx-0.9.3 humanfriendly-8.1 hyperframe-5.2.0 idna-2.9 importlib-metadata-1.5.2 jmespath-0.9.5 joblib-0.14.1 jsonpickle-1.3 jsonschema-3.2.0 kafka-python-1.4.7 keras-applications-1.0.8 keras-preprocessing-1.1.0 kiwisolver-1.1.0 markdown-3.2.1 matplotlib-3.1.3 mattermostwrapper-2.2 multidict-4.7.5 networkx-2.4 numpy-1.18.2 oauth2client-4.1.3 oauthlib-3.1.0 opt-einsum-3.2.0 packaging-19.0 pika-1.1.0 prompt-toolkit-2.0.10 protobuf-3.11.3 psycopg2-binary-2.8.4 pyasn1-0.4.8 pyasn1-modules-0.2.8 pycparser-2.20 pydot-1.4.1 pykwalify-1.7.0 pymongo-3.8.0 pyparsing-2.4.6 pyrsistent-0.16.0 pysocks-1.7.1 python-crfsuite-0.9.7 python-dateutil-2.8.1 python-engineio-3.11.2 python-socketio-4.4.0 python-telegram-bot-11.1.0 pytz-2019.3 questionary-1.5.1 rasa-1.9.2 rasa-sdk-1.9.0 redis-3.4.1 requests-2.23.0 requests-oauthlib-1.3.0 requests-toolbelt-0.9.1 rfc3986-1.3.2 rocketchat-API-0.6.36 rsa-4.0 ruamel.yaml-0.15.100 s3transfer-0.3.3 sanic-19.12.2 sanic-cors-0.10.0.post3 sanic-jwt-1.3.2 sanic-plugins-framework-0.9.2 scikit-learn-0.22.2.post1 scipy-1.4.1 six-1.14.0 sklearn-crfsuite-0.3.6 slackclient-2.5.0 sniffio-1.1.0 tabulate-0.8.7 tensorboard-2.1.1 tensorflow-2.1.0 tensorflow-addons-0.8.3 tensorflow-estimator-2.1.0 tensorflow-hub-0.7.0 tensorflow-probability-0.7.0 termcolor-1.1.0 
terminaltables-3.1.0 tqdm-4.31.1 twilio-6.26.3 typeguard-2.7.1 tzlocal-2.0.0 ujson-1.35 urllib3-1.25.8 uvloop-0.14.0 wcwidth-0.1.9 webexteamssdk-1.1.1 websockets-8.1 werkzeug-1.0.0 wrapt-1.12.1 yarl-1.4.2 zipp-3.1.0
root@c02085da1f5f:/#

Installed.

Next is rasa init.

Before that, change the working directory. I don't like having files created directly under /.

root@c02085da1f5f:/# pwd
/
root@c02085da1f5f:/# ls -al
total 88
drwxr-xr-x   1 root root 4096 Mar 27 08:25 .
drwxr-xr-x   1 root root 4096 Mar 27 08:25 ..
-rwxr-xr-x   1 root root    0 Mar 27 08:25 .dockerenv
drwxr-xr-x   4 root root  128 Mar 27 08:22 app
drwxr-xr-x   1 root root 4096 Mar 27 08:33 bin
drwxr-xr-x   2 root root 4096 Feb  1 17:09 boot
drwxr-xr-x   5 root root  360 Mar 27 08:25 dev
drwxr-xr-x   1 root root 4096 Mar 27 08:33 etc
drwxr-xr-x   2 root root 4096 Feb  1 17:09 home
drwxr-xr-x   1 root root 4096 Mar 27 08:33 lib
drwxr-xr-x   2 root root 4096 Feb 24 00:00 lib64
drwxr-xr-x   2 root root 4096 Feb 24 00:00 media
drwxr-xr-x   2 root root 4096 Feb 24 00:00 mnt
drwxr-xr-x   2 root root 4096 Feb 24 00:00 opt
dr-xr-xr-x 228 root root    0 Mar 27 08:25 proc
drwx------   1 root root 4096 Mar 27 08:27 root
drwxr-xr-x   3 root root 4096 Feb 24 00:00 run
drwxr-xr-x   2 root root 4096 Feb 24 00:00 sbin
drwxr-xr-x   2 root root 4096 Feb 24 00:00 srv
dr-xr-xr-x  13 root root    0 Mar 27 08:25 sys
drwxrwxrwt   1 root root 4096 Mar 27 08:36 tmp
drwxr-xr-x   1 root root 4096 Feb 24 00:00 usr
drwxr-xr-x   1 root root 4096 Feb 24 00:00 var
root@c02085da1f5f:/# cd app/
root@c02085da1f5f:/app# ls -al
total 12
drwxr-xr-x 4 root root  128 Mar 27 08:22 .
drwxr-xr-x 1 root root 4096 Mar 27 08:25 ..
-rw-r--r-- 1 root root   29 Mar 27 08:20 Dockerfile
-rw-r--r-- 1 root root  112 Mar 27 08:22 docker-compose.yml
root@c02085da1f5f:/app#

Work in /app.

Run rasa init.

root@c02085da1f5f:/app# rasa init --no-prompt
Welcome to Rasa! 🤖

To get started quickly, an initial project will be created.
If you need some help, check out the documentation at https://rasa.com/docs/rasa.

Created project directory at '/app'.
Finished creating project structure.
Training an initial model...
Training Core model...

(output omitted)

2020-03-27 08:46:25 INFO     rasa.nlu.model  - Successfully saved model into '/tmp/tmp4p2k58s6/nlu'
NLU model training completed.
Your Rasa model is trained and saved at '/app/models/20200327-084527.tar.gz'.
If you want to speak to the assistant, run 'rasa shell' at any time inside the project directory.
root@c02085da1f5f:/app#

Looks like init succeeded.

Look inside the directory.

root@c02085da1f5f:/app# ls -al
total 32
drwxr-xr-x 14 root root  448 Mar 27 08:46 .
drwxr-xr-x  1 root root 4096 Mar 27 08:25 ..
-rw-r--r--  1 root root   29 Mar 27 08:20 Dockerfile
-rw-r--r--  1 root root    0 Mar 27 08:36 __init__.py
drwxr-xr-x  4 root root  128 Mar 27 08:45 __pycache__
-rw-r--r--  1 root root  757 Mar 27 08:36 actions.py
-rw-r--r--  1 root root  622 Mar 27 08:36 config.yml
-rw-r--r--  1 root root  938 Mar 27 08:36 credentials.yml
drwxr-xr-x  4 root root  128 Mar 27 08:45 data
-rw-r--r--  1 root root  112 Mar 27 08:22 docker-compose.yml
-rw-r--r--  1 root root  549 Mar 27 08:36 domain.yml
-rw-r--r--  1 root root 1456 Mar 27 08:36 endpoints.yml
drwxr-xr-x  3 root root   96 Mar 27 08:46 models
drwxr-xr-x  3 root root   96 Mar 27 08:45 tests
root@c02085da1f5f:/app#

A bunch of files and directories have been added.

As in the reference article, start by having a conversation.

root@c02085da1f5f:/app# rasa shell
2020-03-27 08:54:07 INFO     root  - Connecting to channel 'cmdline' which was specified by the '--connector' argument. Any other channels will be ignored. To connect to all given channels, omit the '--connector' argument.
2020-03-27 08:54:07 INFO     root  - Starting Rasa server on http://localhost:5005
2020-03-27 08:54:08.593814: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
Bot loaded. Type a message and press enter (use '/stop' to exit):
Your input ->  hello
Hey! How are you?
Your input ->  /stop
2020-03-27 08:54:31 INFO     root  - Killing Sanic server now.
root@c02085da1f5f:/app#

Nice, it works.

Supporting Japanese

Next, go back to the reference article and try adding Japanese support.

First, add some data at the bottom of data/nlu.md.

$ vi data/nlu.md 
data/nlu.md
## intent:restaurant_ja
- 渋谷で美味しいイタリアンない?
- 和食食べたいんだけど、六本木におすすめある?
- 今度麻布行くんだけど、フレンチのお店教えて
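
As a rough way to see how many examples each intent has, the section format above can be read with a few lines of Python. This is a minimal sketch, not Rasa's own loader: `parse_nlu_markdown` is a hypothetical helper that only understands `## intent:<name>` headers and `- ` example lines.

```python
from collections import defaultdict


def parse_nlu_markdown(text):
    """Collect example utterances per intent from '## intent:<name>' sections."""
    examples = defaultdict(list)
    current_intent = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("## "):
            header = line[3:].strip()
            # Only intent sections are handled; other section types are skipped.
            if header.startswith("intent:"):
                current_intent = header[len("intent:"):]
            else:
                current_intent = None
        elif line.startswith("- ") and current_intent is not None:
            examples[current_intent].append(line[2:].strip())
    return dict(examples)


SAMPLE = """\
## intent:restaurant_ja
- 渋谷で美味しいイタリアンない?
- 和食食べたいんだけど、六本木におすすめある?
- 今度麻布行くんだけど、フレンチのお店教えて
"""

counts = {intent: len(items) for intent, items in parse_nlu_markdown(SAMPLE).items()}
print(counts)
```

Three examples is very little for one intent, so a count like this is a handy reminder of where more training data is needed.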

The article above drops mecab in favor of spaCy, so skip mecab and go with spaCy.

config.yml comes with default settings, so delete everything for now and write the following.

config.yml
pipeline:
  # - name: “tokenizer_whitespace”
  - name: “nlp_spacy”
  - name: “tokenizer_spacy”
  - name: “CRFEntityExtractor”
  - name: “ner_crf”
  - name: “ner_synonyms”
  - name: “intent_featurizer_count_vectors”
  - name: “intent_classifier_tensorflow_embedding”

Train it.

root@c02085da1f5f:/app# rasa train nlu
The config file 'config.yml' is missing mandatory parameters: 'language'. Add missing parameters to config file and try again.
root@c02085da1f5f:/app#

It says config.yml has no language.

The article above uses ja_ginza, so, cutting corners, add ja_ginza at the top of config.yml.

config.yml
language: ja_ginza
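
A check like the following could catch the missing key before starting a training run. It is a naive sketch that does not use a YAML parser: `missing_top_level_keys` is a hypothetical helper, and the default required key is taken only from the error message above.

```python
import re


def missing_top_level_keys(config_text, required=("language",)):
    """Return the required keys that do not appear as top-level keys.

    Naive line-based check: a top-level key is any unindented 'key:' line.
    Indented lines and comment lines are ignored.
    """
    present = set()
    for line in config_text.splitlines():
        match = re.match(r"^([A-Za-z_][\w-]*)\s*:", line)
        if match:
            present.add(match.group(1))
    return [key for key in required if key not in present]


before = 'pipeline:\n  - name: "SpacyNLP"\n'
after = "language: ja_ginza\n\n" + before
print(missing_top_level_keys(before))
print(missing_top_level_keys(after))
```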

Run the command again.

root@c02085da1f5f:/app# rasa train nlu
Training NLU model...
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/rasa/nlu/registry.py", line 173, in get_component_class
    return class_from_module_path(component_name)
  File "/usr/local/lib/python3.7/site-packages/rasa/utils/common.py", line 211, in class_from_module_path
    raise ImportError(f"Cannot retrieve class from path {module_path}.")
ImportError: Cannot retrieve class from path “nlp_spacy”.

(output omitted)

It complains about nlp_spacy.

Well, spaCy isn't installed yet.

Install spaCy.

root@c02085da1f5f:/app# pip install spacy

(output omitted)

Successfully installed blis-0.4.1 catalogue-1.0.0 cymem-2.0.3 murmurhash-1.0.2 plac-1.1.3 preshed-3.0.2 spacy-2.2.4 srsly-1.0.2 thinc-7.4.0 tqdm-4.43.0 wasabi-0.6.0
root@c02085da1f5f:/app#

Installed.

There seems to be a Japanese language processing library called GiNZA, so install it.

root@c02085da1f5f:/app# pip install "https://github.com/megagonlabs/ginza/releases/download/latest/ginza-latest.tar.gz"
Collecting https://github.com/megagonlabs/ginza/releases/download/latest/ginza-latest.tar.gz

(output omitted)

Successfully built ginza ja-ginza SudachiDict-core
Installing collected packages: ja-ginza, sortedcontainers, Cython, dartsclone, SudachiPy, SudachiDict-core, ginza
Successfully installed Cython-0.29.16 SudachiDict-core-20190927 SudachiPy-0.4.3 dartsclone-0.9.0 ginza-2.2.1 ja-ginza-2.2.0 sortedcontainers-2.1.0
root@c02085da1f5f:/app#

Installed.

Is everything ready now?
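
One way to answer that without starting a training run is to ask Python whether it can locate the modules. This is a sketch under assumptions: the module names `spacy`, `ginza` and `ja_ginza` are my guess at what the packages installed above expose, and `importable` is a hypothetical helper.

```python
import importlib.util


def importable(module_names):
    """Map each top-level module name to True if Python can locate it."""
    # find_spec only locates the module; it does not import or execute it.
    return {name: importlib.util.find_spec(name) is not None for name in module_names}


if __name__ == "__main__":
    # Module names assumed from the packages installed above.
    for name, ok in importable(["spacy", "ginza", "ja_ginza"]).items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

This only tells you the packages are present. It says nothing about whether the names in config.yml are valid.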

Run the command once more.

root@c02085da1f5f:/app# rasa train nlu
Training NLU model...
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/rasa/nlu/registry.py", line 173, in get_component_class
    return class_from_module_path(component_name)
  File "/usr/local/lib/python3.7/site-packages/rasa/utils/common.py", line 211, in class_from_module_path
    raise ImportError(f"Cannot retrieve class from path {module_path}.")
ImportError: Cannot retrieve class from path “nlp_spacy”.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/rasa", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/rasa/__main__.py", line 91, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/usr/local/lib/python3.7/site-packages/rasa/cli/train.py", line 140, in train_nlu
    persist_nlu_training_data=args.persist_nlu_data,
  File "/usr/local/lib/python3.7/site-packages/rasa/train.py", line 414, in train_nlu
    persist_nlu_training_data,
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.7/site-packages/rasa/train.py", line 445, in _train_nlu_async
    persist_nlu_training_data=persist_nlu_training_data,
  File "/usr/local/lib/python3.7/site-packages/rasa/train.py", line 474, in _train_nlu_with_validated_data
    persist_nlu_training_data=persist_nlu_training_data,
  File "/usr/local/lib/python3.7/site-packages/rasa/nlu/train.py", line 74, in train
    trainer = Trainer(nlu_config, component_builder)
  File "/usr/local/lib/python3.7/site-packages/rasa/nlu/model.py", line 142, in __init__
    components.validate_requirements(cfg.component_names)
  File "/usr/local/lib/python3.7/site-packages/rasa/nlu/components.py", line 51, in validate_requirements
    component_class = registry.get_component_class(component_name)
  File "/usr/local/lib/python3.7/site-packages/rasa/nlu/registry.py", line 199, in get_component_class
    raise ModuleNotFoundError(exception_message)
ModuleNotFoundError: Cannot find class '“nlp_spacy”' from global namespace. Please check that there is no typo in the class name and that you have imported the class into the global namespace.
root@c02085da1f5f:/app#

Got an error again.

What does Cannot find class '“nlp_spacy”' mean?

Search the web for: ModuleNotFoundError: Cannot find class '“nlp_spacy”' from global namespace. Please check that there is no typo in the class name and that you have imported the class into the global namespace.

This page came up at the top.

old_style_names

"nlp_spacy": "SpacyNLP",

A mapping like this is defined there.
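
To make the idea concrete, a lookup through such a table can be sketched as below. This is an illustration, not Rasa's code: `resolve_component_name` is a hypothetical function, and the dictionary holds only the one entry quoted above.

```python
# Only the entry quoted above is included; the real table is larger.
OLD_STYLE_NAMES = {
    "nlp_spacy": "SpacyNLP",
}


def resolve_component_name(name, aliases=OLD_STYLE_NAMES):
    """Return the current class name for a legacy alias, or the name unchanged."""
    return aliases.get(name, name)


print(resolve_component_name("nlp_spacy"))
print(resolve_component_name("SpacyNLP"))
```

A name that is neither a known alias nor a real class passes through unchanged, and the later class lookup fails on it.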

What I wrote in config.yml earlier was nlp_spacy, so change it to SpacyNLP.

config.yml
language: ja_ginza

pipeline:
  # - name: “tokenizer_whitespace”
  - name: “SpacyNLP” # <- changed here
  - name: “tokenizer_spacy”
  - name: “CRFEntityExtractor”
  - name: “ner_crf”
  - name: “ner_synonyms”
  - name: “intent_featurizer_count_vectors”
  - name: “intent_classifier_tensorflow_embedding”

Run the command again.

root@c02085da1f5f:/app# rasa train nlu
Training NLU model...

(output omitted)

ModuleNotFoundError: Cannot find class '“SpacyNLP”' from global namespace. Please check that there is no typo in the class name and that you have imported the class into the global namespace.

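“SpacyNLP” ??

Ah, the double quotes got mangled when I copy-pasted. They are typographic quotes, so they became part of the component name.

Fix the double quotes.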

config.yml
language: ja_ginza

pipeline:
  # - name: “tokenizer_whitespace”
  - name: "nlp_spacy"
  - name: "tokenizer_spacy"
  - name: "CRFEntityExtractor"
  - name: "ner_crf"
  - name: "ner_synonyms"
  - name: "intent_featurizer_count_vectors"
  - name: "intent_classifier_tensorflow_embedding"
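
This kind of copy-paste damage is easy to miss by eye, so it can help to scan for it mechanically. The sketch below finds typographic quotes and replaces them with ASCII ones. The helper names `find_smart_quotes` and `normalize_quotes` are hypothetical.

```python
# Typographic quotes that web pages and editors often substitute for ASCII ones.
SMART_QUOTES = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}


def find_smart_quotes(text):
    """Return (line_number, column, character) for each typographic quote, 1-based."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SMART_QUOTES:
                hits.append((line_no, col, ch))
    return hits


def normalize_quotes(text):
    """Replace typographic quotes with their ASCII equivalents."""
    return text.translate(str.maketrans(SMART_QUOTES))


broken = "pipeline:\n  - name: \u201cSpacyNLP\u201d\n"
print(find_smart_quotes(broken))
print(normalize_quotes(broken))
```

Note that `normalize_quotes` rewrites every occurrence, including ones inside YAML comments, which is harmless here.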

Run the command again.

root@c02085da1f5f:/app# rasa train nlu
Training NLU model...

(output omitted)

oblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
  warnings.warn(msg, category=FutureWarning)
2020-03-27 09:15:20 INFO     rasa.nlu.model  - Successfully saved model into '/tmp/tmpn4xtb3h7/nlu'
NLU model training completed.
Your Rasa model is trained and saved at '/app/models/nlu-20200327-091520.tar.gz'.
root@c02085da1f5f:/app#

Oh, it finally works.

If rasa shell nlu returns an intent, does that count as success?

root@c02085da1f5f:/app# rasa shell nlu
2020-03-27 09:16:26 INFO     rasa.nlu.components  - Added 'SpacyNLP' to component cache. Key 'SpacyNLP-ja_ginza'.
/usr/local/lib/python3.7/site-packages/sklearn/externals/joblib/__init__.py:15: FutureWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
  warnings.warn(msg, category=FutureWarning)
2020-03-27 09:16:26.359929: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
/usr/local/lib/python3.7/site-packages/rasa/nlu/classifiers/diet_classifier.py:864: FutureWarning: 'EmbeddingIntentClassifier' is deprecated and will be removed in version 2.0. Use 'DIETClassifier' instead.
  model=model,
NLU model loaded. Type a message and press enter to parse it.
Next message:
六本木のイタリアン教えて
{
  "intent": {
    "name": "restaurant_ja",
    "confidence": 0.48260971903800964
  },
  "entities": [],
  "intent_ranking": [
    {
      "name": "restaurant_ja",
      "confidence": 0.48260971903800964
    },
    {
      "name": "affirm",
      "confidence": 0.1126396581530571
    },
    {
      "name": "greet",
      "confidence": 0.09804855287075043
    },
    {
      "name": "goodbye",
      "confidence": 0.08541827648878098
    },
    {
      "name": "mood_great",
      "confidence": 0.07586738467216492
    },
    {
      "name": "deny",
      "confidence": 0.06578702479600906
    },
    {
      "name": "bot_challenge",
      "confidence": 0.05436018109321594
    },
    {
      "name": "mood_unhappy",
      "confidence": 0.025269268080592155
    }
  ],
  "text": "六本木のイタリアン教えて"
}
Next message:

Oh, that looks about right. The intent score is low, though. Is that normal?
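
One way to read that score: the confidences in intent_ranking add up to roughly 1 across the eight intents, so 0.48 is still far ahead of the runner-up at 0.11. The sketch below shows how a caller might treat such a result. The threshold value and the helper names `accept_intent` and `margin` are assumptions of mine, not Rasa settings.

```python
# Trimmed copy of the parse result printed above.
PARSE_RESULT = {
    "intent": {"name": "restaurant_ja", "confidence": 0.48260971903800964},
    "intent_ranking": [
        {"name": "restaurant_ja", "confidence": 0.48260971903800964},
        {"name": "affirm", "confidence": 0.1126396581530571},
        {"name": "greet", "confidence": 0.09804855287075043},
        {"name": "goodbye", "confidence": 0.08541827648878098},
        {"name": "mood_great", "confidence": 0.07586738467216492},
        {"name": "deny", "confidence": 0.06578702479600906},
        {"name": "bot_challenge", "confidence": 0.05436018109321594},
        {"name": "mood_unhappy", "confidence": 0.025269268080592155},
    ],
}


def accept_intent(result, threshold=0.6):
    """Return the intent name if its confidence reaches the threshold, else None."""
    intent = result["intent"]
    return intent["name"] if intent["confidence"] >= threshold else None


def margin(result):
    """Gap between the two highest confidences; a small gap signals ambiguity."""
    ranked = sorted((r["confidence"] for r in result["intent_ranking"]), reverse=True)
    return ranked[0] - ranked[1]


print(accept_intent(PARSE_RESULT))
print(round(margin(PARSE_RESULT), 3))
```

With only three training sentences for the intent and a test sentence that is not among them, a modest score like this does not seem surprising.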

Let's try named entities too.

Rewrite the Japanese part of data/nlu.md as follows.

data/nlu.md
## intent:restaurant_ja
- [渋谷](location)で美味しい[イタリアン](restaurant_type)ない?
- [和食](restaurant_type)食べたいんだけど、[六本木](location)におすすめある?
- 今度[麻布](location)行くんだけど、[フレンチ](restaurant_type)のお店教えて

Train again.

root@c02085da1f5f:/app# rasa train nlu
Training NLU model...

(output omitted)

2020-03-27 13:46:37 INFO     rasa.nlu.model  - Finished training component.
/usr/local/lib/python3.7/site-packages/sklearn/externals/joblib/__init__.py:15: FutureWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
  warnings.warn(msg, category=FutureWarning)
2020-03-27 13:46:37 INFO     rasa.nlu.model  - Successfully saved model into '/tmp/tmpy7tqyrsb/nlu'
NLU model training completed.
Your Rasa model is trained and saved at '/app/models/nlu-20200327-134637.tar.gz'.
root@c02085da1f5f:/app#

Then run the shell.

root@c02085da1f5f:/app# rasa shell nlu
2020-03-27 13:47:07 INFO     rasa.nlu.components  - Added 'SpacyNLP' to component cache. Key 'SpacyNLP-ja_ginza'.
/usr/local/lib/python3.7/site-packages/sklearn/externals/joblib/__init__.py:15: FutureWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
  warnings.warn(msg, category=FutureWarning)
2020-03-27 13:47:07.924760: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
/usr/local/lib/python3.7/site-packages/rasa/nlu/classifiers/diet_classifier.py:864: FutureWarning: 'EmbeddingIntentClassifier' is deprecated and will be removed in version 2.0. Use 'DIETClassifier' instead.
  model=model,
NLU model loaded. Type a message and press enter to parse it.
Next message:
渋谷で美味しいイタリアンない?
{
  "intent": {
    "name": "restaurant_ja",
    "confidence": 0.9823814034461975
  },
  "entities": [
    {
      "start": 0,
      "end": 2,
      "value": "渋谷",
      "entity": "location",
      "confidence": 0.7429793877231764,
      "extractor": "CRFEntityExtractor"
    },
    {
      "start": 7,
      "end": 12,
      "value": "イタリアン",
      "entity": "restaurant_type",
      "confidence": 0.7588623133724827,
      "extractor": "CRFEntityExtractor"
    },
    {
      "start": 0,
      "end": 2,
      "value": "渋谷",
      "entity": "location",
      "confidence": 0.7429793877231764,
      "extractor": "CRFEntityExtractor"
    },
    {
      "start": 7,
      "end": 12,
      "value": "イタリアン",
      "entity": "restaurant_type",
      "confidence": 0.7588623133724827,
      "extractor": "CRFEntityExtractor"
    }
  ],
  "intent_ranking": [
    {
      "name": "restaurant_ja",
      "confidence": 0.9823814034461975
    },
    {
      "name": "mood_great",
      "confidence": 0.008284788578748703
    },
    {
      "name": "deny",
      "confidence": 0.006644146051257849
    },
    {
      "name": "greet",
      "confidence": 0.001571052591316402
    },
    {
      "name": "bot_challenge",
      "confidence": 0.0006475948612205684
    },
    {
      "name": "affirm",
      "confidence": 0.00034828390926122665
    },
    {
      "name": "mood_unhappy",
      "confidence": 8.46567636472173e-05
    },
    {
      "name": "goodbye",
      "confidence": 3.805106462095864e-05
    }
  ],
  "text": "渋谷で美味しいイタリアンない?"
}
Next message:

Nice. The output looks like the one in the article.

Unlike in the article, though, the Japanese text isn't garbled.
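
One thing worth noting in the output: each entity is listed twice. My guess is that this happens because the pipeline contains both CRFEntityExtractor and ner_crf. If ner_crf is a legacy alias of the same component (an assumption, though it fits the old_style_names table seen earlier), the extractor runs twice and removing one entry from config.yml should fix it. Until then, duplicates can be dropped on the consumer side. `dedupe_entities` below is a hypothetical helper.

```python
def dedupe_entities(entities):
    """Drop entities that repeat an earlier (start, end, entity, value), keeping order."""
    seen = set()
    unique = []
    for ent in entities:
        key = (ent["start"], ent["end"], ent["entity"], ent["value"])
        if key not in seen:
            seen.add(key)
            unique.append(ent)
    return unique


# Entities copied from the output above.
_LOCATION = {"start": 0, "end": 2, "value": "渋谷", "entity": "location",
             "confidence": 0.7429793877231764, "extractor": "CRFEntityExtractor"}
_TYPE = {"start": 7, "end": 12, "value": "イタリアン", "entity": "restaurant_type",
         "confidence": 0.7588623133724827, "extractor": "CRFEntityExtractor"}
ENTITIES = [dict(_LOCATION), dict(_TYPE), dict(_LOCATION), dict(_TYPE)]

unique = dedupe_entities(ENTITIES)
print(len(ENTITIES), "->", len(unique))
```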

The environment is set up and I got to see it work, so that's it for this time.
Next, I'll think about ways to increase the training data.
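
When adding more annotated examples, it helps to check that the `[text](label)` markup yields the character offsets you expect. The sketch below does that with a regular expression. It is not Rasa's parser: `parse_annotated` is a hypothetical helper that handles only the simple bracket form used in this post.

```python
import re

# Matches '[entity text](label)'.
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_annotated(example):
    """Return (plain_text, entities) for one '[text](label)' annotated example."""
    plain_parts = []
    entities = []
    cursor = 0     # position in the annotated string
    plain_len = 0  # length of the plain text built so far
    for m in ANNOTATION.finditer(example):
        before = example[cursor:m.start()]
        plain_parts.append(before)
        plain_len += len(before)
        value, label = m.group(1), m.group(2)
        entities.append({"start": plain_len, "end": plain_len + len(value),
                         "value": value, "entity": label})
        plain_parts.append(value)
        plain_len += len(value)
        cursor = m.end()
    plain_parts.append(example[cursor:])
    return "".join(plain_parts), entities


text, entities = parse_annotated("[渋谷](location)で美味しい[イタリアン](restaurant_type)ない?")
print(text)
for ent in entities:
    print(ent["start"], ent["end"], ent["entity"])
```

For the first training sentence this gives the spans 0 to 2 and 7 to 12, which match the start and end values in the shell output above.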

Addendum

To capture the work so far, I listed the libraries installed with pip in requirements.txt and added steps to the Dockerfile so that they are installed automatically at build time.

requirements.txt
rasa==1.9.2
spacy==2.2.4

Dockerfile
FROM python:3.7-slim-stretch

RUN apt-get update -q -y && \
    apt-get install -q -y \
    build-essential

COPY ./requirements.txt requirements.txt

RUN pip install -r requirements.txt
RUN pip install "https://github.com/megagonlabs/ginza/releases/download/latest/ginza-latest.tar.gz"

COPY . /app
WORKDIR /app

Then build it.

$ docker-compose build

And run it.

$ docker-compose run --rm rasa bash
root@56ff2c2f2f38:/app#
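
Inside the rebuilt container, a short script can confirm that the installed versions match the pins. This is a sketch with hypothetical helper names (`parse_pins`, `check_pins`). On Python 3.7 it falls back to the importlib_metadata backport, which appears in the pip install log earlier.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7, as in the image used here
    import importlib_metadata as metadata  # backport with the same API


def parse_pins(requirements_text):
    """Parse 'name==version' lines into a dict, ignoring blanks and comments."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return {name: (pinned, installed_or_None)} for every pin that does not match."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


PINS = parse_pins("rasa==1.9.2\nspacy==2.2.4\n")
print(check_pins(PINS) or "all pins satisfied")
```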

Addendum 2

Published it on GitHub.

Addendum 3

It seems ginza can be installed directly via pip as of version 3.0, so I added ginza to requirements.txt and removed the corresponding step from the Dockerfile.
