termextract: a Python module for automatic extraction of technical terms (keywords)
http://gensen.dl.itc.u-tokyo.ac.jp/pytermextract/

This post installs pytermextract in a continuumio/anaconda3 Docker container and runs its bundled test suite.
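For context, pytermextract implements the "gensen" compound-noun term extraction of Nakagawa and Mori: a candidate compound noun $CN = N_1 N_2 \cdots N_L$ is scored by its own frequency combined with how productively each constituent noun concatenates to its left and right. Roughly (the exact scoring depends on the mode; see the pages under documents/):

```math
\mathrm{FLR}(CN) = f(CN) \times \left( \prod_{i=1}^{L} \bigl(\mathrm{FL}(N_i)+1\bigr)\bigl(\mathrm{FR}(N_i)+1\bigr) \right)^{\frac{1}{2L}}
```

where $f(CN)$ is the frequency of $CN$ appearing on its own, and $\mathrm{FL}(N_i)$ / $\mathrm{FR}(N_i)$ count how often $N_i$ takes another noun on its left / right.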
First, log in to Docker Hub and start a continuumio/anaconda3 container, mounting a working directory and publishing port 8888:

```bash
$ docker login
$ docker run -v /tmp/autosar:/tmp/autosar -p 8888:8888 -it continuumio/anaconda3 /bin/bash
Unable to find image 'continuumio/anaconda3:latest' locally
latest: Pulling from continuumio/anaconda3
852e50cd189d: Pull complete
864e1e8957d3: Pull complete
6d4823199f64: Pull complete
Digest: sha256:0b2047cdc438807b87d53272c3d5b1
```
Inside the container, install the prerequisites, download pytermextract, unpack it, and install it:

```ubuntu
(base) root@75aa2a1ec996:/# apt update; apt -y upgrade
(base) root@75aa2a1ec996:/# apt install wget vim unzip
(base) root@75aa2a1ec996:/# wget http://gensen.dl.itc.u-tokyo.ac.jp/soft/pytermextract-0_01.zip
(base) root@75aa2a1ec996:/# unzip pytermextract-0_01.zip
Archive: pytermextract-0_01.zip
creating: pytermextract-0_01/documents/
inflating: pytermextract-0_01/documents/concept.png
creating: pytermextract-0_01/documents/css/
inflating: pytermextract-0_01/documents/css/pytermextract.css
creating: pytermextract-0_01/documents/functions/
inflating: pytermextract-0_01/documents/functions/chinese_plaintext.html
inflating: pytermextract-0_01/documents/functions/english_plaintext.html
inflating: pytermextract-0_01/documents/functions/english_postagger.html
inflating: pytermextract-0_01/documents/functions/janome.html
inflating: pytermextract-0_01/documents/functions/janome_pp.html
inflating: pytermextract-0_01/documents/functions/janome_store-lr.html
inflating: pytermextract-0_01/documents/functions/janome_tfidf.html
inflating: pytermextract-0_01/documents/functions/japanese_plaintext.html
inflating: pytermextract-0_01/documents/functions/mecab.html
inflating: pytermextract-0_01/documents/functions/nlpir.html
inflating: pytermextract-0_01/documents/index.html
creating: pytermextract-0_01/pytermex/
inflating: pytermextract-0_01/pytermex/clear_df.py
inflating: pytermextract-0_01/pytermex/clear_store_lr.py
inflating: pytermextract-0_01/pytermex/termex_chi_plain.py
inflating: pytermextract-0_01/pytermex/termex_eng.py
inflating: pytermextract-0_01/pytermex/termex_eng_plain.py
inflating: pytermextract-0_01/pytermex/termex_janome.py
inflating: pytermextract-0_01/pytermex/termex_janome_pp.py
inflating: pytermextract-0_01/pytermex/termex_janome_store_lr1.py
inflating: pytermextract-0_01/pytermex/termex_janome_store_lr2.py
inflating: pytermextract-0_01/pytermex/termex_janome_tfidf1.py
inflating: pytermextract-0_01/pytermex/termex_janome_tfidf2.py
inflating: pytermextract-0_01/pytermex/termex_jpn_plain.py
inflating: pytermextract-0_01/pytermex/termex_mecab.py
inflating: pytermextract-0_01/pytermex/termex_nlpir.py
extracting: pytermextract-0_01/readme.txt
inflating: pytermextract-0_01/setup.py
creating: pytermextract-0_01/termextract/
extracting: pytermextract-0_01/termextract/__init__.py
inflating: pytermextract-0_01/termextract/chinese_plaintext.py
inflating: pytermextract-0_01/termextract/core.py
inflating: pytermextract-0_01/termextract/english_plaintext.py
inflating: pytermextract-0_01/termextract/english_postagger.py
inflating: pytermextract-0_01/termextract/janome.py
inflating: pytermextract-0_01/termextract/japanese_plaintext.py
inflating: pytermextract-0_01/termextract/mecab.py
inflating: pytermextract-0_01/termextract/nlpir.py
creating: pytermextract-0_01/test_data/
inflating: pytermextract-0_01/test_data/chi_sample.txt
inflating: pytermextract-0_01/test_data/chi_sample_s.txt
inflating: pytermextract-0_01/test_data/eng_sample.txt
inflating: pytermextract-0_01/test_data/jpn_sample.txt
inflating: pytermextract-0_01/test_data/jpn_sample2.txt
inflating: pytermextract-0_01/test_data/jpn_sample3.txt
inflating: pytermextract-0_01/test_data/mecab_check.txt
inflating: pytermextract-0_01/test_data/mecab_out_sample.txt
creating: pytermextract-0_01/tests/
extracting: pytermextract-0_01/tests/__init__.py
inflating: pytermextract-0_01/tests/store-lr.bak
inflating: pytermextract-0_01/tests/store-lr.dat
inflating: pytermextract-0_01/tests/store-lr.dir
inflating: pytermextract-0_01/tests/termextrat.bak
inflating: pytermextract-0_01/tests/termextrat.dat
inflating: pytermextract-0_01/tests/termextrat.dir
inflating: pytermextract-0_01/tests/test_chi_plain.py
inflating: pytermextract-0_01/tests/test_eng.py
inflating: pytermextract-0_01/tests/test_eng_plain.py
inflating: pytermextract-0_01/tests/test_janome.py
inflating: pytermextract-0_01/tests/test_jpn_plain.py
inflating: pytermextract-0_01/tests/test_mecab.py
inflating: pytermextract-0_01/tests/test_mecab_pp.py
inflating: pytermextract-0_01/tests/test_mecab_store_lr.py
inflating: pytermextract-0_01/tests/test_nlpir.py
(base) root@75aa2a1ec996:/# ls
bin dev home lib64 mnt p.zip pytermextract-0_01 run srv tmp var
boot etc lib media opt proc root sbin sys usr
(base) root@75aa2a1ec996:/# cd pytermextract-0_01/
(base) root@75aa2a1ec996:/pytermextract-0_01# ls
documents pytermex readme.txt setup.py termextract test_data tests
(base) root@75aa2a1ec996:/pytermextract-0_01# python setup.py install
running install
running build
running build_py
creating build
creating build/lib
creating build/lib/termextract
copying termextract/core.py -> build/lib/termextract
copying termextract/nlpir.py -> build/lib/termextract
copying termextract/__init__.py -> build/lib/termextract
copying termextract/english_plaintext.py -> build/lib/termextract
copying termextract/chinese_plaintext.py -> build/lib/termextract
copying termextract/japanese_plaintext.py -> build/lib/termextract
copying termextract/janome.py -> build/lib/termextract
copying termextract/english_postagger.py -> build/lib/termextract
copying termextract/mecab.py -> build/lib/termextract
running install_lib
creating /opt/conda/lib/python3.8/site-packages/termextract
copying build/lib/termextract/core.py -> /opt/conda/lib/python3.8/site-packages/termextract
copying build/lib/termextract/nlpir.py -> /opt/conda/lib/python3.8/site-packages/termextract
copying build/lib/termextract/__init__.py -> /opt/conda/lib/python3.8/site-packages/termextract
copying build/lib/termextract/english_plaintext.py -> /opt/conda/lib/python3.8/site-packages/termextract
copying build/lib/termextract/chinese_plaintext.py -> /opt/conda/lib/python3.8/site-packages/termextract
copying build/lib/termextract/japanese_plaintext.py -> /opt/conda/lib/python3.8/site-packages/termextract
copying build/lib/termextract/janome.py -> /opt/conda/lib/python3.8/site-packages/termextract
copying build/lib/termextract/english_postagger.py -> /opt/conda/lib/python3.8/site-packages/termextract
copying build/lib/termextract/mecab.py -> /opt/conda/lib/python3.8/site-packages/termextract
byte-compiling /opt/conda/lib/python3.8/site-packages/termextract/core.py to core.cpython-38.pyc
byte-compiling /opt/conda/lib/python3.8/site-packages/termextract/nlpir.py to nlpir.cpython-38.pyc
byte-compiling /opt/conda/lib/python3.8/site-packages/termextract/__init__.py to __init__.cpython-38.pyc
byte-compiling /opt/conda/lib/python3.8/site-packages/termextract/english_plaintext.py to english_plaintext.cpython-38.pyc
byte-compiling /opt/conda/lib/python3.8/site-packages/termextract/chinese_plaintext.py to chinese_plaintext.cpython-38.pyc
byte-compiling /opt/conda/lib/python3.8/site-packages/termextract/japanese_plaintext.py to japanese_plaintext.cpython-38.pyc
byte-compiling /opt/conda/lib/python3.8/site-packages/termextract/janome.py to janome.cpython-38.pyc
byte-compiling /opt/conda/lib/python3.8/site-packages/termextract/english_postagger.py to english_postagger.cpython-38.pyc
byte-compiling /opt/conda/lib/python3.8/site-packages/termextract/mecab.py to mecab.cpython-38.pyc
running install_egg_info
Writing /opt/conda/lib/python3.8/site-packages/termextract-0.12b-py3.8.egg-info
```
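Incidentally, direct `setup.py install` invocation is deprecated in current setuptools; the pip equivalent, run from the same directory, would be:

```bash
pip install .
```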
Run the bundled tests:

```ubuntu
(base) root@75aa2a1ec996:/pytermextract-0_01# cd tests
(base) root@75aa2a1ec996:/pytermextract-0_01/tests# python -m unittest
.E.E....E
======================================================================
ERROR: test (test_eng.english_postagger)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/pytermextract-0_01/tests/test_eng.py", line 15, in test
tagged_text = nltk.pos_tag(nltk.word_tokenize(text))
File "/opt/conda/lib/python3.8/site-packages/nltk/tokenize/__init__.py", line 129, in word_tokenize
sentences = [text] if preserve_line else sent_tokenize(text, language)
File "/opt/conda/lib/python3.8/site-packages/nltk/tokenize/__init__.py", line 106, in sent_tokenize
tokenizer = load("tokenizers/punkt/{0}.pickle".format(language))
File "/opt/conda/lib/python3.8/site-packages/nltk/data.py", line 752, in load
opened_resource = _open(resource_url)
File "/opt/conda/lib/python3.8/site-packages/nltk/data.py", line 877, in _open
return find(path_, path + [""]).open()
File "/opt/conda/lib/python3.8/site-packages/nltk/data.py", line 585, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
For more information see: https://www.nltk.org/data.html
Attempted to load tokenizers/punkt/PY3/english.pickle
Searched in:
- '/root/nltk_data'
- '/opt/conda/nltk_data'
- '/opt/conda/share/nltk_data'
- '/opt/conda/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- ''
**********************************************************************
======================================================================
ERROR: test (test_janome.janome)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/pytermextract-0_01/tests/test_janome.py", line 9, in test
from janome.tokenizer import Tokenizer
ModuleNotFoundError: No module named 'janome'
======================================================================
ERROR: test (test_nlpir.nlpir)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/pytermextract-0_01/tests/test_nlpir.py", line 9, in test
import pynlpir
ModuleNotFoundError: No module named 'pynlpir'
----------------------------------------------------------------------
Ran 9 tests in 6.276s
FAILED (errors=3)
```
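All three errors are missing dependencies rather than bugs in pytermextract: NLTK's punkt tokenizer data has not been downloaded, and the janome and pynlpir packages are not installed. Assuming network access inside the container, a non-interactive fix would look like this (pynlpir may additionally need its own data/licence setup):

```bash
python -c "import nltk; nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')"
pip install janome pynlpir
```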
So, fixing them one at a time:
```ubuntu
# python
Python 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download('punkt')
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
True
>>> nltk.download('averaged_perceptron_tagger')
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] /root/nltk_data...
[nltk_data] Unzipping taggers/averaged_perceptron_tagger.zip.
True
>>> exit()
# python -m unittest
...E....E
======================================================================
ERROR: test (test_janome.janome)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/pytermextract-0_01/tests/test_janome.py", line 9, in test
from janome.tokenizer import Tokenizer
ModuleNotFoundError: No module named 'janome'
======================================================================
ERROR: test (test_nlpir.nlpir)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/pytermextract-0_01/tests/test_nlpir.py", line 9, in test
import pynlpir
ModuleNotFoundError: No module named 'pynlpir'
----------------------------------------------------------------------
Ran 9 tests in 6.164s
FAILED (errors=2)
# pip install janome
Collecting janome
Downloading Janome-0.4.1-py2.py3-none-any.whl (19.7 MB)
|████████████████████████████████| 19.7 MB 5.4 MB/s
Installing collected packages: janome
Successfully installed janome-0.4.1
# python -m unittest
.........
----------------------------------------------------------------------
Ran 9 tests in 7.016s
OK
```
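With the tests green, a minimal extraction run can be sketched. The following is adapted from the bundled sample pytermex/termex_janome.py as I remember it; the function names (cmp_noun_dict, score_lr, term_importance) and their parameters should be verified against the shipped sources and the pages under documents/:

```python
# Sketch: Japanese term extraction with janome + termextract,
# adapted from pytermex/termex_janome.py (verify the exact API).
import collections

from janome.tokenizer import Tokenizer
import termextract.janome
import termextract.core

# One of the bundled sample texts.
with open("../test_data/jpn_sample.txt", encoding="utf-8") as f:
    text = f.read()

# Morphological analysis with janome.
tokenized_text = Tokenizer().tokenize(text)

# Candidate compound nouns and their frequencies.
frequency = termextract.janome.cmp_noun_dict(tokenized_text)

# LR score combined with frequency gives the term importance.
lr = termextract.core.score_lr(
    frequency,
    ignore_words=termextract.janome.IGNORE_WORDS,
    lr_mode=1,
    average_rate=1,
)
term_imp = termextract.core.term_importance(frequency, lr)

# Print terms in descending order of importance.
for term, score in collections.Counter(term_imp).most_common():
    print(term, score)
```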
docker push failed; more precisely, the docker commit needed before the push already errored:
```bash
$ docker commit 75aa2a1ec996 kaizenjapan/pytermextract
Error response from daemon: open /var/lib/docker/image/overlay2/layerdb/tmp/write-set-689207880/diff: structure needs cleaning
```
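"structure needs cleaning" is not really a Docker error: it is what ext4 reports (EUCLEAN) when the filesystem under /var/lib/docker is corrupted. A plausible recovery, assuming /var/lib/docker is on its own device (the device name below is a placeholder), is to stop Docker, fsck the volume, and retry the commit; if it lives on the root filesystem, the fsck has to happen from a rescue boot instead:

```bash
systemctl stop docker
umount /var/lib/docker        # only if it is a separate mount
fsck -y /dev/sdX1             # placeholder for the device backing /var/lib/docker
mount /var/lib/docker
systemctl start docker
docker commit 75aa2a1ec996 kaizenjapan/pytermextract
```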
Reference: python - Python: split a compound word into known words (from a dictionary)
https://cloud6.net/so/python/2100473
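The linked Q&A covers the reverse problem: splitting a compound back into known dictionary words. A minimal dynamic-programming sketch (the dictionary and the compound below are made-up examples):

```python
def split_compound(word, dictionary):
    """Split `word` into dictionary words via dynamic programming.

    Returns one segmentation as a list of words, or None if impossible.
    """
    n = len(word)
    best = [None] * (n + 1)  # best[i] = a segmentation of word[:i]
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] is not None and word[j:i] in dictionary:
                best[i] = best[j] + [word[j:i]]
                break  # keep the first segmentation found for word[:i]
    return best[n]

# Made-up example dictionary and compound word.
dictionary = {"term", "extract", "key", "word"}
print(split_compound("termextract", dictionary))  # -> ['term', 'extract']
```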
Thank you for reading to the end.
Please tap the like icon 💚 and follow me.