AUTOSAR Countdown Advent Calendar 2022, Day 20

pytermextract: I thought the install had failed, but it turned out I just needed to install everything. Then docker push failed.

Posted at 2021-04-20

termextract: a Python module for automatic extraction of technical terms (keywords)
http://gensen.dl.itc.u-tokyo.ac.jp/pytermextract/

bash
$ docker login
$ docker run -v /tmp/autosar:/tmp/autosar -p 8888:8888 -it continuumio/anaconda3 /bin/bash
Unable to find image 'continuumio/anaconda3:latest' locally
latest: Pulling from continuumio/anaconda3
852e50cd189d: Pull complete 
864e1e8957d3: Pull complete 
6d4823199f64: Pull complete 
Digest: sha256:0b2047cdc438807b87d53272c3d5b1

ubuntu
(base) root@75aa2a1ec996:/# apt update; apt -y upgrade
(base) root@75aa2a1ec996:/# apt install wget vim unzip
(base) root@75aa2a1ec996:/# wget http://gensen.dl.itc.u-tokyo.ac.jp/soft/pytermextract-0_01.zip
(base) root@75aa2a1ec996:/# unzip pytermextract-0_01.zip
Archive: pytermextract-0_01.zip
   creating: pytermextract-0_01/documents/
  inflating: pytermextract-0_01/documents/concept.png  
   creating: pytermextract-0_01/documents/css/
  inflating: pytermextract-0_01/documents/css/pytermextract.css  
   creating: pytermextract-0_01/documents/functions/
  inflating: pytermextract-0_01/documents/functions/chinese_plaintext.html  
  inflating: pytermextract-0_01/documents/functions/english_plaintext.html  
  inflating: pytermextract-0_01/documents/functions/english_postagger.html  
  inflating: pytermextract-0_01/documents/functions/janome.html  
  inflating: pytermextract-0_01/documents/functions/janome_pp.html  
  inflating: pytermextract-0_01/documents/functions/janome_store-lr.html  
  inflating: pytermextract-0_01/documents/functions/janome_tfidf.html  
  inflating: pytermextract-0_01/documents/functions/japanese_plaintext.html  
  inflating: pytermextract-0_01/documents/functions/mecab.html  
  inflating: pytermextract-0_01/documents/functions/nlpir.html  
  inflating: pytermextract-0_01/documents/index.html  
   creating: pytermextract-0_01/pytermex/
  inflating: pytermextract-0_01/pytermex/clear_df.py  
  inflating: pytermextract-0_01/pytermex/clear_store_lr.py  
  inflating: pytermextract-0_01/pytermex/termex_chi_plain.py  
  inflating: pytermextract-0_01/pytermex/termex_eng.py  
  inflating: pytermextract-0_01/pytermex/termex_eng_plain.py  
  inflating: pytermextract-0_01/pytermex/termex_janome.py  
  inflating: pytermextract-0_01/pytermex/termex_janome_pp.py  
  inflating: pytermextract-0_01/pytermex/termex_janome_store_lr1.py  
  inflating: pytermextract-0_01/pytermex/termex_janome_store_lr2.py  
  inflating: pytermextract-0_01/pytermex/termex_janome_tfidf1.py  
  inflating: pytermextract-0_01/pytermex/termex_janome_tfidf2.py  
  inflating: pytermextract-0_01/pytermex/termex_jpn_plain.py  
  inflating: pytermextract-0_01/pytermex/termex_mecab.py  
  inflating: pytermextract-0_01/pytermex/termex_nlpir.py  
 extracting: pytermextract-0_01/readme.txt  
  inflating: pytermextract-0_01/setup.py  
   creating: pytermextract-0_01/termextract/
 extracting: pytermextract-0_01/termextract/__init__.py  
  inflating: pytermextract-0_01/termextract/chinese_plaintext.py  
  inflating: pytermextract-0_01/termextract/core.py  
  inflating: pytermextract-0_01/termextract/english_plaintext.py  
  inflating: pytermextract-0_01/termextract/english_postagger.py  
  inflating: pytermextract-0_01/termextract/janome.py  
  inflating: pytermextract-0_01/termextract/japanese_plaintext.py  
  inflating: pytermextract-0_01/termextract/mecab.py  
  inflating: pytermextract-0_01/termextract/nlpir.py  
   creating: pytermextract-0_01/test_data/
  inflating: pytermextract-0_01/test_data/chi_sample.txt  
  inflating: pytermextract-0_01/test_data/chi_sample_s.txt  
  inflating: pytermextract-0_01/test_data/eng_sample.txt  
  inflating: pytermextract-0_01/test_data/jpn_sample.txt  
  inflating: pytermextract-0_01/test_data/jpn_sample2.txt  
  inflating: pytermextract-0_01/test_data/jpn_sample3.txt  
  inflating: pytermextract-0_01/test_data/mecab_check.txt  
  inflating: pytermextract-0_01/test_data/mecab_out_sample.txt  
   creating: pytermextract-0_01/tests/
 extracting: pytermextract-0_01/tests/__init__.py  
  inflating: pytermextract-0_01/tests/store-lr.bak  
  inflating: pytermextract-0_01/tests/store-lr.dat  
  inflating: pytermextract-0_01/tests/store-lr.dir  
  inflating: pytermextract-0_01/tests/termextrat.bak  
  inflating: pytermextract-0_01/tests/termextrat.dat  
  inflating: pytermextract-0_01/tests/termextrat.dir  
  inflating: pytermextract-0_01/tests/test_chi_plain.py  
  inflating: pytermextract-0_01/tests/test_eng.py  
  inflating: pytermextract-0_01/tests/test_eng_plain.py  
  inflating: pytermextract-0_01/tests/test_janome.py  
  inflating: pytermextract-0_01/tests/test_jpn_plain.py  
  inflating: pytermextract-0_01/tests/test_mecab.py  
  inflating: pytermextract-0_01/tests/test_mecab_pp.py  
  inflating: pytermextract-0_01/tests/test_mecab_store_lr.py  
  inflating: pytermextract-0_01/tests/test_nlpir.py  
(base) root@75aa2a1ec996:/# ls
bin   dev  home  lib64	mnt  p.zip  pytermextract-0_01	run   srv  tmp	var
boot  etc  lib	 media	opt  proc   root		sbin  sys  usr
(base) root@75aa2a1ec996:/# cd pytermextract-0_01/
(base) root@75aa2a1ec996:/pytermextract-0_01# ls
documents  pytermex  readme.txt  setup.py  termextract	test_data  tests
(base) root@75aa2a1ec996:/pytermextract-0_01# python setup.py install
running install
running build
running build_py
creating build
creating build/lib
creating build/lib/termextract
copying termextract/core.py -> build/lib/termextract
copying termextract/nlpir.py -> build/lib/termextract
copying termextract/__init__.py -> build/lib/termextract
copying termextract/english_plaintext.py -> build/lib/termextract
copying termextract/chinese_plaintext.py -> build/lib/termextract
copying termextract/japanese_plaintext.py -> build/lib/termextract
copying termextract/janome.py -> build/lib/termextract
copying termextract/english_postagger.py -> build/lib/termextract
copying termextract/mecab.py -> build/lib/termextract
running install_lib
creating /opt/conda/lib/python3.8/site-packages/termextract
copying build/lib/termextract/core.py -> /opt/conda/lib/python3.8/site-packages/termextract
copying build/lib/termextract/nlpir.py -> /opt/conda/lib/python3.8/site-packages/termextract
copying build/lib/termextract/__init__.py -> /opt/conda/lib/python3.8/site-packages/termextract
copying build/lib/termextract/english_plaintext.py -> /opt/conda/lib/python3.8/site-packages/termextract
copying build/lib/termextract/chinese_plaintext.py -> /opt/conda/lib/python3.8/site-packages/termextract
copying build/lib/termextract/japanese_plaintext.py -> /opt/conda/lib/python3.8/site-packages/termextract
copying build/lib/termextract/janome.py -> /opt/conda/lib/python3.8/site-packages/termextract
copying build/lib/termextract/english_postagger.py -> /opt/conda/lib/python3.8/site-packages/termextract
copying build/lib/termextract/mecab.py -> /opt/conda/lib/python3.8/site-packages/termextract
byte-compiling /opt/conda/lib/python3.8/site-packages/termextract/core.py to core.cpython-38.pyc
byte-compiling /opt/conda/lib/python3.8/site-packages/termextract/nlpir.py to nlpir.cpython-38.pyc
byte-compiling /opt/conda/lib/python3.8/site-packages/termextract/__init__.py to __init__.cpython-38.pyc
byte-compiling /opt/conda/lib/python3.8/site-packages/termextract/english_plaintext.py to english_plaintext.cpython-38.pyc
byte-compiling /opt/conda/lib/python3.8/site-packages/termextract/chinese_plaintext.py to chinese_plaintext.cpython-38.pyc
byte-compiling /opt/conda/lib/python3.8/site-packages/termextract/japanese_plaintext.py to japanese_plaintext.cpython-38.pyc
byte-compiling /opt/conda/lib/python3.8/site-packages/termextract/janome.py to janome.cpython-38.pyc
byte-compiling /opt/conda/lib/python3.8/site-packages/termextract/english_postagger.py to english_postagger.cpython-38.pyc
byte-compiling /opt/conda/lib/python3.8/site-packages/termextract/mecab.py to mecab.cpython-38.pyc
running install_egg_info
Writing /opt/conda/lib/python3.8/site-packages/termextract-0.12b-py3.8.egg-info
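# Note (assumption, not from the original log): on newer setuptools,
# "python setup.py install" is deprecated; running "pip install ." from
# /pytermextract-0_01 should install the same package.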

(base) root@75aa2a1ec996:/pytermextract-0_01# cd tests
(base) root@75aa2a1ec996:/pytermextract-0_01/tests# python -m unittest
.E.E....E
======================================================================
ERROR: test (test_eng.english_postagger)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/pytermextract-0_01/tests/test_eng.py", line 15, in test
    tagged_text = nltk.pos_tag(nltk.word_tokenize(text))
  File "/opt/conda/lib/python3.8/site-packages/nltk/tokenize/__init__.py", line 129, in word_tokenize
    sentences = [text] if preserve_line else sent_tokenize(text, language)
  File "/opt/conda/lib/python3.8/site-packages/nltk/tokenize/__init__.py", line 106, in sent_tokenize
    tokenizer = load("tokenizers/punkt/{0}.pickle".format(language))
  File "/opt/conda/lib/python3.8/site-packages/nltk/data.py", line 752, in load
    opened_resource = _open(resource_url)
  File "/opt/conda/lib/python3.8/site-packages/nltk/data.py", line 877, in _open
    return find(path_, path + [""]).open()
  File "/opt/conda/lib/python3.8/site-packages/nltk/data.py", line 585, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')
  
  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/PY3/english.pickle

  Searched in:
    - '/root/nltk_data'
    - '/opt/conda/nltk_data'
    - '/opt/conda/share/nltk_data'
    - '/opt/conda/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************


======================================================================
ERROR: test (test_janome.janome)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/pytermextract-0_01/tests/test_janome.py", line 9, in test
    from janome.tokenizer import Tokenizer
ModuleNotFoundError: No module named 'janome'

======================================================================
ERROR: test (test_nlpir.nlpir)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/pytermextract-0_01/tests/test_nlpir.py", line 9, in test
    import pynlpir
ModuleNotFoundError: No module named 'pynlpir'

----------------------------------------------------------------------
Ran 9 tests in 6.276s

FAILED (errors=3)

OK then, let's do what the error message says:

ubuntu
# python
Python 3.8.5 (default, Sep  4 2020, 07:30:14) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download('punkt')
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
True
>>> nltk.download('averaged_perceptron_tagger')
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
True
>>> exit
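# The same downloads can also be done non-interactively (assumption: equivalent one-liner):
# python -c "import nltk; nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')"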

# python -m unittest
...E....E
======================================================================
ERROR: test (test_janome.janome)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/pytermextract-0_01/tests/test_janome.py", line 9, in test
    from janome.tokenizer import Tokenizer
ModuleNotFoundError: No module named 'janome'

======================================================================
ERROR: test (test_nlpir.nlpir)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/pytermextract-0_01/tests/test_nlpir.py", line 9, in test
    import pynlpir
ModuleNotFoundError: No module named 'pynlpir'

----------------------------------------------------------------------
Ran 9 tests in 6.164s

FAILED (errors=2)

# pip install janome
Collecting janome
  Downloading Janome-0.4.1-py2.py3-none-any.whl (19.7 MB)
     |████████████████████████████████| 19.7 MB 5.4 MB/s 
Installing collected packages: janome
Successfully installed janome-0.4.1
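# If test_nlpir still reports a missing module, the corresponding PyPI package
# is pynlpir (assumption: not needed in this run):
# pip install pynlpir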
# python -m unittest
.........
----------------------------------------------------------------------
Ran 9 tests in 7.016s

OK
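
With the tests green, a minimal extraction run can be sketched from the bundled pytermex/termex_janome.py sample. This is only a sketch: it assumes the function names used by that sample (cmp_noun_dict, score_lr, term_importance, modify_agglutinative_lang) and uses the bundled test_data/jpn_sample.txt path as an illustrative input.

python
# Sketch based on pytermex/termex_janome.py (assumption: same API as pytermextract 0.01).
import collections
from janome.tokenizer import Tokenizer
import termextract.janome
import termextract.core

# Read the bundled Japanese sample text (path is an assumption).
with open("/pytermextract-0_01/test_data/jpn_sample.txt", encoding="utf-8") as f:
    text = f.read()

# Tokenize with janome, collect compound-noun candidates, then score them.
tokens = Tokenizer().tokenize(text)
frequency = termextract.janome.cmp_noun_dict(tokens)
lr = termextract.core.score_lr(
    frequency,
    ignore_words=termextract.janome.IGNORE_WORDS,
    lr_mode=1,
    average_rate=1,
)
term_imp = termextract.core.term_importance(frequency, lr)

# Print candidate terms, highest importance first.
for term, score in collections.Counter(term_imp).most_common(10):
    print(termextract.core.modify_agglutinative_lang(term), score, sep="\t")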

docker push failed; the docker commit before the push errored out with a filesystem error on the Docker host.

bash
$ docker commit 75aa2a1ec996 kaizenjapan/pytermextract
Error response from daemon: open /var/lib/docker/image/overlay2/layerdb/tmp/write-set-689207880/diff: structure needs cleaning
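# "structure needs cleaning" is an ext4/XFS corruption error on the Docker host
# (assumption: a host-side filesystem issue, not a problem with the image itself).
# Restarting the Docker daemon, running fsck on the backing filesystem, or pruning
# unused data and retrying the commit/push may help:
# docker system prune
# docker commit 75aa2a1ec996 kaizenjapan/pytermextract
# docker push kaizenjapan/pytermextract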

python - Python: splitting compound words into known words (using a dictionary)
https://cloud6.net/so/python/2100473

Thank you very much for reading to the end.

Please press the like icon 💚 and follow me.
