Machine learning with docker (21) with anaconda (21): "Applied Text Analysis with Python" by Benjamin Bengfort, Tony Ojeda, Rebecca Bilbro

Posted at 2018-10-19

1. For those who want to use it right away (as soon as)

Applied Text Analysis with Python
Enabling Language-Aware Data Products with Machine Learning
By Benjamin Bengfort, Tony Ojeda, Rebecca Bilbro




Install docker, and on Windows and Mac start docker before continuing.
On Windows, docker may fail to start unless Intel Virtualization is enabled in the BIOS.

docker pull and run

$ docker pull kaizenjapan/anaconda-benjamin

$ docker run -it -p 8888:8888 kaizenjapan/anaconda-benjamin /bin/bash

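In the shell sessions below,
(base) root@f19e2f06eabb:/# is the command prompt. The hexadecimal container ID may differ on your machine. Type what appears to the right of the # on these lines.

When the shell inside docker and the shell of the OS that started docker look alike, it is easy to mistake which one you are operating in. Pay attention to the docker command prompt.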





(base) root@caef766a99ff:/# ls
atap  bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
(base) root@caef766a99ff:/# cd atap
(base) root@caef766a99ff:/atap# ls
LICENSE  README.md  resources  snippets
(base) root@caef766a99ff:/atap# cd snipperts
bash: cd: snipperts: No such file or directory
(base) root@caef766a99ff:/atap# ls
LICENSE  README.md  resources  snippets
(base) root@caef766a99ff:/atap# cd snippets/
(base) root@caef766a99ff:/atap/snippets# ls
ch01  ch02  ch03  ch04	ch05  ch06  ch07  ch08	ch09  ch10  ch11  ch12
(base) root@caef766a99ff:/atap/snippets# cd ch01
(base) root@caef766a99ff:/atap/snippets/ch01# ls
ballet.txt  gender.py  parse.py


(base) root@caef766a99ff:/atap/snippets/ch01# vi gender.py
(base) root@caef766a99ff:/atap/snippets/ch01# python gender.py
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
39.269% unknown (48 sentences)
52.994% female (38 sentences)
4.393% both (2 sentences)
3.344% male (3 sentences)


(base) root@caef766a99ff:/atap/snippets/ch01# python gender.py
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
Traceback (most recent call last):
  File "gender.py", line 80, in <module>
  File "gender.py", line 64, in parse_gender
    for sentence in nltk.sent_tokenize(text)
  File "/opt/conda/lib/python3.7/site-packages/nltk/tokenize/__init__.py", line 94, in sent_tokenize
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
  File "/opt/conda/lib/python3.7/site-packages/nltk/data.py", line 836, in load
    opened_resource = _open(resource_url)
  File "/opt/conda/lib/python3.7/site-packages/nltk/data.py", line 954, in _open
    return find(path_, path + ['']).open()
  File "/opt/conda/lib/python3.7/site-packages/nltk/data.py", line 675, in find
    raise LookupError(resource_not_found)
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')
  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/opt/conda/nltk_data'
    - '/opt/conda/share/nltk_data'
    - '/opt/conda/lib/nltk_data'
    - ''
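The LookupError above names the missing resource and the call that fetches it. As a minimal sketch (not necessarily the change made to gender.py), a script could download a resource only when `nltk.data.find` cannot locate it. The helper name `ensure_nltk_resource` and its `find`/`download` parameters are hypothetical; they default to the real `nltk.data.find` and `nltk.download` calls shown in the traceback and exist only so the logic can be exercised without network access.

```python
def ensure_nltk_resource(resource_path, package, find=None, download=None):
    """Download `package` only when `resource_path` is not installed yet.

    Returns True if a download was triggered, False otherwise.
    """
    if find is None or download is None:
        import nltk  # imported lazily; only needed for the default hooks
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource_path)      # raises LookupError when the resource is absent
        return False
    except LookupError:
        download(package)        # e.g. nltk.download('punkt')
        return True


# Hypothetical placement near the top of a script:
# ensure_nltk_resource("tokenizers/punkt", "punkt")
# ensure_nltk_resource("taggers/averaged_perceptron_tagger",
#                      "averaged_perceptron_tagger")
```

Calling it on every run costs one lookup once the data is present, instead of repeating the download message each time.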



#!/usr/bin/env python3
import nltk
# Fetch the punkt tokenizer data named in the LookupError.
nltk.download('punkt')
from collections import Counter




(base) root@caef766a99ff:/atap/snippets/ch01# python parse.py
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
Traceback (most recent call last):
  File "parse.py", line 22, in <module>
  File "/opt/conda/lib/python3.7/site-packages/nltk/tree.py", line 690, in draw
  File "/opt/conda/lib/python3.7/site-packages/nltk/draw/tree.py", line 863, in draw_trees
  File "/opt/conda/lib/python3.7/site-packages/nltk/draw/tree.py", line 756, in __init__
    self._top = Tk()
  File "/opt/conda/lib/python3.7/tkinter/__init__.py", line 2020, in __init__
    self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: no display name and no $DISPLAY environment variable


import nltk
import matplotlib as mpl
import matplotlib.pyplot as plt


if __name__ == '__main__':
    for tree in parse("I put the book in the box on the table."):
        fig = plt.figure()


(base) root@caef766a99ff:/atap/snippets/ch01# python parse.py
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
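The first run of parse.py failed because drawing the tree opens a Tk window, and the container has no `$DISPLAY`. One common way around this in a headless container, sketched here under the assumption that matplotlib is installed in the image, is to select the non-interactive Agg backend and write figures to a file. The function names and the output path are hypothetical, and this is not necessarily the change that was made to parse.py.

```python
import os


def pick_matplotlib_backend(environ=os.environ):
    """Return 'Agg' when no X display is configured, else None (keep the default)."""
    return None if environ.get("DISPLAY") else "Agg"


def save_headless_figure(path):
    """Draw a trivial plot and save it without opening a window."""
    import matplotlib

    backend = pick_matplotlib_backend()
    if backend:
        matplotlib.use(backend)  # select the backend before pyplot is imported
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, 1])
    fig.savefig(path)            # writes a file instead of showing a window
    plt.close(fig)


# Hypothetical use inside the container:
# save_headless_figure("/tmp/example.png")
```

The saved image can then be copied out of the container or viewed through a mounted volume.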



(base) root@caef766a99ff:/atap/snippets/ch02# python reader.py
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict



(base) root@caef766a99ff:/atap/snippets/ch03# python pos_tag.py 
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
Traceback (most recent call last):
  File "pos_tag.py", line 12, in <module>
  File "pos_tag.py", line 9, in tokenize
    yield pos_tag(wordpunct_tokenize(sentence))
  File "/opt/conda/lib/python3.7/site-packages/nltk/tag/__init__.py", line 133, in pos_tag
    tagger = _get_tagger(lang)
  File "/opt/conda/lib/python3.7/site-packages/nltk/tag/__init__.py", line 97, in _get_tagger
    tagger = PerceptronTagger()
  File "/opt/conda/lib/python3.7/site-packages/nltk/tag/perceptron.py", line 140, in __init__
    AP_MODEL_LOC = 'file:'+str(find('taggers/averaged_perceptron_tagger/'+PICKLE))
  File "/opt/conda/lib/python3.7/site-packages/nltk/data.py", line 675, in find
    raise LookupError(resource_not_found)
  Resource averaged_perceptron_tagger not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('averaged_perceptron_tagger')
  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/opt/conda/nltk_data'
    - '/opt/conda/share/nltk_data'
    - '/opt/conda/lib/nltk_data'

Add one line: nltk.download

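import nltk
# Added line: fetch the tagger model named in the LookupError.
nltk.download('averaged_perceptron_tagger')
from nltk import pos_tag, sent_tokenize, wordpunct_tokenize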


(base) root@caef766a99ff:/atap/snippets/ch03# python pos_tag.py 
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[[('The', 'DT'), ('old', 'JJ'), ('building', 'NN'), ('is', 'VBZ'), ('scheduled', 'VBN'), ('for', 'IN'), ('demolition', 'NN'), ('.', '.')], [('The', 'DT'), ('contractors', 'NNS'), ('will', 'MD'), ('begin', 'VB'), ('building', 'VBG'), ('a', 'DT'), ('new', 'JJ'), ('structure', 'NN'), ('next', 'JJ'), ('month', 'NN'), ('.', '.')]]



(base) root@caef766a99ff:/atap/snippets/ch04# python reader.py 
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
Traceback (most recent call last):
  File "reader.py", line 88, in <module>
    corpus = PickledCorpusReader('../corpus')
  File "reader.py", line 28, in __init__
    CorpusReader.__init__(self, root, fileids)
  File "/opt/conda/lib/python3.7/site-packages/nltk/corpus/reader/api.py", line 84, in __init__
    root = FileSystemPathPointer(root)
  File "/opt/conda/lib/python3.7/site-packages/nltk/compat.py", line 221, in _decorator
    return init_func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/nltk/data.py", line 318, in __init__
    raise IOError('No such file or directory: %r' % _path)
OSError: No such file or directory: '/atap/snippets/corpus'
(base) root@caef766a99ff:/atap/snippets/ch04# ls
loader.py  reader.py  transformers.py  vectorization.py
(base) root@caef766a99ff:/atap/snippets/ch04# pwd
(base) root@caef766a99ff:/atap/snippets/ch04# ls ../
ch01  ch02  ch03  ch04	ch05  ch06  ch07  ch08	ch09  ch10  ch11  ch12
(base) root@caef766a99ff:/atap/snippets/ch04# ls 
loader.py  reader.py  transformers.py  vectorization.py
(base) root@caef766a99ff:/atap/snippets/ch04# cd ..
(base) root@caef766a99ff:/atap/snippets# mkdir corpus
(base) root@caef766a99ff:/atap/snippets# cd ch04
(base) root@caef766a99ff:/atap/snippets/ch04# ls
loader.py  reader.py  transformers.py  vectorization.py
(base) root@caef766a99ff:/atap/snippets/ch04# python reader.py 
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
0 vocabulary 0 word count
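Creating an empty corpus directory silences the OSError, but the reader then reports 0 vocabulary and 0 words because there is nothing to read. A small sketch of a pre-flight check, using only the standard library, that distinguishes "directory missing" from "directory empty"; the function name `count_corpus_files` is hypothetical and it counts every regular file rather than applying the reader's own file pattern.

```python
from pathlib import Path


def count_corpus_files(root):
    """Return the number of regular files under `root`.

    Raises FileNotFoundError when `root` is not an existing directory,
    so a missing corpus is reported differently from an empty one.
    """
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"corpus directory not found: {root}")
    return sum(1 for p in root.rglob("*") if p.is_file())


# Hypothetical use before constructing the reader:
# if count_corpus_files("../corpus") == 0:
#     print("corpus directory is empty; results will be 0 vocabulary 0 word count")
```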


(base) root@caef766a99ff:/atap/snippets/ch04# python loader.py
/opt/conda/lib/python3.7/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
Traceback (most recent call last):
  File "loader.py", line 58, in <module>
    corpus = PickledCorpusReader('corpus')
  File "/atap/snippets/ch04/reader.py", line 28, in __init__
    CorpusReader.__init__(self, root, fileids)
  File "/opt/conda/lib/python3.7/site-packages/nltk/corpus/reader/api.py", line 84, in __init__
    root = FileSystemPathPointer(root)
  File "/opt/conda/lib/python3.7/site-packages/nltk/compat.py", line 221, in _decorator
    return init_func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/nltk/data.py", line 318, in __init__
    raise IOError('No such file or directory: %r' % _path)
OSError: No such file or directory: '/atap/snippets/ch04/corpus'
(base) root@caef766a99ff:/atap/snippets/ch04# mkdir corpus
(base) root@caef766a99ff:/atap/snippets/ch04# python loader.py
/opt/conda/lib/python3.7/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
Traceback (most recent call last):
  File "loader.py", line 59, in <module>
    loader = CorpusLoader(corpus, 12)
  File "loader.py", line 12, in __init__
    self.folds = KFold(self.n_docs, folds, shuffle)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/cross_validation.py", line 337, in __init__
    super(KFold, self).__init__(n, n_folds, shuffle, random_state)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/cross_validation.py", line 262, in __init__
    " than the number of samples: {1}.").format(n_folds, n))
ValueError: Cannot have number of folds n_folds=12 greater than the number of samples: 0.
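The ValueError follows directly from the empty corpus directory: 12 folds cannot be drawn from 0 documents. A minimal sketch of a guard that reports this in terms of the corpus before any cross-validation object is built; `check_folds` is a hypothetical helper, not part of the book's loader.py.

```python
def check_folds(n_docs, n_folds=12):
    """Validate the fold count against the corpus size.

    Returns n_folds unchanged when it is usable, otherwise raises
    ValueError with a message that points at the corpus.
    """
    if n_folds < 2:
        raise ValueError("k-fold cross-validation needs at least 2 folds")
    if n_docs < n_folds:
        raise ValueError(
            f"corpus has {n_docs} documents but {n_folds} folds were requested; "
            "populate the corpus directory or lower the fold count"
        )
    return n_folds


# Hypothetical use with the reader from this chapter:
# folds = check_folds(len(corpus.fileids()), 12)
```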


(base) root@caef766a99ff:/atap/snippets/ch04# python transformers.py 
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
Traceback (most recent call last):
  File "transformers.py", line 5, in <module>
    import gensim
ModuleNotFoundError: No module named 'gensim'
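Missing packages surface one script at a time in this session. As a sketch, the standard library's `importlib.util.find_spec` can list every absent module up front so they can be installed in one pass. The helper name `missing_modules` and the `REQUIRED` list are hypothetical; the list holds the modules that raised ModuleNotFoundError in this article.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# Modules that raised ModuleNotFoundError in this article's session.
REQUIRED = ["gensim", "tabulate"]

if __name__ == "__main__":
    for name in missing_modules(REQUIRED):
        print(f"missing: {name} (try: conda install {name})")
```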

(base) root@caef766a99ff:/atap/snippets/ch04# conda install gensim
Solving environment: done

## Package Plan ##

  environment location: /opt/conda

  added / updated specs: 
    - gensim

The following packages will be downloaded:

    package                    |            build
    s3transfer-0.1.13          |           py37_0          76 KB
    gensim-3.4.0               |   py37h14c3975_0        21.5 MB
    smart_open-1.7.1           |           py37_0          67 KB
    jmespath-0.9.3             |           py37_0          35 KB
    boto3-1.9.21               |           py37_0         108 KB
    botocore-1.12.23           |           py37_0         3.2 MB
                                           Total:        25.0 MB

The following NEW packages will be INSTALLED:

    boto3:      1.9.21-py37_0       
    botocore:   1.12.23-py37_0      
    gensim:     3.4.0-py37h14c3975_0
    jmespath:   0.9.3-py37_0        
    s3transfer: 0.1.13-py37_0       
    smart_open: 1.7.1-py37_0        

Proceed ([y]/n)? y

Downloading and Extracting Packages
s3transfer-0.1.13    | 76 KB     | ######################################################################################################################################## | 100% 
gensim-3.4.0         | 21.5 MB   | ######################################################################################################################################## | 100% 
smart_open-1.7.1     | 67 KB     | ######################################################################################################################################## | 100% 
jmespath-0.9.3       | 35 KB     | ######################################################################################################################################## | 100% 
boto3-1.9.21         | 108 KB    | ######################################################################################################################################## | 100% 
botocore-1.12.23     | 3.2 MB    | ######################################################################################################################################## | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(base) root@caef766a99ff:/atap/snippets/ch04# python transformers.py 
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
/opt/conda/lib/python3.7/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Traceback (most recent call last):
  File "transformers.py", line 84, in <module>
    loader = CorpusLoader(corpus, 12)
  File "/atap/snippets/ch04/loader.py", line 12, in __init__
    self.folds = KFold(self.n_docs, folds, shuffle)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/cross_validation.py", line 337, in __init__
    super(KFold, self).__init__(n, n_folds, shuffle, random_state)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/cross_validation.py", line 262, in __init__
    " than the number of samples: {1}.").format(n_folds, n))
ValueError: Cannot have number of folds n_folds=12 greater than the number of samples: 0.


(base) root@caef766a99ff:/atap/snippets/ch04# python vectorization.py 
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
/opt/conda/lib/python3.7/site-packages/gensim/models/doc2vec.py:366: UserWarning: The parameter `size` is deprecated, will be removed in 4.0.0, use `vector_size` instead.
  warnings.warn("The parameter `size` is deprecated, will be removed in 4.0.0, use `vector_size` instead.")
[-0.08896858  0.04988472  0.06680734 -0.08234943 -0.01591597]


(base) root@caef766a99ff:/atap/snippets/ch04# cd ../ch05
(base) root@caef766a99ff:/atap/snippets/ch05# ls
bias_variance.py  build.py  evaluate.py  info.py  loader.py  ner.py  reader.py	results.py  splits.py
(base) root@caef766a99ff:/atap/snippets/ch05# python loader.py
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
(base) root@caef766a99ff:/atap/snippets/ch05# cat loader.py
import numpy as np

from sklearn.model_selection import KFold
from sklearn.model_selection import train_test_split as tts

class CorpusLoader(object):

    def __init__(self, reader, folds=12, shuffle=True, categories=None):
        self.reader = reader
        self.folds  = KFold(n_splits=folds, shuffle=shuffle)
        self.files  = np.asarray(self.reader.fileids(categories=categories))

    def fileids(self, idx=None):
        if idx is None:
            return self.files
        return self.files[idx]

    def documents(self, idx=None):
        for fileid in self.fileids(idx):
            yield list(self.reader.docs(fileids=[fileid]))

    def labels(self, idx=None):
        return [
            self.reader.categories(fileids=[fileid])[0]
            for fileid in self.fileids(idx)
        ]

    def __iter__(self):
        for train_index, test_index in self.folds.split(self.files):
            X_train = self.documents(train_index)
            y_train = self.labels(train_index)

            X_test = self.documents(test_index)
            y_test = self.labels(test_index)

            yield X_train, X_test, y_train, y_test

if __name__ == '__main__':
    from reader import PickledCorpusReader

    corpus = PickledCorpusReader('../corpus')
    loader = CorpusLoader(corpus, 12)
(base) root@caef766a99ff:/atap/snippets/ch05# 
(base) root@caef766a99ff:/atap/snippets/ch05# conda update

CondaValueError: no package names supplied
# If you want to update to a newer version of Anaconda, type:
# $ conda update --prefix /opt/conda anaconda

(base) root@caef766a99ff:/atap/snippets/ch05# conda update --prefix /opt/conda anaconda
Solving environment: done

# All requested packages already installed.

(base) root@caef766a99ff:/atap/snippets/ch05# 
(base) root@caef766a99ff:/atap/snippets/ch05# ls
__pycache__  bias_variance.py  build.py  evaluate.py  info.py  loader.py  ner.py  reader.py  results.py  splits.py
(base) root@caef766a99ff:/atap/snippets/ch05# python loader.py
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
(base) root@caef766a99ff:/atap/snippets/ch05# python reader.py
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
0 vocabulary 0 word count
(base) root@caef766a99ff:/atap/snippets/ch05# python results.py
Traceback (most recent call last):
  File "results.py", line 2, in <module>
    import tabulate
ModuleNotFoundError: No module named 'tabulate'
(base) root@caef766a99ff:/atap/snippets/ch05# conda install tabulate
Solving environment: done

## Package Plan ##

  environment location: /opt/conda

  added / updated specs: 
    - tabulate

The following packages will be downloaded:

    package                    |            build
    tabulate-0.8.2             |           py37_0          36 KB

The following NEW packages will be INSTALLED:

    tabulate: 0.8.2-py37_0

Proceed ([y]/n)? y

Downloading and Extracting Packages
tabulate-0.8.2       | 36 KB     | ######################################################################################################################################## | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(base) root@caef766a99ff:/atap/snippets/ch05# python results.py
Traceback (most recent call last):
  File "results.py", line 8, in <module>
    with open('results.json', 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'results.json'
(base) root@caef766a99ff:/atap/snippets/ch05# ls
__pycache__  bias_variance.py  build.py  evaluate.py  info.py  loader.py  ner.py  reader.py  results.py  splits.py
(base) root@caef766a99ff:/atap/snippets/ch05# python bias_variance.py 
Traceback (most recent call last):
  File "bias_variance.py", line 5, in <module>
    import seaborn as sns
  File "/opt/conda/lib/python3.7/site-packages/seaborn/__init__.py", line 6, in <module>
    from .rcmod import *
  File "/opt/conda/lib/python3.7/site-packages/seaborn/rcmod.py", line 5, in <module>
    from . import palettes, _orig_rc_params
  File "/opt/conda/lib/python3.7/site-packages/seaborn/palettes.py", line 12, in <module>
    from .utils import desaturate, set_hls_values, get_color_cycle
  File "/opt/conda/lib/python3.7/site-packages/seaborn/utils.py", line 11, in <module>
    import matplotlib.pyplot as plt
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/pyplot.py", line 115, in <module>
    _backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/backends/__init__.py", line 62, in pylab_setup
    [backend_name], 0)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/backends/backend_qt5agg.py", line 15, in <module>
    from .backend_qt5 import (
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/backends/backend_qt5.py", line 19, in <module>
    import matplotlib.backends.qt_editor.figureoptions as figureoptions
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/backends/qt_editor/figureoptions.py", line 20, in <module>
    import matplotlib.backends.qt_editor.formlayout as formlayout
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/backends/qt_editor/formlayout.py", line 54, in <module>
    from matplotlib.backends.qt_compat import QtGui, QtWidgets, QtCore
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/backends/qt_compat.py", line 158, in <module>
    raise ImportError("Failed to import any qt binding")
ImportError: Failed to import any qt binding
(base) root@caef766a99ff:/atap/snippets/ch05# conda install qt
Solving environment: done

# All requested packages already installed.

(base) root@caef766a99ff:/atap/snippets/ch05# conda install QtCore
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - qtcore

Current channels:

  - https://repo.anaconda.com/pkgs/main/linux-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/free/linux-64
  - https://repo.anaconda.com/pkgs/free/noarch
  - https://repo.anaconda.com/pkgs/r/linux-64
  - https://repo.anaconda.com/pkgs/r/noarch
  - https://repo.anaconda.com/pkgs/pro/linux-64
  - https://repo.anaconda.com/pkgs/pro/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to


and use the search bar at the top of the page.

(base) root@caef766a99ff:/atap/snippets/ch05# pip install QtGui
Collecting QtGui
  Downloading https://files.pythonhosted.org/packages/01/3a/fbc802c50f2db85fff637388e3fed3a17f5c848febe6815ef29d13c60e00/qtgui-0.0.1.tar.gz
Building wheels for collected packages: QtGui
  Running setup.py bdist_wheel for QtGui ... done
  Stored in directory: /root/.cache/pip/wheels/dc/24/19/968e6c14da845bd653d59c2de3bd550c0d636afb15b53020ed
Successfully built QtGui
Installing collected packages: QtGui
Successfully installed QtGui-0.0.1
(base) root@caef766a99ff:/atap/snippets/ch05# pip install QtCore
Collecting QtCore
  Could not find a version that satisfies the requirement QtCore (from versions: )
No matching distribution found for QtCore

(base) root@caef766a99ff:/atap/snippets/ch05# pip install pyqt5
Collecting pyqt5
  Downloading https://files.pythonhosted.org/packages/d4/bf/d884da8e2f7096d201c891d515eb6813a8e85df5eb6f5e12e867bf1d831c/PyQt5-5.11.3-5.11.2-cp35.cp36.cp37.cp38-abi3-manylinux1_x86_64.whl (117.8MB)
    100% |████████████████████████████████| 117.8MB 462kB/s 
Collecting PyQt5_sip<4.20,>=4.19.11 (from pyqt5)
  Downloading https://files.pythonhosted.org/packages/2b/9b/37e4f07ddac00e7eff4dd216c330a66cb1221e9c510855055391b779ee77/PyQt5_sip-4.19.13-cp37-cp37m-manylinux1_x86_64.whl (67kB)
    100% |████████████████████████████████| 71kB 2.0MB/s 
Installing collected packages: PyQt5-sip, pyqt5
Successfully installed PyQt5-sip-4.19.13 pyqt5-5.11.3
(base) root@caef766a99ff:/atap/snippets/ch05# python bias_variance.py 
Traceback (most recent call last):
  File "bias_variance.py", line 5, in <module>
    import seaborn as sns
  File "/opt/conda/lib/python3.7/site-packages/seaborn/__init__.py", line 6, in <module>
    from .rcmod import *
  File "/opt/conda/lib/python3.7/site-packages/seaborn/rcmod.py", line 5, in <module>
    from . import palettes, _orig_rc_params
  File "/opt/conda/lib/python3.7/site-packages/seaborn/palettes.py", line 12, in <module>
    from .utils import desaturate, set_hls_values, get_color_cycle
  File "/opt/conda/lib/python3.7/site-packages/seaborn/utils.py", line 11, in <module>
    import matplotlib.pyplot as plt
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/pyplot.py", line 115, in <module>
    _backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/backends/__init__.py", line 62, in pylab_setup
    [backend_name], 0)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/backends/backend_qt5agg.py", line 15, in <module>
    from .backend_qt5 import (
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/backends/backend_qt5.py", line 19, in <module>
    import matplotlib.backends.qt_editor.figureoptions as figureoptions
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/backends/qt_editor/figureoptions.py", line 20, in <module>
    import matplotlib.backends.qt_editor.formlayout as formlayout
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/backends/qt_editor/formlayout.py", line 54, in <module>
    from matplotlib.backends.qt_compat import QtGui, QtWidgets, QtCore
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/backends/qt_compat.py", line 158, in <module>
    raise ImportError("Failed to import any qt binding")
ImportError: Failed to import any qt binding

(base) root@caef766a99ff:/atap/snippets/ch05# pip uninstall matplotlib
Uninstalling matplotlib-2.2.3:
  Would remove:
Proceed (y/n)? y
  Successfully uninstalled matplotlib-2.2.3

(base) root@caef766a99ff:/atap/snippets/ch05# python bias_variance.py 
Traceback (most recent call last):
  File "bias_variance.py", line 5, in <module>
    import seaborn as sns
  File "/opt/conda/lib/python3.7/site-packages/seaborn/__init__.py", line 2, in <module>
    import matplotlib as mpl
ModuleNotFoundError: No module named 'matplotlib'

(base) root@caef766a99ff:/atap/snippets/ch05# pip install matplotlib
Collecting matplotlib
  Downloading https://files.pythonhosted.org/packages/1e/92/923b86132669ce39b7b0096a402cc78a5b70f22423f8b59bbd7bb7ff9403/matplotlib-3.0.0-cp37-cp37m-manylinux1_x86_64.whl (12.8MB)
    100% |████████████████████████████████| 12.8MB 3.4MB/s 
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib) (2.2.0)
Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.7/site-packages (from matplotlib) (0.10.0)
Requirement already satisfied: python-dateutil>=2.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib) (2.7.3)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib) (1.0.1)
Requirement already satisfied: numpy>=1.10.0 in /opt/conda/lib/python3.7/site-packages (from matplotlib) (1.15.1)
Requirement already satisfied: six in /opt/conda/lib/python3.7/site-packages (from cycler>=0.10->matplotlib) (1.11.0)
Requirement already satisfied: setuptools in /opt/conda/lib/python3.7/site-packages (from kiwisolver>=1.0.1->matplotlib) (40.2.0)
Installing collected packages: matplotlib
Successfully installed matplotlib-3.0.0
(base) root@caef766a99ff:/atap/snippets/ch05# python bias_variance.py 
Traceback (most recent call last):
  File "bias_variance.py", line 38, in <module>
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/pyplot.py", line 688, in savefig
    res = fig.savefig(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/figure.py", line 2097, in savefig
    self.canvas.print_figure(fname, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/backend_bases.py", line 2075, in print_figure
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/backends/backend_agg.py", line 521, in print_png
    cbook.open_file_cm(filename_or_obj, "wb") as fh:
  File "/opt/conda/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/cbook/__init__.py", line 407, in open_file_cm
    fh, opened = to_filehandle(path_or_file, mode, True, encoding)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/cbook/__init__.py", line 392, in to_filehandle
    fh = open(fname, flag, encoding=encoding)
FileNotFoundError: [Errno 2] No such file or directory: '../../images/ch04/atap_ch04_bias_variance_tradeoff.png'
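This FileNotFoundError comes from the output path, not from matplotlib itself: the `../../images/ch04/` directory does not exist in the image, and saving a figure does not create parent directories. A minimal sketch of creating the directory first, using only the standard library; `ensure_parent_dir` is a hypothetical helper.

```python
import os


def ensure_parent_dir(path):
    """Create the directory that will contain `path` if it does not exist.

    Returns `path` unchanged so the call can wrap a filename argument.
    """
    parent = os.path.dirname(path)
    if parent:  # a bare filename has no parent directory to create
        os.makedirs(parent, exist_ok=True)
    return path


# Hypothetical use in bias_variance.py:
# plt.savefig(ensure_parent_dir(
#     '../../images/ch04/atap_ch04_bias_variance_tradeoff.png'))
```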


(base) root@caef766a99ff:/atap/snippets/ch05# ls
__pycache__  bias_variance.py  build.py  evaluate.py  info.py  loader.py  ner.py  reader.py  results.py  splits.py
(base) root@caef766a99ff:/atap/snippets/ch05# python build.py
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/nltk/corpus/util.py", line 80, in __load
    try: root = nltk.data.find('{}/{}'.format(self.subdir, zip_name))
  File "/opt/conda/lib/python3.7/site-packages/nltk/data.py", line 675, in find
    raise LookupError(resource_not_found)
  Resource stopwords not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('stopwords')
  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/opt/conda/nltk_data'
    - '/opt/conda/share/nltk_data'
    - '/opt/conda/lib/nltk_data'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "build.py", line 91, in <module>
    models.append(create_pipeline(form(), True))
  File "build.py", line 69, in create_pipeline
    ('normalize', TextNormalizer()),
  File "build.py", line 28, in __init__
    self.stopwords  = set(nltk.corpus.stopwords.words(language))
  File "/opt/conda/lib/python3.7/site-packages/nltk/corpus/util.py", line 116, in __getattr__
  File "/opt/conda/lib/python3.7/site-packages/nltk/corpus/util.py", line 81, in __load
    except LookupError: raise e
  File "/opt/conda/lib/python3.7/site-packages/nltk/corpus/util.py", line 78, in __load
    root = nltk.data.find('{}/{}'.format(self.subdir, self.__name))
  File "/opt/conda/lib/python3.7/site-packages/nltk/data.py", line 675, in find
    raise LookupError(resource_not_found)
  Resource stopwords not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('stopwords')
  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/opt/conda/nltk_data'
    - '/opt/conda/share/nltk_data'
    - '/opt/conda/lib/nltk_data'
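
The `LookupError` above means the NLTK stopwords corpus is not installed in the image, and the message itself suggests `nltk.download('stopwords')`. The next run in this session downloads the package after `build.py` was edited. A minimal sketch of a download-if-missing guard follows. The helper name `ensure_nltk_resource` and its `find` / `download` parameters are assumptions made for illustration, and it is not necessarily the change that was made to `build.py`.

```python
def ensure_nltk_resource(resource_path, package, find=None, download=None):
    """Download `package` only when `resource_path` cannot be found.

    `find` and `download` default to nltk.data.find and nltk.download.
    They are parameters only so the logic can be exercised offline.
    Returns True if a download was triggered, False otherwise.
    """
    if find is None or download is None:
        import nltk  # imported lazily so stand-ins can be used without NLTK

        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource_path)
    except LookupError:
        # Same exception type that appears in the traceback above.
        download(package)
        return True
    return False
```

Calling `ensure_nltk_resource("corpora/stopwords", "stopwords")` before `TextNormalizer()` is constructed would fetch the corpus into one of the searched directories on first use and do nothing on later runs.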


(base) root@caef766a99ff:/atap/snippets/ch05# vi build.py
(base) root@caef766a99ff:/atap/snippets/ch05# python build.py
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
Traceback (most recent call last):
  File "build.py", line 133, in <module>
    for scores in score_models(models, loader):
  File "build.py", line 119, in score_models
    for X_train, X_test, y_train, y_test in loader:
  File "/atap/snippets/ch05/loader.py", line 29, in __iter__
    for train_index, test_index in self.folds.split(self.files):
  File "/opt/conda/lib/python3.7/site-packages/sklearn/model_selection/_split.py", line 330, in split
ValueError: Cannot have number of splits n_splits=5 greater than the number of samples: 0.
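
The `ValueError` above reports `number of samples: 0`: the loader found no files to split, which suggests that the corpus the snippet expects is not present in the container. A minimal sketch of an early check that fails with a more direct message follows. The function name `check_enough_samples` is hypothetical, and the fix itself is still to supply the corpus.

```python
def check_enough_samples(samples, n_splits):
    """Return len(samples), or raise ValueError if n_splits folds cannot be filled."""
    n_samples = len(samples)
    if n_samples == 0:
        raise ValueError("no documents found; check the corpus path and file pattern")
    if n_samples < n_splits:
        raise ValueError(
            "need at least {} documents for {} folds, found {}".format(
                n_splits, n_splits, n_samples
            )
        )
    return n_samples


if __name__ == "__main__":
    print(check_enough_samples(["a.pickle", "b.pickle", "c.pickle"], n_splits=3))
```

A check like this could run on the loader's file list (`self.files` in the traceback) before `self.folds.split(...)` is called, so an empty corpus is reported as such rather than as a cross-validation error.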


(base) root@caef766a99ff:/atap/snippets/ch05# python results.py
Traceback (most recent call last):
  File "results.py", line 8, in <module>
    with open('results.json', 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'results.json'


Successfully built corpus
Installing collected packages: corpus
Successfully installed corpus-0.4.2
(base) root@caef766a99ff:/atap/snippets/ch06# python agglomerative.py 
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
Traceback (most recent call last):
  File "agglomerative.py", line 12, in <module>
    from corpus import HTMLPickledCorpusReader
ModuleNotFoundError: No module named 'corpus'
(base) root@caef766a99ff:/atap/snippets/ch06# pip install corpus
Requirement already satisfied: corpus in /opt/conda/lib/python3.7/site-packages (0.4.2)
(base) root@caef766a99ff:/atap/snippets/ch06# ls
agglomerative.py  kmeans.py  reader.py	topics.py  transformers.py
(base) root@caef766a99ff:/atap/snippets/ch06# python kmeans.py 
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
Traceback (most recent call last):
  File "kmeans.py", line 99, in <module>
  File "kmeans.py", line 86, in cluster
    ) for fileid in corpus.fileids(categories=['news'])
  File "/opt/conda/lib/python3.7/site-packages/nltk/cluster/util.py", line 43, in cluster
    assert len(vectors) > 0
(base) root@caef766a99ff:/atap/snippets/ch06# python reader.py
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
0 vocabulary 0 word count
(base) root@caef766a99ff:/atap/snippets/ch06# python topics.py
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
Traceback (most recent call last):
  File "topics.py", line 100, in <module>
  File "topics.py", line 43, in fit_transform
  File "/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py", line 281, in fit_transform
    Xt, fit_params = self._fit(X, y, **fit_params)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py", line 213, in _fit
  File "/opt/conda/lib/python3.7/site-packages/sklearn/externals/joblib/memory.py", line 362, in __call__
    return self.func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py", line 581, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py", line 869, in fit_transform
  File "/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py", line 811, in _count_vocab
    raise ValueError("empty vocabulary; perhaps the documents only"
ValueError: empty vocabulary; perhaps the documents only contain stop words
(base) root@caef766a99ff:/atap/snippets/ch06# python transformers.py 
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
Traceback (most recent call last):
  File "transformers.py", line 122, in <module>
  File "/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py", line 283, in fit_transform
    return last_step.fit_transform(Xt, y, **fit_params)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/base.py", line 517, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/opt/conda/lib/python3.7/site-packages/gensim/sklearn_api/ldamodel.py", line 71, in fit
    random_state=self.random_state, dtype=self.dtype
  File "/opt/conda/lib/python3.7/site-packages/gensim/models/ldamodel.py", line 293, in __init__
    raise ValueError("cannot compute LDA over an empty collection (no terms)")
ValueError: cannot compute LDA over an empty collection (no terms)
(base) root@caef766a99ff:/atap/snippets/ch06# cd ..
(base) root@caef766a99ff:/atap/snippets# cd ch07
(base) root@caef766a99ff:/atap/snippets/ch07# ls
collocation.py	grammar.py  model.py  ngrams.py  reader.py  transformers.py
(base) root@caef766a99ff:/atap/snippets/ch07# python collocation.py 
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
(base) root@caef766a99ff:/atap/snippets/ch07# python grammar.py 
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
Grammar with 13 productions (start state = S)
    S -> NNP VP
    VP -> V PP
    PP -> P NP
    NP -> DT N
    NNP -> 'Gwen'
    NNP -> 'George'
    V -> 'looks'
    V -> 'burns'
    P -> 'in'
    P -> 'for'
    DT -> 'the'
    N -> 'castle'
    N -> 'ocean'
[S -> NNP VP, VP -> V PP, PP -> P NP, NP -> DT N, NNP -> 'Gwen', NNP -> 'George', V -> 'looks', V -> 'burns', P -> 'in', P -> 'for', DT -> 'the', N -> 'castle', N -> 'ocean']
(base) root@caef766a99ff:/atap/snippets/ch07# python model.py
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
Can we talk about something else?
Can we talk about something else?
(base) root@caef766a99ff:/atap/snippets/ch07# python ngram.py
(null): can't open file 'ngram.py': [Errno 2] No such file or directory
(base) root@caef766a99ff:/atap/snippets/ch07# python ngrams.py 
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
usage: ngrams.py [-h] [--nltk] [-n N] phrase
ngrams.py: error: the following arguments are required: phrase
(base) root@caef766a99ff:/atap/snippets/ch07# python ngrams.py  hello
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
(base) root@caef766a99ff:/atap/snippets/ch07# python reader.py 
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
0 vocabulary 0 word count
(base) root@caef766a99ff:/atap/snippets/ch07# python transformers.py 
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
Traceback (most recent call last):
  File "transformers.py", line 96, in <module>
IndexError: list index out of range
(base) root@caef766a99ff:/atap/snippets/ch07# cd ../ch08
(base) root@caef766a99ff:/atap/snippets/ch08# ls
classification.py  data  elbows_silhouettes.py	freqdist.py  normalize.py  oz.py  postag.py  reader.py	text.py  timeseries.py	tsne.py
(base) root@caef766a99ff:/atap/snippets/ch08# python classification.py 
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
/opt/conda/lib/python3.7/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
Traceback (most recent call last):
  File "classification.py", line 10, in <module>
    from yellowbrick.classifier import ConfusionMatrix
ModuleNotFoundError: No module named 'yellowbrick'
(base) root@caef766a99ff:/atap/snippets/ch08# conda install yellowbrick
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - yellowbrick

Current channels:

  - https://repo.anaconda.com/pkgs/main/linux-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/free/linux-64
  - https://repo.anaconda.com/pkgs/free/noarch
  - https://repo.anaconda.com/pkgs/r/linux-64
  - https://repo.anaconda.com/pkgs/r/noarch
  - https://repo.anaconda.com/pkgs/pro/linux-64
  - https://repo.anaconda.com/pkgs/pro/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to


and use the search bar at the top of the page.

(base) root@caef766a99ff:/atap/snippets/ch08# pip install yellowbrick
Collecting yellowbrick
  Downloading https://files.pythonhosted.org/packages/ca/64/ffa3ae377d0841595335f9ae402ecd777d7a275746c7389feb68c1386110/yellowbrick-0.8-py2.py3-none-any.whl (241kB)
    100% |████████████████████████████████| 245kB 879kB/s 
Requirement already satisfied: cycler>=0.10.0 in /opt/conda/lib/python3.7/site-packages (from yellowbrick) (0.10.0)
Requirement already satisfied: numpy>=1.13.0 in /opt/conda/lib/python3.7/site-packages (from yellowbrick) (1.15.1)
Requirement already satisfied: matplotlib>=1.5.1 in /opt/conda/lib/python3.7/site-packages (from yellowbrick) (3.0.0)
Requirement already satisfied: scikit-learn>=0.19 in /opt/conda/lib/python3.7/site-packages (from yellowbrick) (0.19.2)
Requirement already satisfied: scipy>=0.19 in /opt/conda/lib/python3.7/site-packages (from yellowbrick) (1.1.0)
Requirement already satisfied: six in /opt/conda/lib/python3.7/site-packages (from cycler>=0.10.0->yellowbrick) (1.11.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=1.5.1->yellowbrick) (2.2.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=1.5.1->yellowbrick) (1.0.1)
Requirement already satisfied: python-dateutil>=2.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=1.5.1->yellowbrick) (2.7.3)
Requirement already satisfied: setuptools in /opt/conda/lib/python3.7/site-packages (from kiwisolver>=1.0.1->matplotlib>=1.5.1->yellowbrick) (40.2.0)
Installing collected packages: yellowbrick
Successfully installed yellowbrick-0.8
(base) root@caef766a99ff:/atap/snippets/ch08# python classification.py 
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
/opt/conda/lib/python3.7/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
Traceback (most recent call last):
  File "classification.py", line 71, in <module>
    corpus = load_corpus('hobbies')
  File "classification.py", line 40, in load_corpus
ValueError: 'hobbies' dataset has not been downloaded, use the download.py module to fetch datasets
(base) root@caef766a99ff:/atap/snippets/ch08# ls
classification.py  data  elbows_silhouettes.py	freqdist.py  normalize.py  oz.py  postag.py  reader.py	text.py  timeseries.py	tsne.py
(base) root@caef766a99ff:/atap/snippets/ch08# python elbows_silhouettes.py 
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
Traceback (most recent call last):
  File "elbows_silhouettes.py", line 63, in <module>
    corpus = load_corpus('hobbies')
  File "elbows_silhouettes.py", line 34, in load_corpus
ValueError: 'hobbies' dataset has not been downloaded, use the download.py module to fetch datasets
(base) root@caef766a99ff:/atap/snippets/ch08# python freqdist.py 
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
Traceback (most recent call last):
  File "freqdist.py", line 64, in <module>
    corpus = load_corpus('hobbies')
  File "freqdist.py", line 34, in load_corpus
ValueError: 'hobbies' dataset has not been downloaded, use the download.py module to fetch datasets
(base) root@caef766a99ff:/atap/snippets/ch08# python normalize.py 
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
(base) root@caef766a99ff:/atap/snippets/ch08# python oz.py 
(base) root@caef766a99ff:/atap/snippets/ch08# python postag.py 
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
In a small saucepan , combine sugar and eggs until well blended . Cook over low heat , stirring constantly , until mixture reaches 160° and coats the back of a metal spoon . Remove from the heat . Stir in chocolate and vanilla until smooth . Cool to lukewarm ( 90° ) , stirring occasionally . In a small bowl , cream butter until light and fluffy . Add cooled chocolate mixture ; beat on high speed for 5 minutes or until light and fluffy . In another large bowl , beat cream until it begins to thicken . Add confectioners ' sugar ; beat until stiff peaks form . Fold into chocolate mixture . Pour into crust . Chill for at least 6 hours before serving . Garnish with whipped cream and chocolate curls if desired .

Baa , baa , black sheep , Have you any wool ? Yes , sir , yes , sir , Three bags full ; One for the master , And one for the dame , And one for the little boy Who lives down the lane .

(base) root@caef766a99ff:/atap/snippets/ch08# python reader.py 
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
(base) root@caef766a99ff:/atap/snippets/ch08# python text.py 
/opt/conda/lib/python3.7/site-packages/matplotlib/figure.py:448: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
  % get_backend())
(base) root@caef766a99ff:/atap/snippets/ch08# python timeseries.py 
Traceback (most recent call last):
  File "timeseries.py", line 150, in <module>
  File "timeseries.py", line 97, in read
    pubdates = load_pubdates(corpus.fileids(), pubdates)
  File "timeseries.py", line 42, in load_pubdates
    with open(path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'pubdates.csv'
(base) root@caef766a99ff:/atap/snippets/ch08# python tsne.py 
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
Traceback (most recent call last):
  File "tsne.py", line 63, in <module>
    corpus = load_corpus('hobbies')
  File "tsne.py", line 32, in load_corpus
ValueError: 'hobbies' dataset has not been downloaded, use the download.py module to fetch datasets
(base) root@caef766a99ff:/atap/snippets/ch08# cd ../ch09
(base) root@caef766a99ff:/atap/snippets/ch09# ls
entities.py  graph.py  reader.py  resolve.py  syngraph.py
(base) root@caef766a99ff:/atap/snippets/ch09# python entities.py 
Traceback (most recent call last):
  File "entities.py", line 1, in <module>
    import spacy
ModuleNotFoundError: No module named 'spacy'
(base) root@caef766a99ff:/atap/snippets/ch09# pip install spacy
Collecting spacy
  Downloading https://files.pythonhosted.org/packages/46/88/0a4d9ba7f66e9304054e5b5ea6150be84029fb24283028703d088fdd0c96/spacy-2.0.16-cp37-cp37m-manylinux1_x86_64.whl (23.3MB)
    100% |████████████████████████████████| 23.3MB 2.3MB/s 
Collecting murmurhash<1.1.0,>=0.28.0 (from spacy)
  Downloading https://files.pythonhosted.org/packages/3c/15/37a8153aa33a3feb9d6383205c470b2750714d016b34d9f00e0d0c262af2/murmurhash-1.0.1-cp37-cp37m-manylinux1_x86_64.whl
Collecting msgpack-numpy<0.4.4 (from spacy)
  Downloading https://files.pythonhosted.org/packages/ad/45/464be6da85b5ca893cfcbd5de3b31a6710f636ccb8521b17bd4110a08d94/msgpack_numpy-
Requirement already satisfied: requests<3.0.0,>=2.13.0 in /opt/conda/lib/python3.7/site-packages (from spacy) (2.19.1)
Collecting regex==2018.01.10 (from spacy)
  Downloading https://files.pythonhosted.org/packages/76/f4/7146c3812f96fcaaf2d06ff6862582302626a59011ccb6f2833bb38d80f7/regex-2018.01.10.tar.gz (612kB)
    100% |████████████████████████████████| 614kB 15.7MB/s 
Collecting thinc<6.13.0,>=6.12.0 (from spacy)
  Downloading https://files.pythonhosted.org/packages/f8/50/a7f6f4167dc29951e387fea5756bae79fe444e38193ee84530c964201240/thinc-6.12.0-cp37-cp37m-manylinux1_x86_64.whl (1.9MB)
    100% |████████████████████████████████| 1.9MB 12.5MB/s 
Collecting plac<1.0.0,>=0.9.6 (from spacy)
  Downloading https://files.pythonhosted.org/packages/9e/9b/62c60d2f5bc135d2aa1d8c8a86aaf84edb719a59c7f11a4316259e61a298/plac-0.9.6-py2.py3-none-any.whl
Collecting preshed<2.1.0,>=2.0.1 (from spacy)
  Downloading https://files.pythonhosted.org/packages/bc/2b/3ecd5d90d2d6fd39fbc520de7d80db5d74defdc2d7c2e15531d9cc3498c7/preshed-2.0.1-cp37-cp37m-manylinux1_x86_64.whl (82kB)
    100% |████████████████████████████████| 92kB 10.9MB/s 
Collecting dill<0.3,>=0.2 (from spacy)
  Downloading https://files.pythonhosted.org/packages/6f/78/8b96476f4ae426db71c6e86a8e6a81407f015b34547e442291cd397b18f3/dill- (150kB)
    100% |████████████████████████████████| 153kB 13.4MB/s 
Requirement already satisfied: numpy>=1.15.0 in /opt/conda/lib/python3.7/site-packages (from spacy) (1.15.1)
Collecting ujson>=1.35 (from spacy)
  Downloading https://files.pythonhosted.org/packages/16/c4/79f3409bc710559015464e5f49b9879430d8f87498ecdc335899732e5377/ujson-1.35.tar.gz (192kB)
    100% |████████████████████████████████| 194kB 7.6MB/s 
Collecting cymem<2.1.0,>=2.0.2 (from spacy)
  Downloading https://files.pythonhosted.org/packages/65/26/e534148e509cbebbea3ee29f50f59eb206621d12c35e4594507da8dc54cc/cymem-2.0.2-cp37-cp37m-manylinux1_x86_64.whl
Requirement already satisfied: msgpack>=0.3.0 in /opt/conda/lib/python3.7/site-packages (from msgpack-numpy<0.4.4->spacy) (0.5.6)
Requirement already satisfied: urllib3<1.24,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy) (1.23)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy) (2018.8.24)
Requirement already satisfied: idna<2.8,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy) (2.7)
Requirement already satisfied: tqdm<5.0.0,>=4.10.0 in /opt/conda/lib/python3.7/site-packages (from thinc<6.13.0,>=6.12.0->spacy) (4.26.0)
Requirement already satisfied: wrapt<1.11.0,>=1.10.0 in /opt/conda/lib/python3.7/site-packages (from thinc<6.13.0,>=6.12.0->spacy) (1.10.11)
Requirement already satisfied: cytoolz<0.10,>=0.9.0 in /opt/conda/lib/python3.7/site-packages (from thinc<6.13.0,>=6.12.0->spacy) (
Requirement already satisfied: six<2.0.0,>=1.10.0 in /opt/conda/lib/python3.7/site-packages (from thinc<6.13.0,>=6.12.0->spacy) (1.11.0)
Requirement already satisfied: toolz>=0.8.0 in /opt/conda/lib/python3.7/site-packages (from cytoolz<0.10,>=0.9.0->thinc<6.13.0,>=6.12.0->spacy) (0.9.0)
Building wheels for collected packages: regex, dill, ujson
  Running setup.py bdist_wheel for regex ... error
  Complete output from command /opt/conda/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-l401ygag/regex/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-qh8ba9er --python-tag cp37:
  /opt/conda/lib/python3.7/site-packages/setuptools/dist.py:398: UserWarning: Normalizing '2018.01.10' to '2018.1.10'
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-3.7
  copying regex_3/regex.py -> build/lib.linux-x86_64-3.7
  copying regex_3/_regex_core.py -> build/lib.linux-x86_64-3.7
  copying regex_3/test_regex.py -> build/lib.linux-x86_64-3.7
  running build_ext
  building '_regex' extension
  creating build/temp.linux-x86_64-3.7
  creating build/temp.linux-x86_64-3.7/regex_3
  gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/conda/include/python3.7m -c regex_3/_regex.c -o build/temp.linux-x86_64-3.7/regex_3/_regex.o
  unable to execute 'gcc': No such file or directory
  error: command 'gcc' failed with exit status 1
  Failed building wheel for regex
  Running setup.py clean for regex
  Running setup.py bdist_wheel for dill ... done
  Stored in directory: /root/.cache/pip/wheels/e2/5d/17/f87cb7751896ac629b435a8696f83ee75b11029f5d6f6bda72
  Running setup.py bdist_wheel for ujson ... error
  Complete output from command /opt/conda/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-l401ygag/ujson/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-ej0wzq7l --python-tag cp37:
  Warning: 'classifiers' should be a list, got type 'filter'
  running bdist_wheel
  running build
  running build_ext
  building 'ujson' extension
  creating build
  creating build/temp.linux-x86_64-3.7
  creating build/temp.linux-x86_64-3.7/python
  creating build/temp.linux-x86_64-3.7/lib
  gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./python -I./lib -I/opt/conda/include/python3.7m -c ./python/ujson.c -o build/temp.linux-x86_64-3.7/./python/ujson.o -D_GNU_SOURCE
  unable to execute 'gcc': No such file or directory
  error: command 'gcc' failed with exit status 1
  Failed building wheel for ujson
  Running setup.py clean for ujson
Successfully built dill
Failed to build regex ujson
Installing collected packages: murmurhash, msgpack-numpy, regex, dill, cymem, preshed, plac, thinc, ujson, spacy
  Running setup.py install for regex ... error
    Complete output from command /opt/conda/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-l401ygag/regex/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-ul7r4m1v/install-record.txt --single-version-externally-managed --compile:
    /opt/conda/lib/python3.7/site-packages/setuptools/dist.py:398: UserWarning: Normalizing '2018.01.10' to '2018.1.10'
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.7
    copying regex_3/regex.py -> build/lib.linux-x86_64-3.7
    copying regex_3/_regex_core.py -> build/lib.linux-x86_64-3.7
    copying regex_3/test_regex.py -> build/lib.linux-x86_64-3.7
    running build_ext
    building '_regex' extension
    creating build/temp.linux-x86_64-3.7
    creating build/temp.linux-x86_64-3.7/regex_3
    gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/conda/include/python3.7m -c regex_3/_regex.c -o build/temp.linux-x86_64-3.7/regex_3/_regex.o
    unable to execute 'gcc': No such file or directory
    error: command 'gcc' failed with exit status 1
Command "/opt/conda/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-l401ygag/regex/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-ul7r4m1v/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-install-l401ygag/regex/
(base) root@caef766a99ff:/atap/snippets/ch09# conda  install spacy
Solving environment: done

## Package Plan ##

  environment location: /opt/conda

  added / updated specs: 
    - spacy

The following packages will be downloaded:

    package                    |            build
    plac-0.9.6                 |           py37_0          36 KB
    msgpack-numpy-      |           py37_0          14 KB
    ujson-1.35                 |   py37h14c3975_0          26 KB
    thinc-6.12.0               |   py37h4989274_0         1.6 MB
    murmurhash-1.0.1           |   py37he6710b0_0          19 KB
    cymem-2.0.2                |   py37hfd86e86_0          39 KB
    dill-               |           py37_0         112 KB
    preshed-2.0.1              |   py37he6710b0_0          84 KB
    spacy-2.0.16               |   py37h962f231_0        47.4 MB
    regex-2018.08.29           |   py37h7b6447c_0         348 KB
                                           Total:        49.7 MB

The following NEW packages will be INSTALLED:

    cymem:         2.0.2-py37hfd86e86_0     
    murmurhash:    1.0.1-py37he6710b0_0     
    plac:          0.9.6-py37_0             
    preshed:       2.0.1-py37he6710b0_0     
    regex:         2018.08.29-py37h7b6447c_0
    spacy:         2.0.16-py37h962f231_0    
    thinc:         6.12.0-py37h4989274_0    
    ujson:         1.35-py37h14c3975_0      

Proceed ([y]/n)? y

Downloading and Extracting Packages
plac-0.9.6           | 36 KB     | ################################################################################################################################################## | 100% 
msgpack-numpy-0.4.3. | 14 KB     | ################################################################################################################################################## | 100% 
ujson-1.35           | 26 KB     | ################################################################################################################################################## | 100% 
thinc-6.12.0         | 1.6 MB    | ################################################################################################################################################## | 100% 
murmurhash-1.0.1     | 19 KB     | ################################################################################################################################################## | 100% 
cymem-2.0.2          | 39 KB     | ################################################################################################################################################## | 100% 
dill-         | 112 KB    | ################################################################################################################################################## | 100% 
preshed-2.0.1        | 84 KB     | ################################################################################################################################################## | 100% 
spacy-2.0.16         | 47.4 MB   | ################################################################################################################################################## | 100% 
regex-2018.08.29     | 348 KB    | ################################################################################################################################################## | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(base) root@caef766a99ff:/atap/snippets/ch09# python entities.py 
Traceback (most recent call last):
  File "entities.py", line 10, in <module>
    nlp = spacy.load('en')
  File "/opt/conda/lib/python3.7/site-packages/spacy/__init__.py", line 21, in load
    return util.load_model(name, **overrides)
  File "/opt/conda/lib/python3.7/site-packages/spacy/util.py", line 119, in load_model
    raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
(base) root@caef766a99ff:/atap/snippets/ch09# ls
__pycache__  entities.py  graph.py  reader.py  resolve.py  syngraph.py
(base) root@caef766a99ff:/atap/snippets/ch09# python graph.py
Traceback (most recent call last):
  File "graph.py", line 11, in <module>
    from entities import pairs
  File "/atap/snippets/ch09/entities.py", line 10, in <module>
    nlp = spacy.load('en')
  File "/opt/conda/lib/python3.7/site-packages/spacy/__init__.py", line 21, in load
    return util.load_model(name, **overrides)
  File "/opt/conda/lib/python3.7/site-packages/spacy/util.py", line 119, in load_model
    raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

The model 'en' is missing, so download it:

(base) root@caef766a99ff:/atap/snippets/ch10# python -m spacy download en
Collecting en_core_web_sm==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm==2.0.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz (37.4MB)
    100% |████████████████████████████████| 37.4MB 176kB/s 
Installing collected packages: en-core-web-sm
  Running setup.py install for en-core-web-sm ... done
Successfully installed en-core-web-sm-2.0.0

    Linking successful
    /opt/conda/lib/python3.7/site-packages/en_core_web_sm -->

    You can now load the model via spacy.load('en')
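
The E050 error above is raised as an OSError when spaCy finds neither a shortcut link nor a model package for the given name. As a rough sketch of how a script could cope with that, the helper below tries several names in turn. `load_first_available` is a hypothetical helper and not part of spaCy; it only assumes that the loader raises OSError for a missing model, as the traceback shows for spaCy 2.0.16.

```python
def load_first_available(names, loader):
    """Return loader(name) for the first name that loads.

    `loader` is any callable that raises OSError for a missing model,
    for example spacy.load (assumption based on the E050 traceback).
    """
    errors = []
    for name in names:
        try:
            return loader(name)
        except OSError as exc:
            # Remember why this name failed and try the next one.
            errors.append(f"{name}: {exc}")
    raise OSError(
        "no model could be loaded; try `python -m spacy download en`\n"
        + "\n".join(errors)
    )
```

With spaCy installed, `nlp = load_first_available(['en', 'en_core_web_sm'], spacy.load)` would try the shortcut first and then the package name that appears in the download log.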
(base) root@caef766a99ff:/atap/snippets/ch09# python graph.py 
Traceback (most recent call last):
  File "graph.py", line 71, in <module>
    G = graph(corpus)
  File "graph.py", line 21, in graph
    G.add_nodes_from([feed['title'] for feed in corpus.feeds()], type='feed')
  File "/atap/snippets/ch09/reader.py", line 52, in feeds
    data = self.open('feeds.json')
  File "/opt/conda/lib/python3.7/site-packages/nltk/corpus/reader/api.py", line 213, in open
    stream = self._root.join(file).open(encoding)
  File "/opt/conda/lib/python3.7/site-packages/nltk/data.py", line 340, in join
    return FileSystemPathPointer(_path)
  File "/opt/conda/lib/python3.7/site-packages/nltk/compat.py", line 221, in _decorator
    return init_func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/nltk/data.py", line 318, in __init__
    raise IOError('No such file or directory: %r' % _path)
OSError: No such file or directory: '/atap/snippets/corpus/feeds.json'
(base) root@caef766a99ff:/atap/snippets/ch09# python reader.py
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
Traceback (most recent call last):
  File "reader.py", line 111, in <module>
    print(list(' '.join(word for word, tag in sent) for sent in news)[100:200])
  File "reader.py", line 111, in <genexpr>
    print(list(' '.join(word for word, tag in sent) for sent in news)[100:200])
  File "reader.py", line 83, in sents
    for paragraph in self.paras(fileids, categories):
  File "reader.py", line 74, in paras
    for doc in self.docs(fileids, categories):
  File "reader.py", line 62, in docs
    fileids = self._resolve(fileids, categories)
  File "reader.py", line 48, in _resolve
    return self.fileids(categories)
  File "/opt/conda/lib/python3.7/site-packages/nltk/corpus/reader/api.py", line 361, in fileids
    raise ValueError('Category %s not found' % categories)
ValueError: Category news not found
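
Both failures above point at the corpus rather than at the code: graph.py cannot find /atap/snippets/corpus/feeds.json, and reader.py finds no 'news' category. A small check like the following could be run first. It is only a sketch: `describe_corpus` is a hypothetical helper, and treating each top-level subdirectory as one category is an assumption about how the reader derives its categories.

```python
from pathlib import Path


def describe_corpus(root, required=("feeds.json",)):
    """Summarize what is present under a corpus root directory."""
    root = Path(root)
    if not root.is_dir():
        return {"exists": False, "missing": list(required), "categories": []}
    # Files the scripts expect directly under the root.
    missing = [name for name in required if not (root / name).is_file()]
    # Assumption: each top-level subdirectory corresponds to one category.
    categories = sorted(p.name for p in root.iterdir() if p.is_dir())
    return {"exists": True, "missing": missing, "categories": categories}
```

Calling `describe_corpus('/atap/snippets/corpus')` in the container would show whether feeds.json is missing; an empty category list would be consistent with the "Category news not found" error.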
(base) root@caef766a99ff:/atap/snippets/ch09# python resolve.py
Traceback (most recent call last):
  File "resolve.py", line 2, in <module>
    from fuzzywuzzy import fuzz
ModuleNotFoundError: No module named 'fuzzywuzzy'

(base) root@caef766a99ff:/atap/snippets/ch09# pip install fuzzywuzzy
Collecting fuzzywuzzy
  Downloading https://files.pythonhosted.org/packages/d8/f1/5a267addb30ab7eaa1beab2b9323073815da4551076554ecc890a3595ec9/fuzzywuzzy-0.17.0-py2.py3-none-any.whl
Installing collected packages: fuzzywuzzy
Successfully installed fuzzywuzzy-0.17.0
(base) root@caef766a99ff:/atap/snippets/ch09# python resolve.py
/opt/conda/lib/python3.7/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
  warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
/opt/conda/lib/python3.7/site-packages/networkx/drawing/nx_pylab.py:611: MatplotlibDeprecationWarning: isinstance(..., numbers.Number)
  if cb.is_numlike(alpha):
/opt/conda/lib/python3.7/site-packages/matplotlib/figure.py:448: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
  % get_backend())
Name: Hilton Family
Type: Graph
Number of nodes: 18
Number of edges: 17
Average degree:   1.8889
Number of Pairwise Comparisons: 153
Number of Edge Blocked Comparisons: 32
Number of Fuzzy Blocked Comparisons: 20
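
The numbers in this report can be cross-checked by hand. With 18 nodes there are 18 × 17 / 2 = 153 unordered pairs, which matches the pairwise comparison count, and 17 edges give an average degree of 2 × 17 / 18 ≈ 1.8889. The sketch below redoes that arithmetic; the function names are illustrative and are not taken from resolve.py.

```python
from itertools import combinations


def pairwise_count(n_nodes):
    """Number of unordered node pairs, counted explicitly."""
    return sum(1 for _ in combinations(range(n_nodes), 2))


def average_degree(n_nodes, n_edges):
    """Each edge contributes to the degree of two nodes."""
    return 2 * n_edges / n_nodes


print(pairwise_count(18))                # 153
print(round(average_degree(18, 17), 4))  # 1.8889
```

Seen this way, the two blocking strategies in the log cut the 153 candidate comparisons down to 32 and 20.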



(base) root@caef766a99ff:/atap/snippets/ch09# ls
__pycache__  entities.py  graph.py  reader.py  resolve.py  syngraph.py

(base) root@caef766a99ff:/atap/snippets/ch09# python syngraph.py
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/nltk/corpus/util.py", line 80, in __load
    try: root = nltk.data.find('{}/{}'.format(self.subdir, zip_name))
  File "/opt/conda/lib/python3.7/site-packages/nltk/data.py", line 675, in find
    raise LookupError(resource_not_found)
  Resource wordnet not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('wordnet')
  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/opt/conda/nltk_data'
    - '/opt/conda/share/nltk_data'
    - '/opt/conda/lib/nltk_data'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "syngraph.py", line 11, in <module>
    def graph_synsets(terms, pos=wn.NOUN, depth=2):
  File "/opt/conda/lib/python3.7/site-packages/nltk/corpus/util.py", line 116, in __getattr__
  File "/opt/conda/lib/python3.7/site-packages/nltk/corpus/util.py", line 81, in __load
    except LookupError: raise e
  File "/opt/conda/lib/python3.7/site-packages/nltk/corpus/util.py", line 78, in __load
    root = nltk.data.find('{}/{}'.format(self.subdir, self.__name))
  File "/opt/conda/lib/python3.7/site-packages/nltk/data.py", line 675, in find
    raise LookupError(resource_not_found)
  Resource wordnet not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('wordnet')
  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/opt/conda/nltk_data'
    - '/opt/conda/share/nltk_data'
    - '/opt/conda/lib/nltk_data'

(base) root@caef766a99ff:/atap/snippets/ch09# vi syngraph.py

Add the following 2 lines to syngraph.py, as suggested by the error message:

import nltk
nltk.download('wordnet')
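
Calling the downloader unconditionally makes the script contact the NLTK server on every run, as the "already up-to-date" message in the second run suggests. One possible alternative is to download only when the lookup fails. This is a sketch under assumptions: `ensure_nltk_resource` is a hypothetical helper, and the resource path 'corpora/wordnet' is taken from the "Unzipping corpora/wordnet.zip" line of the log.

```python
def ensure_nltk_resource(resource_path, package):
    """Download `package` only if `resource_path` is not installed yet.

    Returns True when a download was triggered, False otherwise.
    """
    # Imported lazily so that importing this module has no side effects.
    import nltk

    try:
        nltk.data.find(resource_path)
        return False
    except LookupError:
        # Same exception type as in the traceback from syngraph.py.
        nltk.download(package, quiet=True)
        return True
```

In syngraph.py this would be called as `ensure_nltk_resource('corpora/wordnet', 'wordnet')` before the first use of `wn`.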
(base) root@caef766a99ff:/atap/snippets/ch09# python syngraph.py
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.
usage: syngraph.py [-h] [-d DEPTH] [-o OUTPATH] [-p POS] words [words ...]
syngraph.py: error: the following arguments are required: words
(base) root@caef766a99ff:/atap/snippets/ch09# python syngraph.py words
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
/opt/conda/lib/python3.7/site-packages/networkx/drawing/nx_pylab.py:611: MatplotlibDeprecationWarning: isinstance(..., numbers.Number)
  if cb.is_numlike(alpha):
/opt/conda/lib/python3.7/site-packages/matplotlib/figure.py:448: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
  % get_backend())
Name: WordNet Synsets Graph for words
Type: Graph
Number of nodes: 458
Number of edges: 851
Average degree:   3.7162
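
The first run failed only because syngraph.py requires at least one word on the command line. The usage line it printed can be reproduced with argparse roughly as follows. This is a reconstruction from the log, not the script's actual source: the long option names and the default for `-p` are assumptions, and the default depth of 2 is borrowed from the `graph_synsets` signature shown in the earlier traceback.

```python
import argparse


def build_parser():
    """Rebuild a parser shaped like the usage line printed by syngraph.py."""
    parser = argparse.ArgumentParser(prog="syngraph.py")
    # Long option names are assumptions; only -d, -o and -p appear in the log.
    parser.add_argument("-d", "--depth", type=int, default=2)
    parser.add_argument("-o", "--outpath", default=None)
    parser.add_argument("-p", "--pos", default="n")  # assumed default: noun
    # nargs="+" means one or more words are required.
    parser.add_argument("words", nargs="+")
    return parser


args = build_parser().parse_args(["trinket", "bank"])
print(args.words, args.depth)
```

Note that in the session above the literal string `words` was passed, so the graph was built for the English word "words".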
(base) root@caef766a99ff:/atap/snippets/ch09# cd ../ch10
(base) root@caef766a99ff:/atap/snippets/ch10# ls
app.py	conversions.json  converter.py	parser.py  preprocessor.py  reader.py  recommender.py  templates  transformer.py
(base) root@caef766a99ff:/atap/snippets/ch10# python app.py
Traceback (most recent call last):
  File "app.py", line 7, in <module>
    import inflect
ModuleNotFoundError: No module named 'inflect'

(base) root@caef766a99ff:/atap/snippets/ch10# pip install inflect
Collecting inflect
  Downloading https://files.pythonhosted.org/packages/6e/1b/6b9b48323b714b5f66dbea2bd5d4166c4f99d908bc31d5307d14083aa9a2/inflect-1.0.1-py2.py3-none-any.whl (59kB)
    100% |████████████████████████████████| 61kB 2.7MB/s 
Installing collected packages: inflect
Successfully installed inflect-1.0.1

(base) root@caef766a99ff:/atap/snippets/ch10# python app.py
Traceback (most recent call last):
  File "app.py", line 12, in <module>
    from recommender import suggest_recipe, KNNTransformer, KNNRecommender
  File "/atap/snippets/ch10/recommender.py", line 19, in <module>
    from reader import HTMLPickledCorpusReader
  File "/atap/snippets/ch10/reader.py", line 15, in <module>
    from readability.readability import Document as Paper
ModuleNotFoundError: No module named 'readability'

(base) root@caef766a99ff:/atap/snippets/ch10# pip install readability
Collecting readability
  Downloading https://files.pythonhosted.org/packages/4c/cc/de564319ff609a445056c13409b726256e75fd2f32b07c03c39fc59ca07c/readability-0.3.tar.gz
Building wheels for collected packages: readability
  Running setup.py bdist_wheel for readability ... done
  Stored in directory: /root/.cache/pip/wheels/fc/75/22/46cb1e2fed3dc5bfd0280b79ad317df5823387f5c7a744c750
Successfully built readability
Installing collected packages: readability
Successfully installed readability-0.3

(base) root@caef766a99ff:/atap/snippets/ch10# python app.py
Traceback (most recent call last):
  File "app.py", line 12, in <module>
    from recommender import suggest_recipe, KNNTransformer, KNNRecommender
  File "/atap/snippets/ch10/recommender.py", line 19, in <module>
    from reader import HTMLPickledCorpusReader
  File "/atap/snippets/ch10/reader.py", line 15, in <module>
    from readability.readability import Document as Paper
ModuleNotFoundError: No module named 'readability.readability'

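The module readability.readability is provided by the readability-lxml package, not by the readability package, so install readability-lxml: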

(base) root@caef766a99ff:/atap/snippets/ch10# pip install readability-lxml
Collecting readability-lxml
  Downloading https://files.pythonhosted.org/packages/b0/7c/807b783c1e7f9c2e3f86573f644771112813e9ee94c1d610811c7acc7562/readability-lxml-0.7.tar.gz
Requirement already satisfied: chardet in /opt/conda/lib/python3.7/site-packages (from readability-lxml) (3.0.4)
Requirement already satisfied: lxml in /opt/conda/lib/python3.7/site-packages (from readability-lxml) (4.2.5)
Collecting cssselect (from readability-lxml)
  Downloading https://files.pythonhosted.org/packages/7b/44/25b7283e50585f0b4156960691d951b05d061abf4a714078393e51929b30/cssselect-1.0.3-py2.py3-none-any.whl
Building wheels for collected packages: readability-lxml
  Running setup.py bdist_wheel for readability-lxml ... done
  Stored in directory: /root/.cache/pip/wheels/7f/af/cb/6169c4eeb36ec80893c8df26472f2525460289914cdefc6dbc
Successfully built readability-lxml
Installing collected packages: cssselect, readability-lxml
Successfully installed cssselect-1.0.3 readability-lxml-0.7
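
The session above found the missing modules one traceback at a time, and the pip package name did not always match the import name. A check like the following could list everything up front. It is a sketch: `REQUIRED` and both functions are hypothetical names, and the mapping only covers the two modules that appeared in the tracebacks.

```python
import importlib.util

# Import name -> pip package name, as installed in the session above.
REQUIRED = {
    "inflect": "inflect",
    "readability.readability": "readability-lxml",
}


def is_importable(dotted_name):
    """True if the module can be located without executing it fully."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        return False


def missing_packages(required=REQUIRED):
    """Return the pip package names whose import name cannot be found."""
    return [pkg for mod, pkg in required.items() if not is_importable(mod)]
```

Checking the dotted name `readability.readability` rather than just `readability` matters here, because the unrelated `readability` package also installs a top-level module of that name.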


(base) root@caef766a99ff:/atap/snippets/ch10# python preprocessor.py
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
Traceback (most recent call last):
  File "preprocessor.py", line 299, in <module>
    corpus = HTMLCorpusReader('../mini_food_corpus')
  File "/atap/snippets/ch10/reader.py", line 49, in __init__
    CorpusReader.__init__(self, root, fileids, encoding)
  File "/opt/conda/lib/python3.7/site-packages/nltk/corpus/reader/api.py", line 84, in __init__
    root = FileSystemPathPointer(root)
  File "/opt/conda/lib/python3.7/site-packages/nltk/compat.py", line 221, in _decorator
    return init_func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/nltk/data.py", line 318, in __init__
    raise IOError('No such file or directory: %r' % _path)
OSError: No such file or directory: '/atap/snippets/mini_food_corpus'
(base) root@caef766a99ff:/atap/snippets/ch10# cd ..
(base) root@caef766a99ff:/atap/snippets# ls
ch01  ch02  ch03  ch04	ch05  ch06  ch07  ch08	ch09  ch10  ch11  ch12	corpus
(base) root@caef766a99ff:/atap/snippets# mkdir mini_food_corpus


(base) root@caef766a99ff:/atap/snippets/ch10# python recommender.py ingredients
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
Traceback (most recent call last):
  File "recommender.py", line 230, in <module>
    recs, build_time = suggest_recipe(args.ingredients)
  File "recommender.py", line 68, in wrapper
    result = func(*args, **kwargs)
  File "recommender.py", line 221, in suggest_recipe
  File "recommender.py", line 192, in fit_transform
    self.lexicon = self.transformer.fit_transform(documents)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py", line 283, in fit_transform
    return last_step.fit_transform(Xt, y, **fit_params)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py", line 281, in fit_transform
    Xt, fit_params = self._fit(X, y, **fit_params)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py", line 213, in _fit
  File "/opt/conda/lib/python3.7/site-packages/sklearn/externals/joblib/memory.py", line 362, in __call__
    return self.func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py", line 581, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py", line 1381, in fit_transform
    X = super(TfidfVectorizer, self).fit_transform(raw_documents)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py", line 869, in fit_transform
  File "/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py", line 811, in _count_vocab
    raise ValueError("empty vocabulary; perhaps the documents only"
ValueError: empty vocabulary; perhaps the documents only contain stop words
(base) root@caef766a99ff:/atap/snippets/ch10# 
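
The "empty vocabulary" error means the vectorizer received no usable tokens. One plausible cause, given that mini_food_corpus was just created with mkdir, is that the corpus contains no documents at all. A guard like the one below would fail earlier with a clearer message. It is a sketch only: both functions are hypothetical, and the `*.html` pattern is an assumption based on the HTMLCorpusReader name.

```python
from pathlib import Path


def count_documents(root, pattern="*.html"):
    """Count files under root matching pattern, searching recursively."""
    return sum(1 for p in Path(root).rglob(pattern) if p.is_file())


def require_documents(root, pattern="*.html"):
    """Raise early, with a clearer message, if the corpus has no documents."""
    n = count_documents(root, pattern)
    if n == 0:
        raise FileNotFoundError(f"no files matching {pattern!r} under {root}")
    return n
```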



(base) root@caef766a99ff:/atap/snippets/ch11# python sc_bigramcount.py 
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
Traceback (most recent call last):
  File "sc_bigramcount.py", line 10, in <module>
    from pyspark.sql import SparkSession
ModuleNotFoundError: No module named 'pyspark'
(base) root@caef766a99ff:/atap/snippets/ch11# conda install pyspark
Solving environment: done

## Package Plan ##

  environment location: /opt/conda

  added / updated specs: 
    - pyspark

The following packages will be downloaded:

    package                    |            build
    py4j-0.10.7                |           py37_0         251 KB
    pyspark-2.3.2              |           py37_0       202.2 MB
                                           Total:       202.5 MB

The following NEW packages will be INSTALLED:

    py4j:    0.10.7-py37_0
    pyspark: 2.3.2-py37_0 

Proceed ([y]/n)? y

Downloading and Extracting Packages
py4j-0.10.7          | 251 KB    | ######################################################################################################################################## | 100% 
pyspark-2.3.2        | 202.2 MB  | ######################################################################################################################################## | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(base) root@caef766a99ff:/atap/snippets/ch11# python sc_bigramcount.py 
/opt/conda/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
JAVA_HOME is not set
Traceback (most recent call last):
  File "sc_bigramcount.py", line 66, in <module>
    sc    = SparkContext(conf=conf)
  File "/opt/conda/lib/python3.7/site-packages/pyspark/context.py", line 115, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "/opt/conda/lib/python3.7/site-packages/pyspark/context.py", line 300, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/opt/conda/lib/python3.7/site-packages/pyspark/java_gateway.py", line 93, in launch_gateway
    raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
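
The "JAVA_HOME is not set" line indicates that pyspark could not start a JVM, which is why the Java gateway exited. Installing a Java runtime in the container and setting JAVA_HOME is the usual remedy, although the session above stops before trying it. A quick pre-flight check could look like the following sketch; `find_java` is a hypothetical helper, not part of pyspark.

```python
import os
import shutil


def find_java(environ=None):
    """Return the path to a java executable, or None if none is found."""
    environ = os.environ if environ is None else environ
    java_home = environ.get("JAVA_HOME")
    if java_home:
        candidate = os.path.join(java_home, "bin", "java")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # Fall back to whatever `java` is on PATH, if any.
    return shutil.which("java", path=environ.get("PATH"))
```

If `find_java()` returns None, creating a SparkContext will most likely fail in the same way as above.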



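3. For those who want to build the docker image themselves

docker is a mechanism that lets you use Linux distributions such as ubuntu and debian in the same way from linux, windows, and mac os.

Official distributions are available for OSes such as ubuntu and debian, and for languages and toolchains such as gcc and anaconda.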

docker pull


docker Anaconda


$ docker run -it -p 8888:8888 continuumio/anaconda3 /bin/bash


(base) root@d8857ae56e69:/# apt update

(base) root@d8857ae56e69:/# apt install -y procps vim apt-utils sudo

Source (git)

(base) root@f19e2f06eabb:/# git clone https://github.com/foxbook/atap


# conda update --prefix /opt/conda anaconda
Solving environment: done

# All requested packages already installed.

# conda install gensim


(base) root@f19e2f06eabb:/deep-learning-from-scratch-2/ch01# pip install --upgrade pip
Collecting pip
  Downloading https://files.pythonhosted.org/packages/5f/25/e52d3f31441505a5f3af41213346e5b6c221c9e086a166f3703d2ddaf940/pip-18.0-py2.py3-none-any.whl (1.3MB)
    100% |████████████████████████████████| 1.3MB 2.0MB/s 
distributed 1.21.8 requires msgpack, which is not installed.
Installing collected packages: pip
  Found existing installation: pip 10.0.1
    Uninstalling pip-10.0.1:
      Successfully uninstalled pip-10.0.1
Successfully installed pip-18.0

Registering the image on docker hub

$ docker ps
CONTAINER ID        IMAGE                   COMMAND                  CREATED             STATUS              PORTS                    NAMES
caef766a99ff        continuumio/anaconda3   "/usr/bin/tini -- /b…"   10 hours ago        Up 10 hours         0.0.0.0:8888->8888/tcp   sleepy_bassi

$ docker commit caef766a99ff kaizenjapan/anaconda-benjamin

$ docker push kaizenjapan/anaconda-benjamin
The push refers to repository [docker.io/kaizenjapan/anaconda-benjamin]
97b3faf8c2d5: Pushed 
d2dd625b0ad2: Mounted from continuumio/anaconda3 
e53e98dd84af: Mounted from continuumio/anaconda3 
641d40d58695: Mounted from continuumio/anaconda3 
b28ef0b6fef8: Mounted from kaizenjapan/anaconda3-wei 
latest: digest: sha256:35c77fc6ddd9951b13622ca1ead87bfd3bd6621eca21b71b87b32d7b7c4e1559 size: 1379




### Document history
ver. 0.10 first draft 20181018 noon
ver. 0.11 typo corrections 20181018 3 p.m.
ver. 0.12 added source examples for the warnings 20181018 4 p.m.
ver. 0.13 added ch01:gender.py, nltk.download('punkt') 20181019
ver. 0.14 added ch03:transformer.py, conda install gensim 20181020 morning
ver. 0.15 docker push 20181020 a.m.
ver. 0.16 handled the readability.readability error 20181020 noon
ver. 0.17 handled the model 'en' error 20181020 afternoon
ver. 0.18 added thank-you note 20230413




