An easy and fairly fast n-gram in Python

Posted at 2018-01-30

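If you are studying the basics of NLP, for example through the NLP 100 Exercise (言語処理100本ノック), you will probably need to implement n-grams at some point.
The zip function seemed to allow a very simple and fairly fast implementation, so I compared it with the approach I used before.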

import this
# Prints the Zen of Python; safe to ignore (read it when you feel like it)

# What I came up with a while ago; similar examples are common
def ngram_(words, n):
    return [tuple(words[i:i+n]) for i in range(len(words)-n+1)]

# What I came up with recently
def ngram(words, n):
    return list(zip(*(words[i:] for i in range(n))))

words = (this.s * 100).split() # list of words (this.s is the ROT13-encoded text, which is fine for timing)

%timeit ngram_(words, 2)
# => 6 ms ± 211 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit ngram(words, 2)
# => 818 µs ± 42.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
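
The `%timeit` magic only exists inside IPython. As a rough sketch of how the same comparison could be run from a plain script, the standard `timeit` module can be used instead. The names `ngram_slice` and `ngram_zip` and the stand-in corpus are assumptions made for this example (the post itself uses `ngram_`, `ngram`, and `this.s * 100`), and the numbers will differ by machine.

```python
import timeit


def ngram_slice(words, n):
    # One slice per start position (same logic as ngram_ in the post)
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def ngram_zip(words, n):
    # zip over n progressively shifted slices (same logic as ngram in the post)
    return list(zip(*(words[i:] for i in range(n))))


# Stand-in corpus: an assumption for this sketch, not the text used in the post
words = ("the quick brown fox jumps over the lazy dog " * 1000).split()

calls = 20
for func in (ngram_slice, ngram_zip):
    # Take the best of 5 repeats, each repeat making `calls` calls
    best = min(timeit.repeat(lambda: func(words, 2), number=calls, repeat=5))
    print(f"{func.__name__}: {best / calls * 1e6:.1f} us per call")
```

Taking the minimum over several repeats is one common way to reduce noise from other processes; `%timeit` reports mean and standard deviation instead, so the figures are not directly comparable.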

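I ran the comparison casually in IPython, but the zip version does look faster: about 818 µs versus 6 ms per loop, roughly 7 times faster on this input.
Personally, I found the source of the this module, which I imported only because I wanted some test text, more interesting (see: Zen of Pythonの核心に触れよ -- thisでわかるPythonのimportの仕組み).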
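
Returning to the two implementations, a small sketch may help show why the zip version yields the same n-grams. The function names `ngram_slice` and `ngram_zip` and the toy input are assumptions for illustration; the bodies match the two functions in the post.

```python
def ngram_slice(words, n):
    # One slice per start position (same logic as ngram_ in the post)
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def ngram_zip(words, n):
    # zip over n progressively shifted slices (same logic as ngram in the post)
    return list(zip(*(words[i:] for i in range(n))))


words = "a b c d e".split()

# The shifted slices that zip walks over in parallel when n = 3
shifted = [words[i:] for i in range(3)]
print(shifted)
# [['a', 'b', 'c', 'd', 'e'], ['b', 'c', 'd', 'e'], ['c', 'd', 'e']]

# zip stops at the shortest slice, so exactly len(words) - n + 1 tuples come out
print(ngram_zip(words, 3))
# [('a', 'b', 'c'), ('b', 'c', 'd'), ('c', 'd', 'e')]

# Both versions agree, including when n exceeds the input length
assert ngram_zip(words, 3) == ngram_slice(words, 3)
assert ngram_zip(words, 6) == ngram_slice(words, 6) == []
```

The zip version creates only n slices in total, while the slice version creates one slice per n-gram, which is a plausible reason for the speed difference measured above.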
