

[Python] Filtering word2vec similar words by part of speech

Posted at 2018-09-22

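Prerequisites

  • model is an already-trained gensim model
  • Janome's Tokenizer is used to determine parts of speech
  • The 1,000 words most similar to '鬼' (oni, "demon") are retrieved first and then filtered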

Method using TokenFilter

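from janome.analyzer import Analyzer
from janome.tokenfilter import POSKeepFilter, POSStopFilter

target_word = '鬼'
topn = 1000

# Keep nouns; drop symbols and particles
token_filters = [POSKeepFilter(['名詞']), POSStopFilter(['記号', '助詞'])]
a = Analyzer(token_filters=token_filters)

# model is the trained gensim Word2Vec model from the prerequisites
# (if it is a KeyedVectors object, call model.most_similar instead)
for word, score in model.wv.most_similar(target_word, topn=topn):
    tokens = a.analyze(word)
    for token in tokens:
        print(token, score)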

How TokenFilter works

from janome.analyzer import Analyzer
from janome.tokenfilter import POSKeepFilter, POSStopFilter

target_word = '「鬼の」'

# Keep nouns (名詞); drop symbols (記号) and particles (助詞)
token_filters = [POSKeepFilter(['名詞']), POSStopFilter(['記号', '助詞'])]
a = Analyzer(token_filters=token_filters)
tokens = a.analyze(target_word)
for token in tokens:
    print(token)
#=> 鬼 名詞,一般,*,*,*,*,鬼,オニ,オニ
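
To make the keep/stop semantics concrete, here is a minimal pure-Python sketch of prefix-based filtering over (surface, part-of-speech) pairs. It is not Janome's implementation; the function names keep_pos and stop_pos are hypothetical, and the tags for 「, の and 」 are illustrative values written by hand (only the tag for 鬼 comes from the output above).

```python
def keep_pos(tokens, pos_list):
    """Keep tokens whose POS string starts with any prefix in pos_list."""
    return [(s, p) for s, p in tokens if any(p.startswith(x) for x in pos_list)]


def stop_pos(tokens, pos_list):
    """Drop tokens whose POS string starts with any prefix in pos_list."""
    return [(s, p) for s, p in tokens if not any(p.startswith(x) for x in pos_list)]


# Hand-written (surface, part_of_speech) pairs standing in for the tokens of '「鬼の」'
tokens = [
    ('「', '記号,括弧開,*,*'),
    ('鬼', '名詞,一般,*,*'),
    ('の', '助詞,連体化,*,*'),
    ('」', '記号,括弧閉,*,*'),
]

# Apply the filters in the same order as the token_filters list
filtered = stop_pos(keep_pos(tokens, ['名詞']), ['記号', '助詞'])
print(filtered)
```

Because the keep filter already discards everything that is not a noun, the stop filter has nothing left to remove in this particular case; it only matters when the keep list is broader.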

Method without TokenFilter

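from janome.tokenizer import Tokenizer

t = Tokenizer()
target_word = '鬼'
topn = 1000

# model is the trained gensim Word2Vec model from the prerequisites
for word, score in model.wv.most_similar(target_word, topn=topn):
    tokens = t.tokenize(word)
    for token in tokens:
        # part_of_speech is a comma-separated string such as '名詞,一般,*,*'
        pos0, pos1 = token.part_of_speech.split(',')[:2]
        # Keep only adjectives (形容詞) and adjectival nouns (形容動詞);
        # the IPADIC dictionary tags the latter as '名詞,形容動詞語幹'
        if pos0 == '形容詞' or pos1 == '形容動詞語幹':
            print(token.surface, score, pos0, pos1)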
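
If the filtered words should be collected rather than printed, the same loop can be wrapped in a function. The sketch below is one possible shape, not the article's code: filter_similar, is_adjectival, FakeToken, and stub_tokenize are hypothetical names, and the words and scores are made-up placeholders rather than model output. The stub tokenizer only mimics the surface and part_of_speech attributes of a Janome token so that the example runs without a trained model. The check assumes the IPADIC tagging in which adjectival nouns appear as '名詞,形容動詞語幹'.

```python
from collections import namedtuple

# Stand-in for a Janome token: only the two attributes used here
FakeToken = namedtuple('FakeToken', ['surface', 'part_of_speech'])


def filter_similar(pairs, tokenize, is_wanted):
    """Return (surface, score, pos0, pos1) for every token accepted by is_wanted."""
    results = []
    for word, score in pairs:
        for token in tokenize(word):
            pos0, pos1 = token.part_of_speech.split(',')[:2]
            if is_wanted(pos0, pos1):
                results.append((token.surface, score, pos0, pos1))
    return results


def is_adjectival(pos0, pos1):
    """Accept adjectives and adjectival-noun stems (assumed IPADIC tags)."""
    return pos0 == '形容詞' or pos1 == '形容動詞語幹'


# Placeholder POS tags and scores, written by hand for illustration only
POS_TABLE = {
    '怖い': '形容詞,自立,*,*',
    '悪魔': '名詞,一般,*,*',
    '静か': '名詞,形容動詞語幹,*,*',
}


def stub_tokenize(word):
    """Pretend every word is a single token with the tag from POS_TABLE."""
    return [FakeToken(word, POS_TABLE[word])]


pairs = [('怖い', 0.9), ('悪魔', 0.8), ('静か', 0.7)]
rows = filter_similar(pairs, stub_tokenize, is_adjectival)
for row in rows:
    print(row)
```

With the real components, the call would presumably look like filter_similar(model.wv.most_similar('鬼', topn=1000), Tokenizer().tokenize, is_adjectival).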

About parts of speech
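
As a rough guide to reading the tags, the sketch below splits the printed token line from the TokenFilter example into its fields. It assumes the token's string form separates the surface from the features with a tab, and that the first four comma-separated fields form the part of speech (the value exposed as part_of_speech), followed by inflection type, inflection form, base form, reading, and phonetic form. The variable names are chosen here for illustration.

```python
# Printed form of the token for '鬼', with the tab written explicitly
line = '鬼\t名詞,一般,*,*,*,*,鬼,オニ,オニ'

surface, features = line.split('\t')
fields = features.split(',')

# First four fields: part-of-speech levels, from coarse to fine ('*' means unset)
pos_levels = fields[:4]
# Remaining five fields describe inflection and pronunciation
infl_type, infl_form, base_form, reading, phonetic = fields[4:9]

print(surface, pos_levels)
print(base_form, reading, phonetic)
```

Filtering on the first level (pos_levels[0]) selects a broad class such as 名詞, while the second level (pos_levels[1]) narrows it to a subclass such as 一般.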
