
Extracting verbs, nouns, and adjectives with MeCab and converting verbs and adjectives to their base forms

Posted at 2019-01-10, last updated at 2019-01-10

Introduction

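This article uses MeCab for two steps that are common in natural language processing preprocessing: extracting verbs, nouns, and adjectives, and converting the verbs and adjectives to their base forms.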

Installing MeCab

[This article](https://qiita.com/taroc/items/b9afd914432da08dafc8) is a good reference for installation.

Program

Here is the program. The call to tokenizer.parse("") on an empty string works around [a bug where node.surface cannot be retrieved](https://qiita.com/piruty/items/ce218090eae53b775b79).

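import MeCab

# Text to analyze (the sample sentence used in the results below).
document = "MeCabを使って自然言語処理をしたい"

# Adjust the -d path to wherever mecab-ipadic-neologd is installed.
tokenizer = MeCab.Tagger("-Ochasen -d /usr/local/lib/mecab/dic/mecab-ipadic-neologd")
tokenizer.parse("")  # Workaround for the bug where node.surface cannot be retrieved.
node = tokenizer.parseToNode(document)
keywords = []
while node:
    features = node.feature.split(",")
    if features[0] == "名詞":
        # Nouns are kept in their surface form.
        keywords.append(node.surface)
    elif features[0] in ("形容詞", "動詞"):
        # Adjectives and verbs are replaced by their base form (field 6).
        keywords.append(features[6])
    node = node.next
print(keywords)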

Results

Input:

"MeCabを使って自然言語処理をしたい"

Output:

['MeCab', '使う', '自然言語処理', 'する']
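The result depends on how the feature string of each node is laid out. The following is a minimal sketch that runs without MeCab and shows which comma-separated field the program reads. The sample pairs in `SAMPLE_NODES` are written by hand in the IPADIC field layout and are only illustrative; actual output depends on the dictionary in use, and other dictionaries (UniDic, for example) use a different layout. The names `SAMPLE_NODES`, `POS_INDEX`, `BASE_FORM_INDEX`, and `to_keyword` are hypothetical and do not come from MeCab.

```python
# Hand-written (surface, feature) pairs in the IPADIC field layout.
# They are illustrative, not captured MeCab output.
SAMPLE_NODES = [
    ("MeCab", "名詞,固有名詞,一般,*,*,*,MeCab,メカブ,メカブ"),
    ("を", "助詞,格助詞,一般,*,*,*,を,ヲ,ヲ"),
    ("使っ", "動詞,自立,*,*,五段・ワ行促音便,連用タ接続,使う,ツカッ,ツカッ"),
    ("て", "助詞,接続助詞,*,*,*,*,て,テ,テ"),
    ("自然言語処理", "名詞,固有名詞,一般,*,*,*,自然言語処理,シゼンゲンゴショリ,シゼンゲンゴショリ"),
    ("を", "助詞,格助詞,一般,*,*,*,を,ヲ,ヲ"),
    ("し", "動詞,自立,*,*,サ変・スル,連用形,する,シ,シ"),
    ("たい", "助動詞,*,*,*,特殊・タイ,基本形,たい,タイ,タイ"),
]

POS_INDEX = 0        # part of speech
BASE_FORM_INDEX = 6  # base (dictionary) form


def to_keyword(surface, feature):
    """Return the keyword for one node, or None if the node is skipped."""
    fields = feature.split(",")
    pos = fields[POS_INDEX]
    if pos == "名詞":
        return surface                   # nouns: keep the surface form
    if pos in ("形容詞", "動詞"):
        return fields[BASE_FORM_INDEX]   # verbs and adjectives: base form
    return None                          # particles, auxiliaries, and so on


candidates = (to_keyword(surface, feature) for surface, feature in SAMPLE_NODES)
keywords = [keyword for keyword in candidates if keyword is not None]
print(keywords)
```

With this sample data, the inflected forms 使っ and し become 使う and する, while the particles を and て and the auxiliary たい are dropped.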

Summary

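We extracted verbs, nouns, and adjectives and converted the verbs and adjectives to their base forms. Customize which words are extracted to suit your own needs, and enjoy your natural language processing.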
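One possible way to make the extraction customizable is to pass the parts of speech as parameters. The sketch below is not part of the original program: `extract_keywords` and its parameters `surface_pos`, `base_form_pos`, and `skip_subtypes` are hypothetical names, and the sample nodes are written by hand in the IPADIC field layout. It also assumes that falling back to the surface form is acceptable when the base-form field is missing or `*`.

```python
def extract_keywords(nodes, surface_pos=("名詞",),
                     base_form_pos=("動詞", "形容詞"), skip_subtypes=()):
    """Collect keywords from (surface, feature) pairs.

    surface_pos:   parts of speech kept in their surface form
    base_form_pos: parts of speech converted to their base form
    skip_subtypes: values of the second feature field to drop
    """
    keywords = []
    for surface, feature in nodes:
        fields = feature.split(",")
        pos, subtype = fields[0], fields[1]
        if subtype in skip_subtypes:
            continue
        if pos in surface_pos:
            keywords.append(surface)
        elif pos in base_form_pos:
            base = fields[6] if len(fields) > 6 else "*"
            # Assumption: use the surface form when no base form is recorded.
            keywords.append(surface if base == "*" else base)
    return keywords


# Hand-written sample nodes in the IPADIC layout (illustrative only).
nodes = [
    ("とても", "副詞,助詞類接続,*,*,*,*,とても,トテモ,トテモ"),
    ("美しく", "形容詞,自立,*,*,形容詞・イ段,連用テ接続,美しい,ウツクシク,ウツクシク"),
    ("咲い", "動詞,自立,*,*,五段・カ行イ音便,連用タ接続,咲く,サイ,サイ"),
    ("いる", "動詞,非自立,*,*,一段,基本形,いる,イル,イル"),
    ("3", "名詞,数,*,*,*,*,*"),
    ("花", "名詞,一般,*,*,*,*,花,ハナ,ハナ"),
]

# Same behavior as the program in this article.
default_keywords = extract_keywords(nodes)

# Also keep adverbs, and drop dependent verbs and numerals.
custom_keywords = extract_keywords(
    nodes,
    base_form_pos=("動詞", "形容詞", "副詞"),
    skip_subtypes=("非自立", "数"),
)
print(default_keywords)
print(custom_keywords)
```

To use this with real MeCab output, the `(surface, feature)` pairs could be collected from `node.surface` and `node.feature` while walking the node list.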
