
Sample code for keyword extraction with MeCab

Posted at 2022-02-13, last updated at 2022-02-14

This is sample code that extracts keywords from a text file using MeCab.

Importing the libraries

import MeCab
import ipadic  # IPA dictionary
import collections  # container data types (Counter)
import seaborn as sns  # plotting
import matplotlib.pyplot as plt  # plotting

Reading the document file

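Apparently, reading can fail if you do not specify the encoding. On Windows, open() falls back to the locale encoding (for example cp932), so a UTF-8 file may raise UnicodeDecodeError. The utf-8_sig codec used here also strips a leading BOM if one is present.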

with open('C://Python/genbaneko.txt', encoding="utf-8_sig") as f:
    text = f.read()
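
The encoding note above can be checked with a small sketch. It writes a temporary UTF-8 file that starts with a BOM and reads it back twice, once with utf-8 and once with utf-8-sig. The temporary file and the sample string are assumptions made for illustration; they are not part of the original script.

```python
import codecs
import os
import tempfile

# Write a small UTF-8 file that starts with a BOM, as some Windows editors do.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(codecs.BOM_UTF8 + "現場猫".encode("utf-8"))
    path = tmp.name

try:
    with open(path, encoding="utf-8") as f:
        plain = f.read()      # the BOM survives as a leading "\ufeff"
    with open(path, encoding="utf-8-sig") as f:
        stripped = f.read()   # the codec removes the BOM
finally:
    os.remove(path)

print(repr(plain))
print(repr(stripped))
```

With plain utf-8 the BOM stays in the text and would end up attached to the first token handed to MeCab, which is one reason to prefer utf-8-sig for files of unknown origin.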

Counting frequent words with MeCab

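# Tokenize with MeCab
m = MeCab.Tagger('-Ochasen')  # ChaSen-compatible output format (ChaSen is another morphological analyzer)

node = m.parseToNode(text)  # walk the nodes to get each word's part of speech and details
words = []

# Collect the base form of every noun, verb, and adjective
while node:
    hinshi = node.feature.split(",")[0]  # part of speech
    if hinshi in ["名詞", "動詞", "形容詞"]:  # noun, verb, adjective
        origin = node.feature.split(",")[6]  # base form
        words.append(origin)
    node = node.next

# Count the words (top 10); this must run after the loop has filled `words`
c = collections.Counter(words)
print(c.most_common(10))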
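
To make the feature-string handling easier to follow without installing MeCab, here is a minimal sketch that applies the same filter to hand-written (surface, feature) pairs. The sample pairs imitate IPADIC-style output and are not real MeCab results. The helper name `base_form` and the fallback to the surface form when the base-form field is `*` are assumptions added for illustration.

```python
from collections import Counter

TARGET_POS = {"名詞", "動詞", "形容詞"}  # noun, verb, adjective


def base_form(surface, feature):
    """Return the base form for target parts of speech, else None (hypothetical helper)."""
    fields = feature.split(",")
    if fields[0] not in TARGET_POS:
        return None
    # In IPADIC-style output, field 6 is the base form; unknown words typically carry "*".
    if len(fields) > 6 and fields[6] != "*":
        return fields[6]
    return surface  # assumption: fall back to the surface form


# Hand-written pairs imitating IPADIC output; not produced by MeCab.
sample_nodes = [
    ("猫", "名詞,一般,*,*,*,*,猫,ネコ,ネコ"),
    ("が", "助詞,格助詞,一般,*,*,*,が,ガ,ガ"),
    ("走っ", "動詞,自立,*,*,五段・ラ行,連用タ接続,走る,ハシッ,ハシッ"),
    ("た", "助動詞,*,*,*,特殊・タ,基本形,た,タ,タ"),
    ("猫", "名詞,一般,*,*,*,*,猫,ネコ,ネコ"),
    ("ヨシ", "名詞,一般,*,*,*,*,*"),
]

words = []
for surface, feature in sample_nodes:
    word = base_form(surface, feature)
    if word is not None:
        words.append(word)

counts = Counter(words)
print(counts.most_common(1))
```

Particles and auxiliary verbs are dropped, the inflected verb is counted under its dictionary form, and the unknown word is kept under its surface form instead of being counted as `*`.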

Plotting

sns.set(context="talk")
fig, ax = plt.subplots(figsize=(8, 8))  # plt.subplots returns a (figure, axes) pair

sns.countplot(y=words, order=[word for word, _ in c.most_common(10)], ax=ax)
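
As a rough illustration of what the `order` argument receives, the sketch below splits the result of `most_common` into labels and counts. The word list is a made-up stand-in for the `words` list built earlier, and the name `heights` is introduced only for this example.

```python
from collections import Counter

# Hypothetical word list standing in for the `words` extracted earlier.
words = ["猫", "ヨシ", "猫", "確認", "ヨシ", "猫"]
c = Counter(words)

top = c.most_common(2)
order = [word for word, _ in top]       # category order for the bars
heights = [count for _, count in top]   # number of occurrences per category

print(order)
print(heights)
```

Because the counts are already available here, passing them to `sns.barplot(x=heights, y=order)` should draw an equivalent chart without having seaborn recount the full word list.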

Sources

MeCab official site
https://taku910.github.io/mecab/
[Python] How to use the MeCab morphological analysis engine
https://hibiki-press.tech/python/mecab/5153#toc6

See also

MeCab is introduced in a YouTube video titled "Pythonを使った事務処理の効率化" ("Streamlining office work with Python").
