
Notes on using MeCab from Python

Last updated at 2016-02-10, posted at 2016-02-10

Personal notes on using MeCab from Python.

mecab.py

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import MeCab
m = MeCab.Tagger()

# Parenthesized print works under both Python 2 and Python 3
print(m.parse("犬も歩けば棒に当たる。"))

$ ./mecab.py
犬	名詞,一般,*,*,*,*,犬,イヌ,イヌ
も	助詞,係助詞,*,*,*,*,も,モ,モ
歩け	動詞,自立,*,*,五段・カ行イ音便,仮定形,歩く,アルケ,アルケ
ば	助詞,接続助詞,*,*,*,*,ば,バ,バ
棒	名詞,一般,*,*,*,*,棒,ボウ,ボー
に	助詞,格助詞,一般,*,*,*,に,ニ,ニ
当たる	動詞,自立,*,*,五段・ラ行,基本形,当たる,アタル,アタル
。	記号,句点,*,*,*,*,。,。,。
EOS
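
Each line of the output above is a surface form, a tab, and a comma-separated feature string, with `EOS` marking the end of the sentence. The following is a minimal sketch of splitting that text into fields, assuming the nine-field IPADIC-style layout seen above (part of speech, three subcategories, conjugation type, conjugation form, base form, reading, pronunciation). `Morpheme` and `parse_output` are hypothetical names introduced for illustration. The sketch works on a plain string, so it runs without MeCab installed.

```python
from collections import namedtuple

# Hypothetical container for the fields we care about
Morpheme = namedtuple("Morpheme", ["surface", "pos", "base_form", "reading"])


def parse_output(text):
    """Split MeCab-style text output into Morpheme tuples."""
    morphemes = []
    for line in text.splitlines():
        if not line or line == "EOS":
            continue  # skip blank lines and the end-of-sentence marker
        surface, feature = line.split("\t", 1)
        fields = feature.split(",")
        # Some entries may carry fewer fields, so guard the indexes
        base_form = fields[6] if len(fields) > 6 else surface
        reading = fields[7] if len(fields) > 7 else ""
        morphemes.append(Morpheme(surface, fields[0], base_form, reading))
    return morphemes


sample = (
    "歩け\t動詞,自立,*,*,五段・カ行イ音便,仮定形,歩く,アルケ,アルケ\n"
    "ば\t助詞,接続助詞,*,*,*,*,ば,バ,バ\n"
    "EOS\n"
)
result = parse_output(sample)
for morpheme in result:
    print(morpheme.surface, morpheme.pos, morpheme.base_form)
```

With a real tagger, the string returned by `m.parse(...)` could be passed in place of `sample`.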

##Reading from a file

mecab_from_file.py

#!/usr/bin/env python
# -*- coding:utf-8 -*-

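import sys

import MeCab

# Path of the input text file, given as the first command-line argument
infile = sys.argv[1]
m = MeCab.Tagger()

# The with block closes the file once every line has been processed
with open(infile) as f:
    for line in f:
        res = m.parseToNode(line)

        # Nodes form a linked list; follow .next until it runs out
        while res:
            print(res.feature)
            # e.g. 名詞,一般,*,*,*,*,犬,イヌ,イヌ

            res = res.next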
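
The inner loop above walks a linked list by following `next` from node to node. As a sketch, that traversal can be wrapped in a generator so the calling code becomes a plain `for` loop. `iter_nodes` and `FakeNode` are hypothetical names; `FakeNode` is only a stand-in that models the `feature` and `next` attributes so the example runs without MeCab.

```python
class FakeNode(object):
    """Stand-in for a MeCab node: only `feature` and `next` are modelled."""

    def __init__(self, feature, next=None):
        self.feature = feature
        self.next = next


def iter_nodes(node):
    """Yield each node in the chain, starting from `node`."""
    while node:
        yield node
        node = node.next


# Build a three-node chain: sentence start, one word, sentence end
eos = FakeNode("BOS/EOS,*,*,*,*,*,*,*,*")
word = FakeNode("名詞,一般,*,*,*,*,犬,イヌ,イヌ", eos)
bos = FakeNode("BOS/EOS,*,*,*,*,*,*,*,*", word)

features = [node.feature for node in iter_nodes(bos)]
for feature in features:
    print(feature)
```

With a real tagger the call would presumably look like `for node in iter_nodes(m.parseToNode(line)):`.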

##Reading from a file and counting part-of-speech frequencies
collections.defaultdict makes counting elements easy.

mecab_class_count.py

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import sys
from collections import defaultdict

import MeCab

# Path of the input text file, given as the first command-line argument
infile = sys.argv[1]
m = MeCab.Tagger()

# Missing keys start at 0, so no existence check is needed before += 1
frequency = defaultdict(int)

with open(infile) as f:
    for line in f:
        res = m.parseToNode(line)

        while res:
            # res.feature looks like: 名詞,一般,*,*,*,*,犬,イヌ,イヌ
            arr = res.feature.split(",")
            class_1 = arr[0]  # top-level part of speech
            frequency[class_1] += 1

            res = res.next

# print(frequency)
# defaultdict(<type 'int'>, {'...

# items() and a formatted string work under both Python 2 and Python 3
for k, v in frequency.items():
    print("%s %s" % (k, v))

$ ./mecab_class_count.py input.txt
動詞 4
BOS/EOS 8
名詞 9
助詞 7
助動詞 1
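
The `BOS/EOS` entry in the tally appears because `parseToNode` also returns the beginning-of-sentence and end-of-sentence nodes for every line. As an alternative sketch, `collections.Counter` can do the counting while skipping those boundary nodes. `count_pos` and its `skip` parameter are hypothetical names, and the function takes plain feature strings so it runs without MeCab.

```python
from collections import Counter


def count_pos(features, skip=("BOS/EOS",)):
    """Count the first feature field, ignoring the labels in `skip`."""
    counts = Counter()
    for feature in features:
        pos = feature.split(",")[0]  # top-level part of speech
        if pos not in skip:
            counts[pos] += 1
    return counts


features = [
    "BOS/EOS,*,*,*,*,*,*,*,*",
    "名詞,一般,*,*,*,*,犬,イヌ,イヌ",
    "助詞,係助詞,*,*,*,*,も,モ,モ",
    "名詞,一般,*,*,*,*,棒,ボウ,ボー",
    "BOS/EOS,*,*,*,*,*,*,*,*",
]
counts = count_pos(features)

# most_common() lists the labels from most to least frequent
for pos, n in counts.most_common():
    print(pos, n)
```

`Counter` also gives sorted output through `most_common()`, which a `defaultdict` would need an explicit sort for.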

##Options

###Specifying a dictionary

# MeCab instance
m = MeCab.Tagger(' -d /usr/local/Cellar/mecab/0.996/lib/mecab/dic/mecab-ipadic-neologd')

###Specifying a mecabrc

m = MeCab.Tagger('-r my_mecabrc')
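
The string passed to `MeCab.Tagger` uses the same option syntax as the `mecab` command line, so `-d` and `-r` can be given together. Below is a small sketch that assembles such a string. `tagger_args` is a hypothetical helper, `/path/to/dic` is a placeholder path, and the sketch assumes paths without spaces.

```python
def tagger_args(dicdir=None, rcfile=None):
    """Build an option string for MeCab.Tagger from optional settings."""
    parts = []
    if dicdir:
        parts += ["-d", dicdir]  # dictionary directory
    if rcfile:
        parts += ["-r", rcfile]  # resource file (mecabrc)
    return " ".join(parts)


args = tagger_args(dicdir="/path/to/dic", rcfile="my_mecabrc")
print(args)

# With MeCab installed, the result would be used as:
# m = MeCab.Tagger(args)
```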
