More than 5 years have passed since last update.

Using a user dictionary with MeCab

1. Create the user dictionary
(1) Create a CSV file and define the words in it

sample_userdic.csv
G20,1288,1288,8461,名詞,固有名詞,一般,*,*,*,G20,ジートゥウェンティ,ジートゥウェンティ
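
The fields in this entry follow the mecab-ipadic layout: surface form, left context ID, right context ID, cost, part of speech, three part-of-speech subcategories, conjugation type, conjugation form, base form, reading, and pronunciation. The following is a minimal sketch of generating such a line from Python, assuming that layout. The helper name `make_entry` is hypothetical and not part of MeCab, and its default IDs and cost are simply copied from the sample entry above.

```python
import csv
import io

def make_entry(surface, reading, left_id=1288, right_id=1288, cost=8461):
    """Return one mecab-ipadic style CSV line for a general proper noun.

    The context IDs and cost default to the values in the sample entry;
    they are assumptions and may need adjusting for another dictionary.
    """
    row = [
        surface, left_id, right_id, cost,
        "名詞", "固有名詞", "一般", "*",  # part of speech and subcategories
        "*", "*",                         # conjugation type and form (unused for nouns)
        surface, reading, reading,        # base form, reading, pronunciation
    ]
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(row)
    return buf.getvalue().rstrip("\n")

line = make_entry("G20", "ジートゥウェンティ")
print(line)
```

Going through the `csv` module rather than joining strings by hand means a surface form that itself contains a comma is quoted correctly.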

(2) Create a directory and place the CSV file in it

$ sudo mkdir /usr/local/lib/mecab/dic/sample_userdic
$ sudo cp sample_userdic.csv /usr/local/lib/mecab/dic/sample_userdic

2. Compile the dictionary (here, mecab-ipadic-neologd is used as the system dictionary)

$ cd /usr/local/lib/mecab/dic/sample_userdic
$ sudo /usr/local/libexec/mecab/mecab-dict-index -d/usr/local/lib/mecab/dic/mecab-ipadic-neologd -u sample_userdic.dic -f utf-8 -t utf-8 sample_userdic.csv
reading sample_userdic.csv ... 1
emitting double-array: 100% |###########################################| 

done!
$ ls
sample_userdic.csv  sample_userdic.dic
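
If the dictionary is rebuilt often, the compile step can be scripted. The sketch below only assembles the argument list for `mecab-dict-index` with the same options as the command above. The function name `build_index_command` is hypothetical, and the default indexer path is the one used in this article, so it may differ on other installations.

```python
import shlex

def build_index_command(system_dic, csv_path, dic_path,
                        indexer="/usr/local/libexec/mecab/mecab-dict-index"):
    """Assemble the mecab-dict-index argument list for a user dictionary."""
    return [
        indexer,
        "-d", system_dic,  # system dictionary directory
        "-u", dic_path,    # compiled user dictionary to write
        "-f", "utf-8",     # encoding of the input CSV
        "-t", "utf-8",     # encoding of the compiled dictionary
        csv_path,
    ]

cmd = build_index_command(
    "/usr/local/lib/mecab/dic/mecab-ipadic-neologd",
    "sample_userdic.csv",
    "sample_userdic.dic",
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` without a shell avoids quoting problems when a path contains spaces.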

3. Verify the result (here the dictionaries are passed as arguments; they can also be specified in mecabrc)
◇ Verify from the command line

$ mecab -d "/usr/local/lib/mecab/dic/mecab-ipadic-neologd/" -u "/usr/local/lib/mecab/dic/sample_userdic/sample_userdic.dic"
20カ国・地域(G20)
2	名詞,数,*,*,*,*,2,ニ,ニ
0	名詞,数,*,*,*,*,0,ゼロ,ゼロ
カ国	名詞,接尾,助数詞,*,*,*,カ国,カコク,カコク
・	記号,一般,*,*,*,*,・,・,・
地域	名詞,一般,*,*,*,*,地域,チイキ,チイキ
(	記号,括弧開,*,*,*,*,(,(,(
G20	名詞,固有名詞,一般,*,*,*,G20,ジートゥエンティー,ジートゥエンティー
)	記号,括弧閉,*,*,*,*,),),)

◇ Verify from Python

sample.py
>>> import MeCab
>>> mecab = MeCab.Tagger('-d /usr/local/lib/mecab/dic/mecab-ipadic-neologd -u /usr/local/lib/mecab/dic/sample_userdic/sample_userdic.dic')
>>> strG20 = "20カ国・地域(G20)"
>>> line = mecab.parse(strG20)
>>> word = line.split('\n')
>>> word[6]
'G20\t名詞,固有名詞,一般,*,*,*,G20,ジートゥエンティー,ジートゥエンティー'
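
Picking a token by a fixed index such as `word[6]` only works for this one sentence. As a rough sketch, the tab-and-comma output format shown above can be split into (surface, features) pairs and filtered by part of speech instead. The function name `parse_mecab_output` is hypothetical, and the sample string is copied from the output above so that the snippet runs without MeCab installed.

```python
def parse_mecab_output(text):
    """Split MeCab's default output into (surface, features) pairs."""
    tokens = []
    for row in text.splitlines():
        if row == "EOS" or not row:
            continue  # skip the end-of-sentence marker and blank lines
        surface, _, feature = row.partition("\t")
        tokens.append((surface, feature.split(",")))
    return tokens

sample = (
    "地域\t名詞,一般,*,*,*,*,地域,チイキ,チイキ\n"
    "G20\t名詞,固有名詞,一般,*,*,*,G20,ジートゥエンティー,ジートゥエンティー\n"
    "EOS\n"
)
tokens = parse_mecab_output(sample)
proper_nouns = [s for s, f in tokens if f[:2] == ["名詞", "固有名詞"]]
print(proper_nouns)
```

In real use, `sample` would be replaced by the string returned from `mecab.parse(...)`. The plain `split(",")` assumes no feature field contains a quoted comma.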