
Morphological Analysis: Korean: Part 2: Creating a mecab-ko User Dictionary

Posted at 2018-09-17

This is a continuation of the previous article.

A dedicated shell script is provided for building the user dictionary.

1. Edit the user dictionary

As described in the README, first add your words to the user dictionary CSV files.

  • user-dic/nnp.csv: proper nouns
  • user-dic/person.csv: personal names
  • user-dic/place.csv: place names
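As a rough illustration of what "adding a word" looks like, the sketch below appends proper-noun entries in the 12-column layout used by the sample files under user-dic/. The helper name `add_nnp_entry` and the example words are hypothetical, and the column layout (empty left-id, right-id, and cost fields, then the POS tag, a T/F flag for a final consonant, and the reading) is an assumption based on those samples, so check the README before relying on it. The demo writes to a scratch file so that the real dictionary is not touched.

```shell
#!/bin/sh
# Hypothetical helper (not part of mecab-ko-dic): append one NNP entry.
#   $1 = CSV file, $2 = surface form,
#   $3 = T if the last syllable ends in a final consonant, otherwise F
# Assumed layout: surface,left-id,right-id,cost,POS,semantic class,
#                 final consonant,reading,type,first POS,last POS,expression
# The three empty fields are left for the build script to fill in.
add_nnp_entry() {
  printf '%s,,,,NNP,*,%s,%s,*,*,*,*\n' "$2" "$3" "$2" >> "$1"
}

# Demo on a scratch file; point this at user-dic/nnp.csv for real use.
demo_csv=$(mktemp)
add_nnp_entry "$demo_csv" '구글' T   # ends in a final consonant
add_nnp_entry "$demo_csv" '대우' F   # ends in a vowel
cat "$demo_csv"
```

For the actual dictionary, the call would be `add_nnp_entry user-dic/nnp.csv '<word>' T` (or `F`), run from the mecab-ko-dic directory.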

2. Run the shell script

As a look inside the script shows, it runs mecab-dict-index internally.

First, check where mecab-ko is installed, and if the path differs from the one in the script, edit it.


- readonly MECAB_EXEC_PATH=/usr/local/libexec/mecab
+ readonly MECAB_EXEC_PATH=/usr/local/Cellar/mecab-ko/0.996-ko-0.9.2/libexec/mecab/
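One way to find the value to use for MECAB_EXEC_PATH is to ask `mecab-config`, which MeCab installs alongside the `mecab` binary. This is a minimal sketch that assumes `mecab-config` from the mecab-ko installation is on your PATH; the function name `find_mecab_libexec` is made up for illustration.

```shell
#!/bin/sh
# Print the directory that should contain mecab-dict-index,
# or fail with a message if mecab-config is not on PATH.
find_mecab_libexec() {
  if command -v mecab-config >/dev/null 2>&1; then
    mecab-config --libexecdir
  else
    echo "mecab-config not found on PATH" >&2
    return 1
  fi
}

if dir=$(find_mecab_libexec); then
  echo "MECAB_EXEC_PATH candidate: $dir"
  # Confirm that the indexer binary is really there.
  ls "$dir/mecab-dict-index" 2>/dev/null \
    || echo "mecab-dict-index not found in $dir"
fi
```

If the printed directory differs from the one hard-coded in tools/add-userdic.sh, that is the line to change.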

Run the script:

sudo ./tools/add-userdic.sh

[Note]
If add-userdic.sh fails with errors like the following, install coreutils.

generating userdic...
CoinedWord.csv
dictionary_compiler.cpp(82) [param.load(DCONF(DICRC))] no such file or directory: /../dicrc
EC.csv
dictionary_compiler.cpp(82) [param.load(DCONF(DICRC))] no such file or directory: /../dicrc
EF.csv

* Installing coreutils

brew install coreutils

With that out of the way, run the user dictionary build script again.

$ sudo ./tools/add-userdic.sh 
path/tools
generating userdic...
nnp.csv
path/tools/../model.def is not a binary model. reopen it as text mode...
reading path/tools/../user-dic/nnp.csv ... 
done!
person.csv
path/tools/../model.def is not a binary model. reopen it as text mode...
reading path/tools/../user-dic/person.csv ... 
done!
place.csv
path/tools/../model.def is not a binary model. reopen it as text mode...
reading path/tools/../user-dic/place.csv ... 
done!
test -z "model.bin matrix.bin char.bin sys.dic unk.dic" || rm -f model.bin matrix.bin char.bin sys.dic unk.dic
/usr/local/Cellar/mecab-ko/0.996-ko-0.9.2/libexec/mecab/mecab-dict-index -d . -o . -f UTF-8 -t UTF-8
reading ./unk.def ... 13
emitting double-array: 100% |###########################################| 
reading ./CoinedWord.csv ... 148
reading ./EC.csv ... 2547
reading ./EF.csv ... 1820
reading ./EP.csv ... 51
reading ./ETM.csv ... 133
reading ./ETN.csv ... 14
reading ./Foreign.csv ... 11690
reading ./Group.csv ... 3176
reading ./Hanja.csv ... 125750
reading ./IC.csv ... 1305
reading ./Inflect.csv ... 44820
reading ./J.csv ... 416
reading ./MAG.csv ... 14242
reading ./MAJ.csv ... 240
reading ./MM.csv ... 453
reading ./NNB.csv ... 140
reading ./NNBC.csv ... 677
reading ./NNG.csv ... 208524
reading ./NNP.csv ... 2371
reading ./NorthKorea.csv ... 3
reading ./NP.csv ... 342
reading ./NR.csv ... 482
reading ./Person-actor.csv ... 99230
reading ./Person.csv ... 196459
reading ./Place-address.csv ... 19301
reading ./Place-station.csv ... 1145
reading ./Place.csv ... 30303
reading ./Preanalysis.csv ... 5
reading ./Symbol.csv ... 16
reading ./user-nnp.csv ... 3
reading ./user-person.csv ... 3
reading ./user-place.csv ... 2
reading ./VA.csv ... 2360
reading ./VCN.csv ... 7
reading ./VCP.csv ... 9
reading ./VV.csv ... 7331
reading ./VX.csv ... 125
reading ./Wikipedia.csv ... 36762
reading ./XPN.csv ... 83
reading ./XR.csv ... 3637
reading ./XSA.csv ... 19
reading ./XSN.csv ... 124
reading ./XSV.csv ... 23
emitting double-array: 100% |###########################################| 
reading ./matrix.def ... 3822x2693
emitting matrix      : 100% |###########################################| 

done!
echo To enable dictionary, rewrite /usr/local/etc/mecabrc as \"dicdir = /usr/local/lib/mecab/dic/mecab-ko-dic\"
To enable dictionary, rewrite /usr/local/etc/mecabrc as "dicdir = /usr/local/lib/mecab/dic/mecab-ko-dic"
$ sudo make install
make[1]: Nothing to be done for `install-exec-am'.
 ./install-sh -c -d '/usr/local/lib/mecab/dic/mecab-ko-dic'
 /usr/bin/install -c -m 644 model.bin matrix.bin char.bin sys.dic unk.dic left-id.def right-id.def rewrite.def pos-id.def dicrc '/usr/local/lib/mecab/dic/mecab-ko-dic'
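After `make install`, a quick sanity check is to analyze one of the words you added and see whether it comes back as a single token. The sketch below assumes the dictionary was installed to the directory shown in the log above and uses `mecab -d` to select it explicitly. The word passed to `check_word` is only a placeholder, so substitute an entry from your own CSV files.

```shell
#!/bin/sh
# Dictionary directory taken from the install log; override via DICDIR if needed.
DICDIR="${DICDIR:-/usr/local/lib/mecab/dic/mecab-ko-dic}"

# Analyze a single word with the installed dictionary.
# Skips quietly when mecab or the dictionary directory is missing.
check_word() {
  if command -v mecab >/dev/null 2>&1 && [ -d "$DICDIR" ]; then
    printf '%s\n' "$1" | mecab -d "$DICDIR"
  else
    echo "mecab or $DICDIR not available; skipping check" >&2
  fi
}

check_word '구글'   # placeholder word, not from the article
```

If the word is split into several morphemes instead of appearing on one line with the expected tag, the user entry is probably losing to an existing entry.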

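If setting the cost of a user dictionary entry to something like "1" still has no effect on the analysis result, removing the corresponding definition from the original dictionary (*1) made the user entry take effect.
(I do not yet fully understand how costs and related settings work, so this is very much a stopgap.)

*1 The original dictionary

This means the CSV files directly under the mecab-ko-dic folder.
The CSV files are split up finely, for example by part of speech.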

$ ls *.csv
CoinedWord.csv		IC.csv			NNP.csv			Preanalysis.csv		XR.csv
EC.csv			Inflect.csv		NP.csv			Symbol.csv		XSA.csv
EF.csv			J.csv			NR.csv			VA.csv			XSN.csv
EP.csv			MAG.csv			NorthKorea.csv		VCN.csv			XSV.csv
ETM.csv			MAJ.csv			Person-actor.csv	VCP.csv			user-nnp.csv
ETN.csv			MM.csv			Person.csv		VV.csv			user-person.csv
Foreign.csv		NNB.csv			Place-address.csv	VX.csv			user-place.csv
Group.csv		NNBC.csv		Place-station.csv	Wikipedia.csv
Hanja.csv		NNG.csv			Place.csv		XPN.csv
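To find which of these CSV files already defines a word before deleting anything, an exact match on the first column (the surface form) is usually enough. The helper `find_surface` below is a hypothetical sketch, and the demo rows use made-up ids and costs purely to show the matching. MeCab prefers the path with the lowest total cost, so seeing the competing entry and its cost can help explain why a user entry loses.

```shell
#!/bin/sh
# Hypothetical helper: print every line whose first column equals the word.
#   $1 = surface form, remaining args = CSV files to search
find_surface() {
  word=$1
  shift
  awk -F, -v w="$word" '$1 == w { print FILENAME ":" FNR ": " $0 }' "$@"
}

# Demo with scratch files; the ids and costs are placeholders, not real data.
demo=$(mktemp -d)
printf '%s\n' '구글,1,2,3000,NNP,*,T,구글,*,*,*,*' \
              '구글링,1,2,3000,NNP,*,T,구글링,*,*,*,*' > "$demo/NNP.csv"
printf '%s\n' '구글,1,2,1,NNP,*,T,구글,*,*,*,*' > "$demo/user-nnp.csv"

# Matches the exact surface form only, so the longer word is not listed.
find_surface '구글' "$demo"/*.csv
```

In the real dictionary directory the call would be `find_surface '<word>' *.csv`. Back up any file before removing lines from it, then rerun the build script and `make install`.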