Morphological Analysis: Korean: Part 2: Creating a mecab-ko User Dictionary

Posted at 2018-09-17

This continues from the previous article.

mecab-ko-dic ships a dedicated shell script for building a user dictionary.

1. Edit the user dictionary

As the README explains,
first add your words to the user dictionary CSV files.
- user-dic/nnp.csv : proper nouns
- user-dic/person.csv : personal names
- user-dic/place.csv : place names
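
As a rough sketch of what adding a word looks like, the snippet below appends one entry to a scratch copy of nnp.csv. The 12-column layout and the sample row are written from memory of the entries shipped in mecab-ko-dic's user-dic folder, so treat the column meanings in the comments as assumptions and check them against the README. The temporary directory stands in for the real user-dic/.

```shell
#!/bin/sh
# Work in a scratch directory so the real user-dic/ is left untouched.
workdir=$(mktemp -d)
csv="$workdir/nnp.csv"

# Assumed column layout (verify against the mecab-ko-dic README):
# surface,left-id,right-id,cost,POS,semantic class,final consonant (T/F),
# reading,type,first POS,last POS,expression
# The empty left-id/right-id/cost columns are assumed to be filled in
# when add-userdic.sh builds the dictionary.
printf '%s\n' '구글,,,,NNP,*,T,구글,*,*,*,*' >> "$csv"

# Sanity check: count rows that do not have exactly 12 fields.
bad_rows=$(awk -F',' 'NF != 12' "$csv" | wc -l | tr -d ' ')
echo "rows with unexpected field count: $bad_rows"
```

A field-count check like this catches a missing or extra comma before the build step, which otherwise tends to fail with a less obvious message.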

2. Run the shell script

A look inside the script shows that it runs mecab-dict-index internally.

First check where mecab-ko is installed, and edit the script if the path differs.


- readonly MECAB_EXEC_PATH=/usr/local/libexec/mecab
+ readonly MECAB_EXEC_PATH=/usr/local/Cellar/mecab-ko/0.996-ko-0.9.2/libexec/mecab/
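
The same change can be scripted rather than made by hand. This is a minimal sketch that works on a scratch stand-in for tools/add-userdic.sh; the replacement path is the one from the diff above and will differ on other machines.

```shell
#!/bin/sh
# Demonstrate on a scratch file; the single line below mimics the
# assignment found in tools/add-userdic.sh.
workdir=$(mktemp -d)
script="$workdir/add-userdic.sh"
printf '%s\n' 'readonly MECAB_EXEC_PATH=/usr/local/libexec/mecab' > "$script"

# Path taken from the article; replace it with your own install location.
new_path=/usr/local/Cellar/mecab-ko/0.996-ko-0.9.2/libexec/mecab/

# Rewrite the assignment. The result goes through a temporary file because
# sed's in-place flag behaves differently on macOS and GNU systems.
# Writing back with cat keeps the original file's permissions.
sed "s|^readonly MECAB_EXEC_PATH=.*|readonly MECAB_EXEC_PATH=$new_path|" \
    "$script" > "$script.new"
cat "$script.new" > "$script"
rm "$script.new"

cat "$script"
```

If mecab-config is on your PATH, `mecab-config --libexecdir` should print the directory to use here, though I have not checked that against the Homebrew mecab-ko build.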

Run the script

sudo ./tools/add-userdic.sh

[Note]
If add-userdic.sh fails with errors like the following,
install coreutils.

generating userdic...
CoinedWord.csv
dictionary_compiler.cpp(82) [param.load(DCONF(DICRC))] no such file or directory: /../dicrc
EC.csv
dictionary_compiler.cpp(82) [param.load(DCONF(DICRC))] no such file or directory: /../dicrc
EF.csv

* Install coreutils

brew install coreutils
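
The empty prefix in `/../dicrc` suggests the script could not work out its own directory. My guess, which the article does not confirm, is that the script relies on a GNU-style readlink that stock macOS lacked, and that coreutils supplies one as greadlink. The hypothetical helper `resolve_dir` below sketches how a directory can be resolved with or without the GNU tool.

```shell
#!/bin/sh
# resolve_dir is a hypothetical helper, not part of add-userdic.sh.
# It prints the absolute, symlink-free path of an existing directory.
resolve_dir() {
    if command -v greadlink >/dev/null 2>&1; then
        greadlink -f "$1"      # GNU readlink as installed by Homebrew coreutils
    else
        (cd "$1" && pwd -P)    # POSIX fallback that needs no GNU tools
    fi
}

# Scratch layout mimicking <mecab-ko-dic>/tools.
workdir=$(mktemp -d)
mkdir "$workdir/tools"

prog_dir=$(resolve_dir "$workdir/tools")
echo "dicrc would be looked up in: $prog_dir/.."
```

If `prog_dir` comes back empty, the lookup path collapses to `/../dicrc`, which matches the error shown above.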

With that fixed, run the user dictionary build script again

$ sudo ./tools/add-userdic.sh 
path/tools
generating userdic...
nnp.csv
path/tools/../model.def is not a binary model. reopen it as text mode...
reading path/tools/../user-dic/nnp.csv ... 
done!
person.csv
path/tools/../model.def is not a binary model. reopen it as text mode...
reading path/tools/../user-dic/person.csv ... 
done!
place.csv
path/tools/../model.def is not a binary model. reopen it as text mode...
reading path/tools/../user-dic/place.csv ... 
done!
test -z "model.bin matrix.bin char.bin sys.dic unk.dic" || rm -f model.bin matrix.bin char.bin sys.dic unk.dic
/usr/local/Cellar/mecab-ko/0.996-ko-0.9.2/libexec/mecab/mecab-dict-index -d . -o . -f UTF-8 -t UTF-8
reading ./unk.def ... 13
emitting double-array: 100% |###########################################| 
reading ./CoinedWord.csv ... 148
reading ./EC.csv ... 2547
reading ./EF.csv ... 1820
reading ./EP.csv ... 51
reading ./ETM.csv ... 133
reading ./ETN.csv ... 14
reading ./Foreign.csv ... 11690
reading ./Group.csv ... 3176
reading ./Hanja.csv ... 125750
reading ./IC.csv ... 1305
reading ./Inflect.csv ... 44820
reading ./J.csv ... 416
reading ./MAG.csv ... 14242
reading ./MAJ.csv ... 240
reading ./MM.csv ... 453
reading ./NNB.csv ... 140
reading ./NNBC.csv ... 677
reading ./NNG.csv ... 208524
reading ./NNP.csv ... 2371
reading ./NorthKorea.csv ... 3
reading ./NP.csv ... 342
reading ./NR.csv ... 482
reading ./Person-actor.csv ... 99230
reading ./Person.csv ... 196459
reading ./Place-address.csv ... 19301
reading ./Place-station.csv ... 1145
reading ./Place.csv ... 30303
reading ./Preanalysis.csv ... 5
reading ./Symbol.csv ... 16
reading ./user-nnp.csv ... 3
reading ./user-person.csv ... 3
reading ./user-place.csv ... 2
reading ./VA.csv ... 2360
reading ./VCN.csv ... 7
reading ./VCP.csv ... 9
reading ./VV.csv ... 7331
reading ./VX.csv ... 125
reading ./Wikipedia.csv ... 36762
reading ./XPN.csv ... 83
reading ./XR.csv ... 3637
reading ./XSA.csv ... 19
reading ./XSN.csv ... 124
reading ./XSV.csv ... 23
emitting double-array: 100% |###########################################| 
reading ./matrix.def ... 3822x2693
emitting matrix      : 100% |###########################################| 

done!
echo To enable dictionary, rewrite /usr/local/etc/mecabrc as \"dicdir = /usr/local/lib/mecab/dic/mecab-ko-dic\"
To enable dictionary, rewrite /usr/local/etc/mecabrc as "dicdir = /usr/local/lib/mecab/dic/mecab-ko-dic"
$ sudo make install
make[1]: Nothing to be done for `install-exec-am'.
 ./install-sh -c -d '/usr/local/lib/mecab/dic/mecab-ko-dic'
 /usr/bin/install -c -m 644 model.bin matrix.bin char.bin sys.dic unk.dic left-id.def right-id.def rewrite.def pos-id.def dicrc '/usr/local/lib/mecab/dic/mecab-ko-dic'

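If setting a cost such as "1" in the user dictionary has no effect on the analysis results,
deleting the corresponding definition from the original dictionary (*1) made the user entry take effect.
(I do not fully understand how costs should be configured, so this was very much a stopgap.)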
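
Before deleting anything, it can help to see which stock entries share a surface form with a user-dictionary word. MeCab selects the lowest-cost path overall, so a stock entry with the same surface can still win. The sketch below runs on scratch files; the file names mirror the ones in this article, but the rows and numeric values are made up for illustration.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Made-up sample rows standing in for a user CSV and a stock CSV.
printf '%s\n' '구글,,,,NNP,*,T,구글,*,*,*,*' > "$workdir/user-nnp.csv"
printf '%s\n' \
    '구글,1786,3546,3000,NNP,*,T,구글,*,*,*,*' \
    '서울,1789,3549,2000,NNP,지명,T,서울,*,*,*,*' > "$workdir/NNP.csv"

# First pass collects the user surfaces (column 1); second pass prints
# every stock row whose surface is in that set.
overlaps=$(awk -F',' 'NR == FNR { seen[$1] = 1; next } $1 in seen' \
    "$workdir/user-nnp.csv" "$workdir/NNP.csv")

echo "$overlaps"
```

On the real dictionary, the second argument would be the stock CSV files in the mecab-ko-dic folder. The rows it prints are the candidates to review or remove.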

*1 The original dictionary

This means the CSV files directly under the mecab-ko-dic folder.
They are split into fine-grained files, mostly one per part of speech.

$ ls *.csv
CoinedWord.csv      IC.csv          NNP.csv         Preanalysis.csv     XR.csv
EC.csv          Inflect.csv     NP.csv          Symbol.csv      XSA.csv
EF.csv          J.csv           NR.csv          VA.csv          XSN.csv
EP.csv          MAG.csv         NorthKorea.csv      VCN.csv         XSV.csv
ETM.csv         MAJ.csv         Person-actor.csv    VCP.csv         user-nnp.csv
ETN.csv         MM.csv          Person.csv      VV.csv          user-person.csv
Foreign.csv     NNB.csv         Place-address.csv   VX.csv          user-place.csv
Group.csv       NNBC.csv        Place-station.csv   Wikipedia.csv
Hanja.csv       NNG.csv         Place.csv       XPN.csv