Overview
- Install the following on macOS 10.14.4 Mojave:
- MeCab
- mecab-ipadic
- mecab-ipadic-NEologd
- mecab-python3
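- mecab-python3 reportedly won't install with the latest Command Line Tools on Mojave (apparently one of Mojave's dark corners). If that happens, downgrade the Command Line Tools to version 9.4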
- https://analytics-note.xyz/mac/mojave-mecab-python3-error/
- For some reason, I was able to install it with the latest version
Environment
- macOS 10.14.4 Mojave
- Command Line Tools (CLT): 10.2.1.0.1.1554506761
- Using oh-my-zsh
- Homebrew 2.1.6 (git revision 0d363)
- Python 3.7.2
- anaconda3-5.3.1
- pip 19.0.3
Steps
Installing MeCab
- Install mecab and its dictionary (mecab-ipadic) with brew
$ brew install mecab mecab-ipadic
...
==> Downloading https://homebrew.bintray.com/bottles/mecab-0.996.mojave.bottle.3.tar.gz
==> Downloading from https://akamai.bintray.com/ef/ef261d203140305ca8c9e4b7311c61176a17325df9454610d3eb33a312c4d3c5?__gda__=exp=1561214658~hmac=88522b7f1
######################################################################## 100.0%
==> Pouring mecab-0.996.mojave.bottle.3.tar.gz
🍺 /usr/local/Cellar/mecab/0.996: 20 files, 4.2MB
==> `brew cleanup` has not been run in 30 days, running now...
...
==> Downloading https://homebrew.bintray.com/bottles/mecab-ipadic-2.7.0-20070801.mojave.bottle.tar.gz
==> Downloading from https://akamai.bintray.com/30/30967b4167d34f05c79f185d71a40198fff4067d0cce82aed59383548c898681?__gda__=exp=1561214679~hmac=c478ea83e
######################################################################## 100.0%
==> Pouring mecab-ipadic-2.7.0-20070801.mojave.bottle.tar.gz
==> Caveats
To enable mecab-ipadic dictionary, add to /usr/local/etc/mecabrc:
dicdir = /usr/local/lib/mecab/dic/ipadic
==> Summary
🍺 /usr/local/Cellar/mecab-ipadic/2.7.0-20070801: 16 files, 50.6MB
==> Caveats
==> mecab-ipadic
To enable mecab-ipadic dictionary, add to /usr/local/etc/mecabrc:
dicdir = /usr/local/lib/mecab/dic/ipadic
- Try out mecab
- EOS stands for End Of Sentence and marks the end of each sentence
$ mecab
すももももももももものうち
すもも 名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
の 助詞,連体化,*,*,*,*,の,ノ,ノ
うち 名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS
ティックトック
ティックトック 名詞,固有名詞,組織,*,*,*,*
EOS
安倍晋三内閣総理大臣
安倍 名詞,固有名詞,人名,姓,*,*,安倍,アベ,アベ
晋 名詞,固有名詞,人名,名,*,*,晋,ススム,ススム
三 名詞,数,*,*,*,*,三,サン,サン
内閣 名詞,一般,*,*,*,*,内閣,ナイカク,ナイカク
総理 名詞,一般,*,*,*,*,総理,ソウリ,ソーリ
大臣 名詞,一般,*,*,*,*,大臣,ダイジン,ダイジン
EOS
まじ卍
まじ 名詞,形容動詞語幹,*,*,*,*,まじ,マジ,マジ
卍 名詞,一般,*,*,*,*,卍,マンジ,マンジ
EOS
あげみざわ
あげ 動詞,自立,*,*,一段,連用形,あげる,アゲ,アゲ
み 動詞,非自立,*,*,一段,連用形,みる,ミ,ミ
ざわ 名詞,固有名詞,組織,*,*,*,*
EOS
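Each line above is a surface form followed by a comma-separated feature string. In MeCab's default output, the two are separated by a tab (it shows up as a space here). The sketch below reads that format in Python under a few assumptions:

- The feature string follows the 9-field IPA layout.
- Unknown words, such as ティックトック above, carry only 7 fields. The missing fields are padded with "*".
- The field labels and the `parse_mecab_output` helper are hypothetical names chosen for illustration.

```python
from typing import Dict, List, NamedTuple

# Labels for the IPA-dictionary feature fields (the names themselves are only illustrative)
FIELDS = ("pos", "pos1", "pos2", "pos3", "conj_type", "conj_form",
          "base", "reading", "pronunciation")


class Token(NamedTuple):
    surface: str
    features: Dict[str, str]


def parse_mecab_output(output: str) -> List[List[Token]]:
    """Split MeCab's default output into sentences (each ended by EOS) of Tokens."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in output.splitlines():
        if line == "EOS":
            sentences.append(current)
            current = []
            continue
        if not line:
            continue
        # Surface form and features are separated by a tab in the default format
        surface, feature = line.split("\t", 1)
        values = feature.split(",")
        # Unknown words carry only 7 fields, so pad reading/pronunciation with "*"
        values += ["*"] * (len(FIELDS) - len(values))
        current.append(Token(surface, dict(zip(FIELDS, values))))
    if current:  # tolerate output that does not end with EOS
        sentences.append(current)
    return sentences
```

The input string could come from the stdout of running mecab through `subprocess.run(..., capture_output=True, text=True)`.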
Installing the NEologd dictionary
- Clone mecab-ipadic-NEologd with git
$ git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git
Cloning into 'mecab-ipadic-neologd'...
remote: Enumerating objects: 75, done.
remote: Counting objects: 100% (75/75), done.
remote: Compressing objects: 100% (74/74), done.
Unpacking objects: 84% (63/75)
remote: Total 75 (delta 5), reused 54 (delta 0), pack-reused 0
Unpacking objects: 100% (75/75), done.
- Build mecab-ipadic-NEologd
- ./bin/install-mecab-ipadic-neologd -n installs only part of the dictionary
- ./bin/install-mecab-ipadic-neologd -n -a installs the full dictionary with all entries
$ cd mecab-ipadic-neologd
$ ./bin/install-mecab-ipadic-neologd -n
[install-mecab-ipadic-NEologd] : Start..
[install-mecab-ipadic-NEologd] : Check the existance of libraries
...
[install-mecab-ipadic-NEologd] : mecab-ipadic-NEologd is already up-to-date
[install-mecab-ipadic-NEologd] : mecab-ipadic-NEologd will be install to /usr/local/lib/mecab/dic/mecab-ipadic-neologd
[install-mecab-ipadic-NEologd] : Make mecab-ipadic-NEologd
[make-mecab-ipadic-NEologd] : Start..
[make-mecab-ipadic-NEologd] : Check local seed directory
[make-mecab-ipadic-NEologd] : Check local seed file
[make-mecab-ipadic-NEologd] : Check local build directory
[make-mecab-ipadic-NEologd] : create /Users/hoge/fuga/piyo/mecab-ipadic-neologd/libexec/../build
[make-mecab-ipadic-NEologd] : Download original mecab-ipadic file
[make-mecab-ipadic-NEologd] : Try to access to https://ja.osdn.net
[make-mecab-ipadic-NEologd] : Try to download from https://ja.osdn.net/frs/g_redir.php?m=kent&f=mecab%2Fmecab-ipadic%2F2.7.0-20070801%2Fmecab-ipadic-2.7.0-20070801.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 11.6M 100 11.6M 0 0 119k 0 0:01:40 0:01:40 --:--:-- 389k
Hash value of /Users/hoge/fuga/piyo/mecab-ipadic-neologd/libexec/../build/mecab-ipadic-2.7.0-20070801.tar.gz matched
[make-mecab-ipadic-NEologd] : Decompress original mecab-ipadic file
x mecab-ipadic-2.7.0-20070801/
...
[make-mecab-ipadic-NEologd] : Configure custom system dictionary on /Users/hoge/fuga/piyo/mecab-ipadic-neologd/libexec/../build/mecab-ipadic-2.7.0-20070801-neologd-20190617
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking whether make sets $(MAKE)... yes
checking for working aclocal-1.4... missing
checking for working autoconf... found
checking for working automake-1.4... missing
checking for working autoheader... found
checking for working makeinfo... found
checking for a BSD-compatible install... /usr/bin/install -c
checking for mecab-config... /usr/local/bin/mecab-config
configure: creating ./config.status
config.status: creating Makefile
[make-mecab-ipadic-NEologd] : Encode the character encoding of system dictionary resources from EUC_JP to UTF-8
./../../libexec/iconv_euc_to_utf8.sh ./Noun.place.csv
...
[make-mecab-ipadic-NEologd] : Fix yomigana field of IPA dictionary
patching file Noun.csv
...
[make-mecab-ipadic-NEologd] : Copy user dictionary resource
...
[make-mecab-ipadic-NEologd] : Make custom system dictionary on /Users/hoge/fuga/piyo/mecab-ipadic-neologd/libexec/../build/mecab-ipadic-2.7.0-20070801-neologd-20190617
make: Nothing to be done for `all'.
[make-mecab-ipadic-NEologd] : Finish..
[install-mecab-ipadic-NEologd] : Get results of tokenize test
[test-mecab-ipadic-NEologd] : Start..
[test-mecab-ipadic-NEologd] : Replace timestamp from 'git clone' date to 'git commit' date
[test-mecab-ipadic-NEologd] : Get buzz phrases
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1324 100 1324 0 0 1524 0 --:--:-- --:--:-- --:--:-- 1523
[test-mecab-ipadic-NEologd] : Get difference between default system dictionary and mecab-ipadic-NEologd
[test-mecab-ipadic-NEologd] : Tokenize phrase using default system dictionary
[test-mecab-ipadic-NEologd] : Tokenize phrase using mecab-ipadic-NEologd
[test-mecab-ipadic-NEologd] : Get result of diff
[test-mecab-ipadic-NEologd] : Please check difference between default system dictionary and mecab-ipadic-NEologd
default system dictionary | mecab-ipadic-NEologd
ジャニー さん | ジャニーさん
山本 彩 | 山本彩
聖 お にいさん | 聖おにいさん
ま ふま ふ | まふまふ
ガ スリー | ガスリー
[test-mecab-ipadic-NEologd] : Finish..
[install-mecab-ipadic-NEologd] : Please check the list of differences in the upper part.
[install-mecab-ipadic-NEologd] : Do you want to install mecab-ipadic-NEologd? Type yes or no.
- Answer yes at this prompt
yes
[install-mecab-ipadic-NEologd] : OK. Let's install mecab-ipadic-NEologd.
[install-mecab-ipadic-NEologd] : Start..
[install-mecab-ipadic-NEologd] : /usr/local/lib/mecab/dic is current user's directory
[install-mecab-ipadic-NEologd] : Make install to /usr/local/lib/mecab/dic/mecab-ipadic-neologd
make[1]: Nothing to be done for `install-exec-am'.
/bin/sh ./mkinstalldirs /usr/local/lib/mecab/dic/mecab-ipadic-neologd
mkdir /usr/local/lib/mecab/dic/mecab-ipadic-neologd
/usr/bin/install -c -m 644 ./matrix.bin /usr/local/lib/mecab/dic/mecab-ipadic-neologd/matrix.bin
/usr/bin/install -c -m 644 ./char.bin /usr/local/lib/mecab/dic/mecab-ipadic-neologd/char.bin
/usr/bin/install -c -m 644 ./sys.dic /usr/local/lib/mecab/dic/mecab-ipadic-neologd/sys.dic
/usr/bin/install -c -m 644 ./unk.dic /usr/local/lib/mecab/dic/mecab-ipadic-neologd/unk.dic
/usr/bin/install -c -m 644 ./left-id.def /usr/local/lib/mecab/dic/mecab-ipadic-neologd/left-id.def
/usr/bin/install -c -m 644 ./right-id.def /usr/local/lib/mecab/dic/mecab-ipadic-neologd/right-id.def
/usr/bin/install -c -m 644 ./rewrite.def /usr/local/lib/mecab/dic/mecab-ipadic-neologd/rewrite.def
/usr/bin/install -c -m 644 ./pos-id.def /usr/local/lib/mecab/dic/mecab-ipadic-neologd/pos-id.def
/usr/bin/install -c -m 644 ./dicrc /usr/local/lib/mecab/dic/mecab-ipadic-neologd/dicrc
[install-mecab-ipadic-NEologd] : Install completed.
[install-mecab-ipadic-NEologd] : When you use MeCab, you can set '/usr/local/lib/mecab/dic/mecab-ipadic-neologd' as a value of '-d' option of MeCab.
[install-mecab-ipadic-NEologd] : Usage of mecab-ipadic-NEologd is here.
Usage:
$ mecab -d /usr/local/lib/mecab/dic/mecab-ipadic-neologd ...
[install-mecab-ipadic-NEologd] : Finish..
[install-mecab-ipadic-NEologd] : Finish..
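The installer suggests passing the dictionary directory to MeCab with -d. Before hard-coding that path into scripts, a quick sanity check can help. The sketch below does two things:

- It verifies that the directory contains the core files that make install copied above: sys.dic, unk.dic, matrix.bin, char.bin and dicrc.
- It then builds the option string.

`missing_dictionary_files` and `tagger_args` are hypothetical helper names, not part of MeCab.

```python
import os
from typing import List

# Core files that `make install` copied into the dictionary directory in the log above
REQUIRED_FILES = ["sys.dic", "unk.dic", "matrix.bin", "char.bin", "dicrc"]


def missing_dictionary_files(dicdir: str) -> List[str]:
    """Return the required files absent from dicdir (an empty list means it looks complete)."""
    return [name for name in REQUIRED_FILES
            if not os.path.isfile(os.path.join(dicdir, name))]


def tagger_args(dicdir: str) -> str:
    """Build the -d option string for mecab / MeCab.Tagger, failing early if files are missing."""
    missing = missing_dictionary_files(dicdir)
    if missing:
        raise FileNotFoundError("{} is missing: {}".format(dicdir, ", ".join(missing)))
    return "-d " + dicdir
```

For example, you would call `MeCab.Tagger(tagger_args("/usr/local/lib/mecab/dic/mecab-ipadic-neologd"))`. The option string is not shell-quoted, so this assumes the path contains no spaces.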
- On second thought, go with the full dictionary after all
$ ./bin/install-mecab-ipadic-neologd -n -a
- Switch the system dictionary used for morphological analysis from MeCab's default mecab-ipadic to mecab-ipadic-NEologd
- Any editor will do
$ sudo subl /usr/local/etc/mecabrc
$ #sudo nano /usr/local/etc/mecabrc
- Change it as follows (the edit is the dicdir setting on lines 6-7)
;
; Configuration file of MeCab
;
; $Id: mecabrc.in,v 1.3 2006/05/29 15:36:08 taku-ku Exp $;
;
dicdir = /usr/local/lib/mecab/dic/mecab-ipadic-neologd
;dicdir = /usr/local/lib/mecab/dic/ipadic
; userdic = /home/foo/bar/user.dic
; output-format-type = wakati
; input-buffer-size = 8192
; node-format = %m\n
; bos-format = %S\n
; eos-format = EOS\n
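To double-check which dictionary MeCab will pick up, you can read mecabrc back. The rough sketch below makes these assumptions:

- The format is inferred only from the file shown above: key = value lines, with lines starting with ';' treated as comments.
- `read_mecabrc` and `active_dicdir` are hypothetical helpers.

```python
from typing import Dict


def read_mecabrc(text: str) -> Dict[str, str]:
    """Parse "key = value" settings from mecabrc contents, skipping ';' comment lines."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Lines starting with ';' are comments, which is what disables the old ;dicdir line
        if not line or line.startswith(";") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def active_dicdir(path: str = "/usr/local/etc/mecabrc") -> str:
    """Return the uncommented dicdir setting from the given mecabrc file."""
    with open(path, encoding="utf-8") as f:
        return read_mecabrc(f.read())["dicdir"]
```

With the edited file above, `active_dicdir()` should return the mecab-ipadic-neologd path rather than the ipadic one.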
- Save and close
- Try MeCab again
$ mecab
すももももももももものうち
すもももももも 名詞,固有名詞,一般,*,*,*,すもももももも,スモモモモモモ,スモモモモモモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
の 助詞,連体化,*,*,*,*,の,ノ,ノ
うち 名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS
ティックトック
ティックトック 名詞,固有名詞,一般,*,*,*,Tik Tok,ティックトック,ティックトック
EOS
安倍晋三内閣総理大臣
安倍晋三 名詞,固有名詞,一般,*,*,*,安倍晋三,アベシンゾウ,アベシンゾー
内閣総理大臣 名詞,固有名詞,一般,*,*,*,内閣総理大臣,ナイカクソウリダイジン,ナイカクソーリダイジン
EOS
まじ卍
まじ卍 名詞,固有名詞,一般,*,*,*,まじ卍,マジマンジ,マジマンジ
EOS
あげみざわ
あげみざわ 名詞,固有名詞,一般,*,*,*,あげみざわ,アゲミザワ,アゲミザワ
EOS
- Confirm that the results have changed 卍
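The NEologd installer printed a default-vs-NEologd diff earlier. To see the same kind of change for your own sentences, the hypothetical helper below pairs each NEologd token with the default-dictionary tokens whose character spans overlap it. It makes two assumptions:

- You already have the two lists of surface forms, for example from mecab -Owakati run with and without -d.
- Both lists concatenate to the same string. This holds for the examples above because they contain no spaces.

```python
from typing import List, Tuple


def char_spans(tokens: List[str]) -> List[Tuple[int, int]]:
    """Character (start, end) offsets of each token in the concatenated text."""
    spans, pos = [], 0
    for tok in tokens:
        spans.append((pos, pos + len(tok)))
        pos += len(tok)
    return spans


def align_tokens(default_tokens: List[str],
                 neologd_tokens: List[str]) -> List[Tuple[str, List[str]]]:
    """Pair each NEologd token with the default-dictionary tokens that overlap it."""
    if "".join(default_tokens) != "".join(neologd_tokens):
        raise ValueError("both tokenizations must cover the same text")
    default_spans = char_spans(default_tokens)
    result = []
    for tok, (start, end) in zip(neologd_tokens, char_spans(neologd_tokens)):
        # Two half-open spans overlap when each one starts before the other ends
        covered = [d for d, (ds, de) in zip(default_tokens, default_spans)
                   if ds < end and start < de]
        result.append((tok, covered))
    return result
```

For the 安倍晋三内閣総理大臣 example, `align_tokens(["安倍", "晋", "三", "内閣", "総理", "大臣"], ["安倍晋三", "内閣総理大臣"])` pairs 安倍晋三 with 安倍/晋/三 and 内閣総理大臣 with 内閣/総理/大臣. Overlap is used instead of exact boundaries because the two tokenizations do not always split at the same places. In the すもも sentence, for example, a default もも straddles two NEologd tokens.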
Installing mecab-python3 to use MeCab from Python
- Install mecab-python3 with pip
$ pip install mecab-python3
Collecting mecab-python3
Downloading https://files.pythonhosted.org/packages/4a/c0/ffbfaf1b4721117e12bc169bc46d49a6c37143ce94388c80b256cd405f00/mecab_python3-0.996.2-cp37-cp37m-macosx_10_6_intel.whl (14.1MB)
100% |████████████████████████████████| 14.1MB 223kB/s
Installing collected packages: mecab-python3
Successfully installed mecab-python3-0.996.2
- Try MeCab from Python
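# Optional: prettier tracebacks (requires the better_exceptions and colored-traceback packages)
# import better_exceptions
# import colored_traceback.always
import MeCab

# The backslash belongs to the emoticon, so a raw string keeps it without an invalid-escape warning
text = r"ティックトックでどんだけ食べてもゼロカロリーやってみた\(^o^)/卍。"

# Part-of-speech IDs (node.posid, defined in the dictionary's pos-id.def) to keep
pos_list = [10, 11, 31, 32, 34]
pos_list.extend(list(range(36, 50)))
pos_list.extend([59, 60, 62, 67])

# Surface forms to drop
stop_words = ["する", "ない", "なる", "もう", "しよ", "でき", "なっ", "くっ", "やっ", "ある", "しれ", "思う", "今日", "それ", "これ", "あれ", "どれ", "どの", "NULL", "れる", "なり", "あっ"]


def create_mecab_list(text_list):
    mecab_list = []
    mecab = MeCab.Tagger("-d /usr/local/lib/mecab/dic/mecab-ipadic-neologd")
    # Common workaround: call parse("") once before parseToNode() so node.surface is not garbled
    mecab.parse("")
    for text in text_list:
        node = mecab.parseToNode(text)
        while node:
            # feature = [POS, POS subcategory 1, 2, 3, conjugation type, conjugation form, base form, reading, pronunciation]
            # 忙しく 形容詞,自立,*,*,形容詞・イ段,連用テ接続,忙しい,イソガシク,イソガシク
            features = node.feature.split(",")
            # Unknown words have only 7 fields, so pad the missing reading/pronunciation with "*"
            features += ["*"] * (9 - len(features))
            # Compare the surface form (not the joined "surface : base : reading" string) with the stop words
            if node.surface in stop_words:
                node = node.next
                continue
            # Use > 1 instead to also drop single-character words
            if len(node.surface) > 0 and node.posid in pos_list:
                mecab_list.append(" : ".join([node.surface, features[6], features[7]]))
            node = node.next
    return mecab_list


mecab_list = create_mecab_list([text])
for w in mecab_list:
    print(w)
- Check the results
ティックトック : Tik Tok : ティックトック
どんだけ : どんだけ~ : ドンダケ
食べ : 食べる : タベ
ゼロ : ゼロ : ゼロ
\(^o^)/ : \/ : バンザイ
卍 : 卍 : マンジ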
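The numbers in pos_list are node.posid values, which come from the dictionary's pos-id.def (one of the files installed earlier). To see which parts of speech they stand for, a small sketch like the following turns that file into a lookup table. It assumes each line holds the comma-separated part of speech followed by whitespace and the numeric ID. `load_pos_ids` is a hypothetical helper.

```python
from typing import Dict


def load_pos_ids(text: str) -> Dict[int, str]:
    """Map node.posid values to part-of-speech names from the contents of pos-id.def."""
    pos_ids: Dict[int, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Assumed layout: "<POS,sub1,sub2,sub3><whitespace><id>", so split once from the right
        name, id_str = line.rsplit(None, 1)
        pos_ids[int(id_str)] = name
    return pos_ids
```

To use it, read /usr/local/lib/mecab/dic/mecab-ipadic-neologd/pos-id.def and pass its contents to `load_pos_ids`. The file is assumed to be UTF-8, since the NEologd build converts its resources from EUC-JP to UTF-8; if decoding fails, try euc-jp. Printing `pos_ids[i]` for each `i` in `pos_list` then shows what the filter keeps.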