
Installing and configuring MeCab, mecab-python3, and NEologd on Mac (Mojave)

Posted at 2019-06-22

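Overview

  • Install MeCab, mecab-python3, and the NEologd dictionary on Mac OS X 10.14.4 Mojave
  • On Mojave, mecab-python3 reportedly fails to install with the latest version of Command Line Tools; if that happens, downgrade Command Line Tools to version 9.4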

Environment


Procedure

Installing MeCab

  • Install mecab and the mecab-ipadic dictionary with brew
$ brew install mecab mecab-ipadic
...

==> Downloading https://homebrew.bintray.com/bottles/mecab-0.996.mojave.bottle.3.tar.gz
==> Downloading from https://akamai.bintray.com/ef/ef261d203140305ca8c9e4b7311c61176a17325df9454610d3eb33a312c4d3c5?__gda__=exp=1561214658~hmac=88522b7f1
######################################################################## 100.0%
==> Pouring mecab-0.996.mojave.bottle.3.tar.gz
🍺  /usr/local/Cellar/mecab/0.996: 20 files, 4.2MB
==> `brew cleanup` has not been run in 30 days, running now...
...

==> Downloading https://homebrew.bintray.com/bottles/mecab-ipadic-2.7.0-20070801.mojave.bottle.tar.gz
==> Downloading from https://akamai.bintray.com/30/30967b4167d34f05c79f185d71a40198fff4067d0cce82aed59383548c898681?__gda__=exp=1561214679~hmac=c478ea83e
######################################################################## 100.0%
==> Pouring mecab-ipadic-2.7.0-20070801.mojave.bottle.tar.gz
==> Caveats
To enable mecab-ipadic dictionary, add to /usr/local/etc/mecabrc:
  dicdir = /usr/local/lib/mecab/dic/ipadic
==> Summary
🍺  /usr/local/Cellar/mecab-ipadic/2.7.0-20070801: 16 files, 50.6MB
==> Caveats
==> mecab-ipadic
To enable mecab-ipadic dictionary, add to /usr/local/etc/mecabrc:
  dicdir = /usr/local/lib/mecab/dic/ipadic
  • Try mecab
    • EOS stands for End Of Sentence
$ mecab
すももももももももものうち
すもも  名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も      助詞,係助詞,*,*,*,*,も,モ,モ
もも    名詞,一般,*,*,*,*,もも,モモ,モモ
も      助詞,係助詞,*,*,*,*,も,モ,モ
もも    名詞,一般,*,*,*,*,もも,モモ,モモ
も      助詞,係助詞,*,*,*,*,も,モ,モ
の      助詞,連体化,*,*,*,*,の,ノ,ノ
うち    名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS
ティックトック
ティックトック  名詞,固有名詞,組織,*,*,*,*
EOS
安倍晋三内閣総理大臣
安倍    名詞,固有名詞,人名,姓,*,*,安倍,アベ,アベ
晋      名詞,固有名詞,人名,名,*,*,晋,ススム,ススム
三      名詞,数,*,*,*,*,三,サン,サン
内閣    名詞,一般,*,*,*,*,内閣,ナイカク,ナイカク
総理    名詞,一般,*,*,*,*,総理,ソウリ,ソーリ
大臣    名詞,一般,*,*,*,*,大臣,ダイジン,ダイジン
EOS
まじ卍
まじ    名詞,形容動詞語幹,*,*,*,*,まじ,マジ,マジ
卍      名詞,一般,*,*,*,*,卍,マンジ,マンジ
EOS
あげみざわ
あげ    動詞,自立,*,*,一段,連用形,あげる,アゲ,アゲ
み      動詞,非自立,*,*,一段,連用形,みる,ミ,ミ
ざわ    名詞,固有名詞,組織,*,*,*,*
EOS
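
The output format shown above (surface form, then a comma-separated feature list, with `EOS` closing each sentence) is easy to post-process. The following is a minimal sketch, assuming MeCab's default output separates the surface form from the features with a tab character (it is rendered as spaces in the listing above). The names `Token` and `parse_mecab_output` are hypothetical helpers introduced only for this illustration.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    surface: str         # the text as it appeared in the input
    features: List[str]  # comma-separated feature fields (POS, base form, reading, ...)


def parse_mecab_output(output: str) -> List[List[Token]]:
    """Split default-format MeCab output into sentences, each a list of tokens."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in output.splitlines():
        if line == "EOS":            # EOS marks the end of one analyzed sentence
            sentences.append(current)
            current = []
            continue
        if not line.strip():         # ignore blank lines
            continue
        surface, _, feature = line.partition("\t")
        current.append(Token(surface, feature.split(",")))
    return sentences


# Two lines copied from the session above, with a tab as the separator (an assumption).
sample = (
    "すもも\t名詞,一般,*,*,*,*,すもも,スモモ,スモモ\n"
    "も\t助詞,係助詞,*,*,*,*,も,モ,モ\n"
    "EOS\n"
)
sentences = parse_mecab_output(sample)
for token in sentences[0]:
    print(token.surface, token.features[0])
```

Keeping the features as a list rather than fixed attributes avoids assuming a field count, since unknown words such as ティックトック above carry fewer fields.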

Installing the NEologd dictionary

  • Clone mecab-ipadic-NEologd with git
$ git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git
Cloning into 'mecab-ipadic-neologd'...
remote: Enumerating objects: 75, done.
remote: Counting objects: 100% (75/75), done.
remote: Compressing objects: 100% (74/74), done.
Unpacking objects:  84% (63/75)
remote: Total 75 (delta 5), reused 54 (delta 0), pack-reused 0
Unpacking objects: 100% (75/75), done.
  • Build mecab-ipadic-NEologd
    • ./bin/install-mecab-ipadic-neologd -n installs only part of the dictionaries
    • ./bin/install-mecab-ipadic-neologd -n -a installs all of the dictionaries
$ cd mecab-ipadic-neologd
$ ./bin/install-mecab-ipadic-neologd -n
[install-mecab-ipadic-NEologd] : Start..
[install-mecab-ipadic-NEologd] : Check the existance of libraries
...

[install-mecab-ipadic-NEologd] : mecab-ipadic-NEologd is already up-to-date

[install-mecab-ipadic-NEologd] : mecab-ipadic-NEologd will be install to /usr/local/lib/mecab/dic/mecab-ipadic-neologd

[install-mecab-ipadic-NEologd] : Make mecab-ipadic-NEologd
[make-mecab-ipadic-NEologd] : Start..
[make-mecab-ipadic-NEologd] : Check local seed directory
[make-mecab-ipadic-NEologd] : Check local seed file
[make-mecab-ipadic-NEologd] : Check local build directory
[make-mecab-ipadic-NEologd] : create /Users/hoge/fuga/piyo/mecab-ipadic-neologd/libexec/../build
[make-mecab-ipadic-NEologd] : Download original mecab-ipadic file
[make-mecab-ipadic-NEologd] : Try to access to https://ja.osdn.net
[make-mecab-ipadic-NEologd] : Try to download from https://ja.osdn.net/frs/g_redir.php?m=kent&f=mecab%2Fmecab-ipadic%2F2.7.0-20070801%2Fmecab-ipadic-2.7.0-20070801.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 11.6M  100 11.6M    0     0   119k      0  0:01:40  0:01:40 --:--:--  389k
Hash value of /Users/hoge/fuga/piyo/mecab-ipadic-neologd/libexec/../build/mecab-ipadic-2.7.0-20070801.tar.gz matched
[make-mecab-ipadic-NEologd] : Decompress original mecab-ipadic file
x mecab-ipadic-2.7.0-20070801/
...

[make-mecab-ipadic-NEologd] : Configure custom system dictionary on /Users/hoge/fuga/piyo/mecab-ipadic-neologd/libexec/../build/mecab-ipadic-2.7.0-20070801-neologd-20190617
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking whether make sets $(MAKE)... yes
checking for working aclocal-1.4... missing
checking for working autoconf... found
checking for working automake-1.4... missing
checking for working autoheader... found
checking for working makeinfo... found
checking for a BSD-compatible install... /usr/bin/install -c
checking for mecab-config... /usr/local/bin/mecab-config
configure: creating ./config.status
config.status: creating Makefile
[make-mecab-ipadic-NEologd] : Encode the character encoding of system dictionary resources from EUC_JP to UTF-8
./../../libexec/iconv_euc_to_utf8.sh ./Noun.place.csv
...

[make-mecab-ipadic-NEologd] : Fix yomigana field of IPA dictionary
patching file Noun.csv
...

[make-mecab-ipadic-NEologd] : Copy user dictionary resource
...

[make-mecab-ipadic-NEologd] : Make custom system dictionary on /Users/hoge/fuga/piyo/mecab-ipadic-neologd/libexec/../build/mecab-ipadic-2.7.0-20070801-neologd-20190617
make: Nothing to be done for `all'.
[make-mecab-ipadic-NEologd] : Finish..
[install-mecab-ipadic-NEologd] : Get results of tokenize test
[test-mecab-ipadic-NEologd] : Start..
[test-mecab-ipadic-NEologd] : Replace timestamp from 'git clone' date to 'git commit' date
[test-mecab-ipadic-NEologd] : Get buzz phrases
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1324  100  1324    0     0   1524      0 --:--:-- --:--:-- --:--:--  1523
[test-mecab-ipadic-NEologd] : Get difference between default system dictionary and mecab-ipadic-NEologd
[test-mecab-ipadic-NEologd] : Tokenize phrase using default system dictionary
[test-mecab-ipadic-NEologd] : Tokenize phrase using mecab-ipadic-NEologd
[test-mecab-ipadic-NEologd] : Get result of diff
[test-mecab-ipadic-NEologd] : Please check difference between default system dictionary and mecab-ipadic-NEologd

default system dictionary         |     mecab-ipadic-NEologd
ジャニー さん                     |     ジャニーさん
山本 彩                           |     山本彩
聖 お にいさん                    |     聖おにいさん
ま ふま ふ                        |     まふまふ
ガ スリー                         |     ガスリー

[test-mecab-ipadic-NEologd] : Finish..

[install-mecab-ipadic-NEologd] : Please check the list of differences in the upper part.

[install-mecab-ipadic-NEologd] : Do you want to install mecab-ipadic-NEologd? Type yes or no.
  • Answer yes here
yes
[install-mecab-ipadic-NEologd] : OK. Let's install mecab-ipadic-NEologd.
[install-mecab-ipadic-NEologd] : Start..
[install-mecab-ipadic-NEologd] : /usr/local/lib/mecab/dic is current user's directory
[install-mecab-ipadic-NEologd] : Make install to /usr/local/lib/mecab/dic/mecab-ipadic-neologd
make[1]: Nothing to be done for `install-exec-am'.
/bin/sh ./mkinstalldirs /usr/local/lib/mecab/dic/mecab-ipadic-neologd
mkdir /usr/local/lib/mecab/dic/mecab-ipadic-neologd
 /usr/bin/install -c -m 644 ./matrix.bin /usr/local/lib/mecab/dic/mecab-ipadic-neologd/matrix.bin
 /usr/bin/install -c -m 644 ./char.bin /usr/local/lib/mecab/dic/mecab-ipadic-neologd/char.bin
 /usr/bin/install -c -m 644 ./sys.dic /usr/local/lib/mecab/dic/mecab-ipadic-neologd/sys.dic
 /usr/bin/install -c -m 644 ./unk.dic /usr/local/lib/mecab/dic/mecab-ipadic-neologd/unk.dic
 /usr/bin/install -c -m 644 ./left-id.def /usr/local/lib/mecab/dic/mecab-ipadic-neologd/left-id.def
 /usr/bin/install -c -m 644 ./right-id.def /usr/local/lib/mecab/dic/mecab-ipadic-neologd/right-id.def
 /usr/bin/install -c -m 644 ./rewrite.def /usr/local/lib/mecab/dic/mecab-ipadic-neologd/rewrite.def
 /usr/bin/install -c -m 644 ./pos-id.def /usr/local/lib/mecab/dic/mecab-ipadic-neologd/pos-id.def
 /usr/bin/install -c -m 644 ./dicrc /usr/local/lib/mecab/dic/mecab-ipadic-neologd/dicrc

[install-mecab-ipadic-NEologd] : Install completed.
[install-mecab-ipadic-NEologd] : When you use MeCab, you can set '/usr/local/lib/mecab/dic/mecab-ipadic-neologd' as a value of '-d' option of MeCab.
[install-mecab-ipadic-NEologd] : Usage of mecab-ipadic-NEologd is here.
Usage:
    $ mecab -d /usr/local/lib/mecab/dic/mecab-ipadic-neologd ...

[install-mecab-ipadic-NEologd] : Finish..
[install-mecab-ipadic-NEologd] : Finish..
  • On second thought, install all of the dictionaries
$ ./bin/install-mecab-ipadic-neologd -n -a
  • Change the system dictionary used for morphological analysis from MeCab's default mecab-ipadic to mecab-ipadic-NEologd
    • Any editor will do
$ sudo subl /usr/local/etc/mecabrc
$ #sudo nano /usr/local/etc/mecabrc
  • Edit the file as follows (the change is to the dicdir entries on lines 6-7)
;
; Configuration file of MeCab
;
; $Id: mecabrc.in,v 1.3 2006/05/29 15:36:08 taku-ku Exp $;
;
dicdir =  /usr/local/lib/mecab/dic/mecab-ipadic-neologd
;dicdir =  /usr/local/lib/mecab/dic/ipadic

; userdic = /home/foo/bar/user.dic

; output-format-type = wakati
; input-buffer-size = 8192

; node-format = %m\n
; bos-format = %S\n
; eos-format = EOS\n
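
To double-check which dictionary an edited mecabrc selects, the file can be scanned for uncommented `dicdir` entries. This is a minimal sketch, assuming that lines starting with `;` are comments (as in the file shown above) and that entries have the form `key = value`. The function name `active_dicdirs` is hypothetical, and the sample text is an abridged copy of the file above rather than the file on disk.

```python
from typing import List


def active_dicdirs(mecabrc_text: str) -> List[str]:
    """Return every uncommented dicdir value found in mecabrc-style text."""
    values: List[str] = []
    for raw in mecabrc_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):  # skip blanks and ';' comments
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "dicdir":
            values.append(value.strip())
    return values


sample = """\
;
; Configuration file of MeCab
;
dicdir =  /usr/local/lib/mecab/dic/mecab-ipadic-neologd
;dicdir =  /usr/local/lib/mecab/dic/ipadic

; userdic = /home/foo/bar/user.dic
"""
print(active_dicdirs(sample))
```

To check the real file, read `/usr/local/etc/mecabrc` and pass its contents to the function. Exactly one entry in the result means the edit left a single active dictionary.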
  • Save and close
  • Try MeCab again
$ mecab
すももももももももものうち
すもももももも  名詞,固有名詞,一般,*,*,*,すもももももも,スモモモモモモ,スモモモモモモ
も      助詞,係助詞,*,*,*,*,も,モ,モ
もも    名詞,一般,*,*,*,*,もも,モモ,モモ
の      助詞,連体化,*,*,*,*,の,ノ,ノ
うち    名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS
ティックトック
ティックトック  名詞,固有名詞,一般,*,*,*,Tik Tok,ティックトック,ティックトック
EOS
安倍晋三内閣総理大臣
安倍晋三        名詞,固有名詞,一般,*,*,*,安倍晋三,アベシンゾウ,アベシンゾー
内閣総理大臣    名詞,固有名詞,一般,*,*,*,内閣総理大臣,ナイカクソウリダイジン,ナイカクソーリダイジン
EOS
まじ卍
まじ卍  名詞,固有名詞,一般,*,*,*,まじ卍,マジマンジ,マジマンジ
EOS
あげみざわ
あげみざわ      名詞,固有名詞,一般,*,*,*,あげみざわ,アゲミザワ,アゲミザワ
EOS
  • Confirm that the results have changed 卍
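
The same before/after comparison can be scripted instead of typed interactively. The sketch below assumes the `mecab` command is on `PATH`, that both dictionary directories from the steps above exist, and that it runs on Python 3.7 or later. It uses the `-d` option (shown in the installer's usage message) and `-O wakati` (the `output-format-type` value that appears in mecabrc) to get space-separated tokens. `mecab_command` and `tokenize` are hypothetical helper names.

```python
import os
import shutil
import subprocess
from typing import List, Optional

# Dictionary locations taken from the install logs above.
IPADIC = "/usr/local/lib/mecab/dic/ipadic"
NEOLOGD = "/usr/local/lib/mecab/dic/mecab-ipadic-neologd"


def mecab_command(dicdir: Optional[str] = None) -> List[str]:
    """Build the mecab command line; -O wakati prints space-separated tokens."""
    cmd = ["mecab", "-O", "wakati"]
    if dicdir:
        cmd += ["-d", dicdir]  # select the system dictionary explicitly
    return cmd


def tokenize(text: str, dicdir: Optional[str] = None) -> List[str]:
    """Run mecab on one line of text and return the list of surface forms."""
    result = subprocess.run(
        mecab_command(dicdir),
        input=text + "\n",
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout.split()


# Only run the comparison when mecab and both dictionaries are actually present.
if shutil.which("mecab") and os.path.isdir(IPADIC) and os.path.isdir(NEOLOGD):
    for phrase in ["まじ卍", "安倍晋三内閣総理大臣"]:
        print(tokenize(phrase, IPADIC), "|", tokenize(phrase, NEOLOGD))
else:
    print("mecab or one of the dictionaries was not found; skipping the comparison")
```

Passing `-d` on each call leaves mecabrc untouched, so the comparison works regardless of which dictionary is currently the default.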

Installing mecab-python3 to use mecab from Python

  • Install mecab-python3 with pip
$ pip install mecab-python3
Collecting mecab-python3
  Downloading https://files.pythonhosted.org/packages/4a/c0/ffbfaf1b4721117e12bc169bc46d49a6c37143ce94388c80b256cd405f00/mecab_python3-0.996.2-cp37-cp37m-macosx_10_6_intel.whl (14.1MB)
    100% |████████████████████████████████| 14.1MB 223kB/s
Installing collected packages: mecab-python3
Successfully installed mecab-python3-0.996.2
  • Try MeCab from Python
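# Optional third-party helpers for nicer tracebacks; remove these two imports if they are not installed.
import better_exceptions
import colored_traceback.always

import MeCab

# Raw string so that the backslash in the emoticon is kept literally.
text = r"ティックトックでどんだけ食べてもゼロカロリーやってみた\(^o^)/卍。"

# Part-of-speech IDs (node.posid) to keep.
pos_list = [10, 11, 31, 32, 34]
pos_list.extend(list(range(36, 50)))
pos_list.extend([59, 60, 62, 67])
stop_words = ["する", "ない", "なる", "もう", "しよ", "でき", "なっ", "くっ", "やっ", "ある", "しれ", "思う", "今日", "それ", "これ", "あれ", "どれ", "どの", "NULL", "れる", "なり", "あっ"]


def create_mecab_list(text_list):
    mecab_list = []
    mecab = MeCab.Tagger("-d /usr/local/lib/mecab/dic/mecab-ipadic-neologd")
    # Parse an empty string once first; a commonly used workaround so that node.surface is read correctly.
    mecab.parse("")
    for text in text_list:
        node = mecab.parseToNode(text)
        while node:
            # Feature fields: [POS, POS subcategory 1, POS subcategory 2, POS subcategory 3, conjugation type, conjugation form, base form, reading, pronunciation]
            # e.g. 忙しく  形容詞,自立,*,*,形容詞・イ段,連用テ接続,忙しい,イソガシク,イソガシク
            features = node.feature.split(",")
            # Unknown words can have fewer fields (see ティックトック with ipadic above), so pad before indexing.
            features += ["*"] * (9 - len(features))
            morpheme = " : ".join([node.surface, features[6], features[7]])
            # NOTE: morpheme is the joined "surface : base form : reading" string, so this check never
            # matches the bare words in stop_words as written; compare node.surface to actually filter them.
            if morpheme in stop_words:
                node = node.next
                continue
            if len(morpheme) > 0:
                if node.posid in pos_list:
                    mecab_list.append(morpheme)
            node = node.next
    return mecab_list


mecab_list = create_mecab_list([text])

for w in mecab_list:
    print(w)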
  • Check the result
ティックトック : Tik Tok : ティックトック
どんだけ : どんだけ~ : ドンダケ
食べ : 食べる : タベ
ゼロ : ゼロ : ゼロ
やっ : やる : ヤッ
\(^o^)/ : \/ : バンザイ
卍 : 卍 : マンジ
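
Note that やっ appears in the result even though it is listed in `stop_words`. In the script above, the string tested against `stop_words` is the joined "surface : base form : reading" text, which can never equal a bare word. The following sketch reproduces only that filtering step without MeCab, using two entries copied from the result and a shortened stop-word list. The variable names are hypothetical, and filtering on the surface form is one possible fix rather than necessarily what the script intended.

```python
# Shortened subset of the stop-word list used in the script above.
stop_words = ["する", "ない", "やっ", "ある"]

# (surface, base form, reading) triples copied from the result above.
entries = [("食べ", "食べる", "タベ"), ("やっ", "やる", "ヤッ")]


def joined(surface: str, base: str, reading: str) -> str:
    """Format one entry the same way the script prints it."""
    return " : ".join([surface, base, reading])


# Comparing the joined string: nothing is ever filtered out.
by_joined = [joined(*e) for e in entries if joined(*e) not in stop_words]

# Comparing the surface form: the stop word is dropped.
by_surface = [joined(*e) for e in entries if e[0] not in stop_words]

print(by_joined)
print(by_surface)
```

In the script itself, the equivalent change would be to test `node.surface` (or the base form, depending on what the stop list is meant to contain) before building the joined string.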