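Introduction to Data Mining (データマイニング入門), Hideki Toyoda, Tokyo Tosho (2008/12/05), Chapter 9: Latent Semantic Analysis
The first wall
The R package alone is not enough: MeCab itself must be installed at the OS level.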
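Before installing the R package, it can help to confirm that the mecab command is visible to the shell. The following is a minimal sketch of such a check; the helper name have_cmd is made up for this example and is not part of MeCab or RMeCab.

```shell
#!/bin/sh
# Return success if the named command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd mecab; then
    echo "mecab found: $(command -v mecab)"
else
    echo "mecab not found; install it with the OS package manager first"
fi
```

If the check reports that mecab is missing, the platform-specific steps below apply.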
Raspberry Pi
sudo apt-get install mecab
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
libmecab2 mecab-jumandic mecab-utils
The following NEW packages will be installed:
libmecab2 mecab mecab-jumandic mecab-utils
0 upgraded, 4 newly installed, 0 to remove and 53 not upgraded.
Need to get 6,359 kB of archives.
After this operation, 81.1 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://ftp.jaist.ac.jp/pub/Linux/raspbian-archive/raspbian stretch/main armhf libmecab2 armhf 0.996-3.1 [218 kB]
(output omitted)
reading /usr/share/mecab/dic/juman/unk.def ... 37
emitting double-array: 100% |###########################################|
/usr/share/mecab/dic/juman/model.def is not found. skipped.
reading /usr/share/mecab/dic/juman/Noun.hukusi.csv ... 74
reading /usr/share/mecab/dic/juman/Special.csv ... 124
reading /usr/share/mecab/dic/juman/Assert.csv ... 30
reading /usr/share/mecab/dic/juman/Noun.keishiki.csv ... 10
reading /usr/share/mecab/dic/juman/Suffix.csv ... 1163
reading /usr/share/mecab/dic/juman/ContentW.csv ... 483161
reading /usr/share/mecab/dic/juman/Demonstrative.csv ... 76
reading /usr/share/mecab/dic/juman/Prefix.csv ... 75
reading /usr/share/mecab/dic/juman/Noun.suusi.csv ... 46
reading /usr/share/mecab/dic/juman/Noun.koyuu.csv ... 29805
reading /usr/share/mecab/dic/juman/AuxV.csv ... 421
reading /usr/share/mecab/dic/juman/Postp.csv ... 104
reading /usr/share/mecab/dic/juman/Rengo.csv ... 913
emitting double-array: 100% |###########################################|
reading /usr/share/mecab/dic/juman/matrix.def ... 1509x1509
emitting matrix : 100% |###########################################|
done!
update-alternatives: using /var/lib/mecab/dic/juman to provide /var/lib/mecab/dic/debian (mecab-dictionary) in auto mode
Setting up mecab (0.996-3.1) ...
Compiling Juman dictionary for Mecab. This takes long time...
reading /usr/share/mecab/dic/juman/unk.def ... 37
emitting double-array: 100% |###########################################|
/usr/share/mecab/dic/juman/model.def is not found. skipped.
(output omitted)
emitting double-array: 100% |###########################################|
reading /usr/share/mecab/dic/juman/matrix.def ... 1509x1509
emitting matrix : 100% |###########################################|
done!
Windows
http://taku910.github.io/mecab/
Download MeCab and IPADIC.
Reportedly, the MeCab installer also sets up the dictionary.
Macintosh
brew install mecab
brew install mecab-ipadic
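Both Windows and Macintosh got over this wall without trouble.
The second wall: specifying the package repository
Installing the RMeCab package in R requires specifying its repository URL.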
install.packages("RMeCab", repos = "http://rmecab.jp/R")
The third wall: package compile error on Raspberry Pi
install.packages("RMeCab", repos = "http://rmecab.jp/R")
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
Warning in install.packages("RMeCab", repos = "http://rmecab.jp/R") :
'lib = "/usr/local/lib/R/site-library"' is not writable
Would you like to use a personal library instead? (y/n) y
Would you like to create a personal library
~/R/arm-unknown-linux-gnueabihf-library/3.3
to install packages into? (y/n) y
trying URL 'http://rmecab.jp/R/src/contrib/RMeCab_0.99996.tar.gz'
Content type 'application/x-gzip' length 61092 bytes (59 KB)
==================================================
downloaded 59 KB
* installing *source* package ‘RMeCab’ ...
** libs
g++ -I/usr/share/R/include -DNDEBUG -I. -fpic -g -O2 -fdebug-prefix-map=/build/r-base-saFpct/r-base-3.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c Ngram.cpp -o Ngram.o
(output omitted)
g++ -I/usr/share/R/include -DNDEBUG -I. -fpic -g -O2 -fdebug-prefix-map=/build/r-base-saFpct/r-base-3.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c collocate.cpp -o collocate.o
collocate.cpp: In function ‘SEXPREC* collocate(SEXP, SEXP, SEXP, SEXP)’:
collocate.cpp:145:35: error: no matching function for call to ‘make_pair(char [BUF1], int)’
m0.insert(make_pair(buf1, 1));// 1 は 1個目と言う意味
^
In file included from /usr/include/c++/6/bits/stl_algobase.h:64:0,
from /usr/include/c++/6/bits/char_traits.h:39,
from /usr/include/c++/6/string:40,
from RMeCab.h:15,
from collocate.cpp:20:
/usr/include/c++/6/bits/stl_pair.h:497:5: note: candidate: template<class _T1, class _T2> constexpr std::pair<typename std::__decay_and_strip<_Tp>::__type, typename std::__decay_and_strip<_T2>::__type> std::make_pair(_T1&&, _T2&&)
make_pair(_T1&& __x, _T2&& __y)
^~~~~~~~~
/usr/include/c++/6/bits/stl_pair.h:497:5: note: template argument deduction/substitution failed:
collocate.cpp:145:35: note: variable-sized array type ‘char (&)[BUF1]’ is not a valid template argument
m0.insert(make_pair(buf1, 1));// 1 は 1個目と言う意味
^
/usr/lib/R/etc/Makeconf:141: recipe for target 'collocate.o' failed
make: *** [collocate.o] Error 1
ERROR: compilation failed for package ‘RMeCab’
* removing ‘/home/pi/R/arm-unknown-linux-gnueabihf-library/3.3/RMeCab’
The downloaded source packages are in
‘/tmp/RtmpgI3wMv/downloaded_packages’
Warning message:
In install.packages("RMeCab", repos = "http://rmecab.jp/R") :
installation of package ‘RMeCab’ had non-zero exit status
The compiler note points at the cause: buf1 is a variable-sized array (char [BUF1]), which g++ 6 cannot use as a template argument for make_pair.
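The fourth wall
On both Mac OS X and Windows, IPADIC should have been installed, and yet
the words listed in the book, 「いる、くれる、する、なる、見守る、ある、考える、思う、果たす、入る、抜ける、富む」, are not extracted.
The symptom is the same on Windows and Macintosh. It may be worth asking the author which dictionary he uses.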
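One way to narrow this down is to ask MeCab which dictionary it is actually using. The following is a sketch that assumes the mecab command-line tool is installed; the function name show_mecab_dic is hypothetical, and mecab-config is only called if it happens to be present.

```shell
#!/bin/sh
# Print which dictionary the local MeCab installation is using.
show_mecab_dic() {
    if ! command -v mecab >/dev/null 2>&1; then
        echo "mecab is not on PATH" >&2
        return 1
    fi
    # -D prints dictionary information (file name, charset, size) and exits.
    mecab -D
    # mecab-config is optional; on some systems it ships with the development files.
    if command -v mecab-config >/dev/null 2>&1; then
        echo "dicdir: $(mecab-config --dicdir)"
    fi
    # Analyse one of the missing verbs to see how it is tagged.
    echo "見守る" | mecab
}

show_mecab_dic || echo "skipped: MeCab is not available"
```

If the reported dictionary is not IPADIC, or its charset differs from that of the input text, that alone could explain differences from the book's output.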
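The fifth wall
The same command produces the same error on both platforms. The book shows:
unmei.q <- myQuery(unmei2,rownames(rslt$tk))
4番目の俺がiters.listに存在しません (item 4, 俺, is not in iters.list)
6番目の優先がiters.listに存在しません (item 6, 優先, is not in iters.list)
7番目のしがiters.listに存在しません (item 7, し, is not in iters.list)
10番目の面がiters.listに存在しません (item 10, 面, is not in iters.list)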
Error in myQuery(unmei2, rownames(rslt$tk))
Windows
unmei.q <- myQuery(unmei2,rownames(rslt$tk))
3 番目の Error in cat(posNot[k], "\t", "番目の", notInList[k], "\t", "がterm.listに存在しません", :
argument 4 (type 'list') cannot be handled by 'cat'
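Macintosh
unmei.q <- myQuery(unmei,rownames(rslt$tk))
3¥t番目の Error in cat(posNot[k], "¥t", "番目の", notInList[k], "¥t", "がterm.listに存在しません", :
argument 4 (type 'list') cannot be handled by 'cat'
On both platforms cat() prints its first three arguments and then stops at argument 4, notInList[k], which is a list that cat() cannot print.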
The sixth wall: character encoding (Raspberry Pi & Macintosh)
Convert the data files from Shift_JIS to UTF-8.
The docMatrix() function also requires UTF-8 input.
iconv -f Shift_JIS -t UTF8 doc01.txt > ../Mac/doc01.txt
...
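The one-file command above can be extended to a whole directory. The following is a minimal sketch, assuming the source files are Shift_JIS .txt files sitting in a single directory; convert_dir and its two directory arguments are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Convert every .txt file in a source directory from Shift_JIS to UTF-8,
# writing the results to a destination directory under the same file names.
convert_dir() {
    src_dir=$1
    dst_dir=$2
    mkdir -p "$dst_dir"
    for f in "$src_dir"/*.txt; do
        [ -e "$f" ] || continue   # skip when no file matches the pattern
        iconv -f Shift_JIS -t UTF-8 "$f" > "$dst_dir/$(basename "$f")"
    done
}

# Example call (paths are placeholders): convert_dir . ../Mac
```

Writing to a separate directory keeps the original Shift_JIS files intact for the Windows run.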
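References / URLs
[R][MeCab] Installing RMeCab and morphological analysis
https://qiita.com/hujuu/items/314a64a50875cdabf755
R (Introduction to Data Mining) on Raspbian (Raspberry Pi), Mac OS X, docker/ubuntu: six stages. Data acquisition, installation, startup, execution, plotting, batch execution.
https://qiita.com/kaizen_nagoya/items/e8417310129c2425af59
R (Introduction to Data Mining) on Windows: six stages. Data acquisition, installation, startup, script loading, batch execution, step-by-step execution. Three traps.
https://qiita.com/kaizen_nagoya/items/52a288b002a54cd613d3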