
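データマイニング入門 (Introduction to Data Mining), Chapter 9 "Latent Semantic Analysis" does not run the way the book shows. Six walls, including a character-encoding wall.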

Posted at 2018-01-24

データマイニング入門 (Introduction to Data Mining), 豊田 秀樹 (Hideki Toyoda), 東京図書 (Tokyo Tosho), 2008/12/05. Chapter 9: Latent Semantic Analysis.

Wall 1

The R package alone is not enough. MeCab itself has to be installed at the OS level first.

Raspberry Pi

sudo apt-get install mecab
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  libmecab2 mecab-jumandic mecab-utils
The following NEW packages will be installed:
  libmecab2 mecab mecab-jumandic mecab-utils
0 upgraded, 4 newly installed, 0 to remove and 53 not upgraded.
Need to get 6,359 kB of archives.
After this operation, 81.1 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://ftp.jaist.ac.jp/pub/Linux/raspbian-archive/raspbian stretch/main armhf libmecab2 armhf 0.996-3.1 [218 kB]
(omitted)
reading /usr/share/mecab/dic/juman/unk.def ... 37
emitting double-array: 100% |###########################################| 
/usr/share/mecab/dic/juman/model.def is not found. skipped.
reading /usr/share/mecab/dic/juman/Noun.hukusi.csv ... 74
reading /usr/share/mecab/dic/juman/Special.csv ... 124
reading /usr/share/mecab/dic/juman/Assert.csv ... 30
reading /usr/share/mecab/dic/juman/Noun.keishiki.csv ... 10
reading /usr/share/mecab/dic/juman/Suffix.csv ... 1163
reading /usr/share/mecab/dic/juman/ContentW.csv ... 483161
reading /usr/share/mecab/dic/juman/Demonstrative.csv ... 76
reading /usr/share/mecab/dic/juman/Prefix.csv ... 75
reading /usr/share/mecab/dic/juman/Noun.suusi.csv ... 46
reading /usr/share/mecab/dic/juman/Noun.koyuu.csv ... 29805
reading /usr/share/mecab/dic/juman/AuxV.csv ... 421
reading /usr/share/mecab/dic/juman/Postp.csv ... 104
reading /usr/share/mecab/dic/juman/Rengo.csv ... 913
emitting double-array: 100% |###########################################| 
reading /usr/share/mecab/dic/juman/matrix.def ... 1509x1509
emitting matrix      : 100% |###########################################| 
done!
update-alternatives: using /var/lib/mecab/dic/juman to provide /var/lib/mecab/dic/debian (mecab-dictionary) in auto mode
Setting up mecab (0.996-3.1) ...
Compiling Juman dictionary for Mecab.  This takes long time...
reading /usr/share/mecab/dic/juman/unk.def ... 37
emitting double-array: 100% |###########################################| 
/usr/share/mecab/dic/juman/model.def is not found. skipped.
(omitted)
emitting double-array: 100% |###########################################| 
reading /usr/share/mecab/dic/juman/matrix.def ... 1509x1509
emitting matrix      : 100% |###########################################| 
done!

Windows

http://taku910.github.io/mecab/
Download MeCab and the IPA dictionary (IPADIC) from the page above.
The MeCab installer reportedly sets up the dictionary as well.

Macintosh

brew install mecab
brew install mecab-ipadic

Both Windows and Macintosh got past this wall without trouble.
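
Before moving on to the R package, it can help to confirm that the mecab executable is actually visible on the PATH. The following is a minimal sketch and not part of the original procedure. The helper name check_tool is hypothetical, and it relies only on the POSIX command -v builtin.

```shell
#!/bin/sh
# check_tool NAME: report whether NAME resolves to a command on the PATH.
# Hypothetical helper for illustration; prints one line and returns 0 or 1.
check_tool() {
    if tool_path=$(command -v "$1" 2>/dev/null); then
        printf 'found: %s (%s)\n' "$1" "$tool_path"
        return 0
    fi
    printf 'missing: %s\n' "$1"
    return 1
}

# Usage: RMeCab needs the mecab binary installed at the OS level.
check_tool mecab || echo "install MeCab with apt-get, brew, or the Windows installer first"
```

If the check reports the tool as missing, the OS-level installation steps above are the place to go back to.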

Wall 2: Specifying the package source

Installing the RMeCab package for R requires specifying the repository URL explicitly.

install.packages("RMeCab", repos = "http://rmecab.jp/R")

Wall 3: Package compilation error on Raspberry Pi

install.packages("RMeCab", repos = "http://rmecab.jp/R")
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
Warning in install.packages("RMeCab", repos = "http://rmecab.jp/R") :
  'lib = "/usr/local/lib/R/site-library"' is not writable
Would you like to use a personal library instead?  (y/n) y
Would you like to create a personal library
~/R/arm-unknown-linux-gnueabihf-library/3.3
to install packages into?  (y/n) y
trying URL 'http://rmecab.jp/R/src/contrib/RMeCab_0.99996.tar.gz'
Content type 'application/x-gzip' length 61092 bytes (59 KB)
==================================================
downloaded 59 KB

* installing *source* package ‘RMeCab’ ...
** libs
g++ -I/usr/share/R/include -DNDEBUG -I.     -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-saFpct/r-base-3.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Ngram.cpp -o Ngram.o
(omitted)
g++ -I/usr/share/R/include -DNDEBUG -I.     -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-saFpct/r-base-3.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c collocate.cpp -o collocate.o
collocate.cpp: In function ‘SEXPREC* collocate(SEXP, SEXP, SEXP, SEXP)’:
collocate.cpp:145:35: error: no matching function for call to ‘make_pair(char [BUF1], int)’
        m0.insert(make_pair(buf1, 1));// 1 は 1個目と言う意味
                                   ^
In file included from /usr/include/c++/6/bits/stl_algobase.h:64:0,
                 from /usr/include/c++/6/bits/char_traits.h:39,
                 from /usr/include/c++/6/string:40,
                 from RMeCab.h:15,
                 from collocate.cpp:20:
/usr/include/c++/6/bits/stl_pair.h:497:5: note: candidate: template<class _T1, class _T2> constexpr std::pair<typename std::__decay_and_strip<_Tp>::__type, typename std::__decay_and_strip<_T2>::__type> std::make_pair(_T1&&, _T2&&)
     make_pair(_T1&& __x, _T2&& __y)
     ^~~~~~~~~
/usr/include/c++/6/bits/stl_pair.h:497:5: note:   template argument deduction/substitution failed:
collocate.cpp:145:35: note:   variable-sized array type ‘char (&)[BUF1]’ is not a valid template argument
        m0.insert(make_pair(buf1, 1));// 1 は 1個目と言う意味
                                   ^
/usr/lib/R/etc/Makeconf:141: recipe for target 'collocate.o' failed
make: *** [collocate.o] Error 1
ERROR: compilation failed for package ‘RMeCab’
* removing ‘/home/pi/R/arm-unknown-linux-gnueabihf-library/3.3/RMeCab’

The downloaded source packages are in
	‘/tmp/RtmpgI3wMv/downloaded_packages’
Warning message:
In install.packages("RMeCab", repos = "http://rmecab.jp/R") :
  installation of package ‘RMeCab’ had non-zero exit status

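Wall 4

IPADIC should have been installed on both Mac OS X and Windows, yet the words listed in the book (いる, くれる, する, なる, 見守る, ある, 考える, 思う, 果たす, 入る, 抜ける, 富む), all of them verbs, are not picked up.

The symptom is identical on Windows and Macintosh. It may be worth asking the author which dictionary was used.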
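
One way to narrow this down is to look at MeCab's raw output and see whether the verbs are present before R gets involved. If they are, the dictionary is probably not the culprit, and the part-of-speech selection on the R side would be the next thing to check. The sketch below assumes the default IPADIC output layout (surface form, a tab, then comma-separated features with the part of speech first and the base form seventh). The function name list_verbs is hypothetical.

```shell
#!/bin/sh
# list_verbs: read MeCab output in the default IPADIC layout on stdin and
# print the base form (7th feature) of every token tagged 動詞 (verb).
# Hypothetical helper; other dictionaries such as JUMAN use other layouts.
list_verbs() {
    # Splitting on tab or comma makes field 2 the part of speech
    # and field 8 the base form.
    awk -F'[\t,]' '$2 == "動詞" { print $8 }'
}

# Usage, assuming mecab and IPADIC are installed and doc01.txt is UTF-8:
#   mecab doc01.txt | list_verbs | sort | uniq -c | sort -rn
```

This only inspects what MeCab emits. It does not say which dictionary or settings the book used.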

Wall 5

The same command fails with the same error on Windows and Macintosh. In the book, the output is:

unmei.q <- myQuery(unmei2,rownames(rslt$tk))
4番目の俺がterm.listに存在しません
6番目の優先がterm.listに存在しません
7番目のしがterm.listに存在しません
10番目の面がterm.listに存在しません
Error in myQuery(unmei2, rownames(rslt$tk))

(Each message reads "item N, <word>, does not exist in term.list".)

Windows

unmei.q <- myQuery(unmei2,rownames(rslt$tk))
3 番目の cat(posNot[k], "\t", "番目の", notInList[k], "\t", "がterm.listに存在しません", でエラー:
 引数 4 (タイプ 'list') は 'cat' で取り扱えません

Macintosh

unmei.q <- myQuery(unmei,rownames(rslt$tk))
3¥t番目の cat(posNot[k], "¥t", "番目の", notInList[k], "¥t", "がterm.listに存在しません", でエラー:
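 引数 4 (タイプ 'list') は 'cat' で取り扱えません

On both platforms the message is R's localized form of "argument 4 (type 'list') cannot be handled by 'cat'". The fourth argument, notInList[k], is a list, which cat() cannot print, so execution stops before any of the book's "does not exist in term.list" lines appear.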

Wall 6: The character-encoding wall (Raspberry Pi & Macintosh)

The text files have to be converted from Shift_JIS to UTF-8.
The docMatrix() function also requires UTF-8 input.

iconv -f Shift_JIS -t UTF8 doc01.txt > ../Mac/doc01.txt
...
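
When there are several text files, the conversion can be wrapped in a loop. This is a minimal sketch, assuming the sources are Shift_JIS .txt files in a single directory. The function name convert_dir and its directory arguments are hypothetical, and only the standard iconv -f and -t options are used.

```shell
#!/bin/sh
# convert_dir SRC DST: convert every SRC/*.txt from Shift_JIS to UTF-8,
# writing a file of the same name into DST. Stops at the first failure.
convert_dir() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    for f in "$src"/*.txt; do
        [ -e "$f" ] || continue          # no matching files: nothing to do
        iconv -f SHIFT_JIS -t UTF-8 "$f" > "$dst/$(basename "$f")" || return 1
    done
}

# Usage, mirroring the single-file command above:
#   convert_dir . ../Mac
```

Writing into a separate directory keeps the Shift_JIS originals untouched, which matters if the same files are still used on Windows.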


Thank you very much for reading to the last sentence.

Please press the like icon 💚 and follow me for your happy life.
