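言語処理100本ノック 2015 (NLP 100 Exercise 2015)
"53. Tokenization
Use Stanford Core NLP to obtain the analysis result of the input text in XML format. Then read this XML file and output the input text in a one-word-per-line format."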
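The second half of the task (reading the XML and printing one word per line) could be sketched in shell roughly as follows. This is a minimal sketch, not the exercise's intended solution: the sample XML is hand-written to mimic the `<token>` / `<word>` nesting of CoreNLP's XML output, `extract_words` is a hypothetical helper name, and a regular expression over XML does not decode entities such as `&amp;`, so a real XML parser is the safer choice.

```shell
# Hand-written sample shaped like CoreNLP's XML output (not real tool output)
sample=$(mktemp)
cat > "$sample" <<'EOF'
<root><document><sentences><sentence id="1"><tokens>
<token id="1"><word>the</word><lemma>the</lemma></token>
<token id="2"><word>quick</word><lemma>quick</lemma></token>
<token id="3"><word>fox</word><lemma>fox</lemma></token>
</tokens></sentence></sentences></document></root>
EOF

# Hypothetical helper: print the text of every <word> element, one per line
extract_words() {
    grep -o '<word>[^<]*</word>' "$1" | sed -e 's/<word>//' -e 's/<\/word>//'
}

words=$(extract_words "$sample")
printf '%s\n' "$words"
rm -f "$sample"
```

Pointing `extract_words` at the XML file that CoreNLP writes (instead of the temporary sample) would be the actual use.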
Let's try installing Stanford Core NLP:
https://stanfordnlp.github.io/CoreNLP/
#Attempt 1
mac
$ docker run -it ubuntu /bin/bash
Once the docker container is running:
docker
# apt-get update
# apt-get install -y wget
# apt-get install -y unzip
# apt-get install -y vim
# apt-get install -y sudo
# apt-get install -y maven
# wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip
# unzip stanford-corenlp-full-2018-10-05.zip
# java --version
openjdk 10.0.2 2018-07-17
OpenJDK Runtime Environment (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.4)
OpenJDK 64-Bit Server VM (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.4, mixed mode)
# apt-get install make
# apt-get install default-jdk
Append the following two lines to ~/.bashrc:
~/.bashrc
for file in `find /stanford-corenlp-full-2018-10-05/ -name "*.jar"`; do export
CLASSPATH="$CLASSPATH:`realpath $file`"; done
Try running it.
# sudo sh ~/.bashrc
/root/.bashrc: 13: /root/.bashrc: shopt: not found
/root/.bashrc: 21: /root/.bashrc: shopt: not found
export HOME='/root'
export HOSTNAME='c0c48076c758'
export LOGNAME='root'
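A few observations on this output. `shopt` is a bash builtin, so running `~/.bashrc` through `sh` (dash on Ubuntu) reports `shopt: not found`. The `export HOME=...` lines look like the listing that a bare `export` prints, which would fit the line break right after `do export` in the snippet; that reading is an assumption. More importantly, `sh ~/.bashrc` runs the file in a child process, so any `CLASSPATH` it sets disappears when that process exits; loading it into the current shell needs `. ~/.bashrc` or `source ~/.bashrc`. The sketch below shows the difference, using a temporary file and a hypothetical variable `DEMO_CLASSPATH` in place of the real ones.

```shell
# Stand-in for ~/.bashrc (temporary file, hypothetical variable name)
rc=$(mktemp)
echo 'export DEMO_CLASSPATH="/tmp/demo.jar"' > "$rc"

unset DEMO_CLASSPATH
sh "$rc"                               # child process, like `sh ~/.bashrc`
after_sh=${DEMO_CLASSPATH:-unset}      # the parent shell never sees the variable

. "$rc"                                # current shell, like `source ~/.bashrc`
after_source=${DEMO_CLASSPATH:-unset}  # now the variable is set here

echo "after sh:     $after_sh"
echo "after source: $after_source"
rm -f "$rc"
```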
in.sh
echo "the quick brown fox jumped over the lazy dog" > input.txt
java -mx3g edu.stanford.nlp.pipeline.StanfordCoreNLP -outputFormat json -file input.txt
Try running it.
# sh in.sh
Error: Could not find or load main class edu.stanford.nlp.pipeline.StanfordCoreNLP
Caused by: java.lang.ClassNotFoundException: edu.stanford.nlp.pipeline.StanfordCoreNLP
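This `ClassNotFoundException` is what java reports when the CoreNLP jars are not on the classpath of the process that runs `in.sh`. One way to avoid depending on `~/.bashrc` is to build the classpath inside the script itself. The sketch below assumes the jars sit directly in the unzipped directory; `build_classpath` and `CORENLP_HOME` are hypothetical names introduced here, not part of CoreNLP.

```shell
# Hypothetical helper: join every .jar directly under a directory with ':'
build_classpath() {
    dir=$1
    joined=""
    for jar in "$dir"/*.jar; do
        [ -e "$jar" ] || continue          # skip the literal pattern when nothing matches
        joined="${joined:+$joined:}$jar"
    done
    printf '%s\n' "$joined"
}

# Directory name taken from the unzip step; adjust if yours differs
CORENLP_HOME=${CORENLP_HOME:-/stanford-corenlp-full-2018-10-05}
CLASSPATH=$(build_classpath "$CORENLP_HOME")
export CLASSPATH

if [ -n "$CLASSPATH" ]; then
    echo "CLASSPATH now lists the jars under $CORENLP_HOME"
else
    echo "no jars found under $CORENLP_HOME"
fi
```

With `CLASSPATH` exported this way, the `java ... edu.stanford.nlp.pipeline.StanfordCoreNLP` line in `in.sh` can stay as it is. Java's own classpath wildcard is an alternative: `java -cp "/stanford-corenlp-full-2018-10-05/*" ...`.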
root@c0c48076c758:/stanford-corenlp-full-2018-10-05# cat ~/.bashrc
# make
if [ ! -e src ] ; then \
mkdir src ; cd src ; jar -xf ../stanford-corenlp-*-sources.jar; \
fi;
mkdir -p classes
javac -O -d classes -encoding utf-8 src/edu/stanford/nlp/*/*.java \
src/edu/stanford/nlp/*/*/*.java \
src/edu/stanford/nlp/*/*/*/*.java \
src/edu/stanford/nlp/*/*/*/*/*.java \
src/edu/stanford/nlp/*/*/*/*/*/*.java
javac: file not found: src/edu/stanford/nlp/*/*.java
Usage: javac <options> <source files>
use --help for a list of possible options
Makefile:9: recipe for target 'corenlp' failed
make: *** [corenlp] Error 2
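The message `javac: file not found: src/edu/stanford/nlp/*/*.java` shows the pattern itself reaching javac, which is what a shell does by default when a glob matches nothing. That suggests `src/` did not contain the extracted sources when `make` ran, although the log does not confirm why. The sketch below reproduces only the shell behavior, using a temporary directory as a stand-in for an unpopulated `src/`.

```shell
# Temporary empty directory standing in for an unpopulated src/ (hypothetical)
empty=$(mktemp -d)

set -- "$empty"/*.java            # nothing matches, so the pattern is kept literally
unmatched=$1
echo "argument with no matching file: $unmatched"

: > "$empty/Foo.java"             # create one matching file
set -- "$empty"/*.java            # now the glob expands to the real path
matched=$1
echo "argument once a file exists:    $matched"

rm -rf "$empty"
```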
#Attempt 2
$ docker run -it continuumio/anaconda /bin/bash
(omitted)
This image turned out to be Python 2. I only noticed after going all the way to the end. (sob)
#Attempt 3
$ docker run -it continuumio/anaconda3 /bin/bash