A reader for Sentiment Treebank files #Python

I wrote a program that reads the files of the Stanford Sentiment Treebank dataset.

Usage example
https://github.com/niitsuma/word2vec-keras-in-gensim/blob/master/example/treebank-classify.py

It is part of a package called word2veckeras, so you can install it with

pip install word2veckeras

and use it right away.

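This example uses sklearn.grid_search.GridSearchCV to search for the gensim doc2vec parameters that give high classification accuracy (in scikit-learn 0.20 and later, GridSearchCV lives in sklearn.model_selection instead). For the SentenceClassifier and Doc2VecClassifier used in this example, see this explanation.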
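GridSearchCV can tune a custom classifier like this because scikit-learn only needs an estimator that follows its conventions: hyperparameters as constructor keyword arguments, plus fit and predict. The sketch below is not SentenceClassifier or Doc2VecClassifier and does not use gensim. BowSentenceClassifier is a hypothetical toy that takes raw sentences, turns them into bag-of-words vectors (standing in for doc2vec vectors) and fits a logistic regression. The sentences are made up. It imports GridSearchCV from sklearn.model_selection, where current scikit-learn versions provide it.

```python
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV


class BowSentenceClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical stand-in for a doc2vec-based sentence classifier.

    Hyperparameters are plain constructor arguments stored unchanged,
    so GridSearchCV can read and set them via get_params/set_params.
    """

    def __init__(self, ngram_max=1, C=1.0):
        self.ngram_max = ngram_max
        self.C = C

    def fit(self, X, y):
        # X is a list of raw sentences; bag-of-words replaces doc2vec here
        self.vectorizer_ = CountVectorizer(ngram_range=(1, self.ngram_max))
        features = self.vectorizer_.fit_transform(X)
        self.model_ = LogisticRegression(C=self.C)
        self.model_.fit(features, y)
        self.classes_ = self.model_.classes_
        return self

    def predict(self, X):
        return self.model_.predict(self.vectorizer_.transform(X))


# Made-up toy data (1 = positive, 0 = negative)
sentences = ["good film", "great acting", "really good", "fun and great",
             "bad film", "awful acting", "really bad", "dull and awful"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Try every combination in the grid with 2-fold (stratified) cross-validation
search = GridSearchCV(
    BowSentenceClassifier(),
    param_grid={"ngram_max": [1, 2], "C": [0.1, 1.0, 10.0]},
    cv=2,
)
search.fit(sentences, labels)
print(search.best_params_, search.best_score_)
```

Replacing the toy class with a doc2vec-based estimator and a grid over its doc2vec parameters gives the kind of search described above. Which parameters the word2veckeras classifiers actually expose is not covered by this sketch.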

The implementation itself is at
https://github.com/niitsuma/word2vec-keras-in-gensim/blob/master/word2veckeras/treebank.py
and it loads the data into nltk's Tree class.
The implementation is very simple.
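To give a feel for how simple this can be, here is a minimal sketch of such a reader using nltk.Tree.fromstring. It is not the code from treebank.py. It assumes the PTB-style tree files distributed with SST, where each line is one bracketed tree and every node label is a sentiment class from 0 (very negative) to 4 (very positive). The function name read_sst_trees and the two sample lines are made up for illustration.

```python
import io

from nltk import Tree


def read_sst_trees(lines):
    """Parse SST lines like '(3 (2 It) (4 good))' into nltk Trees.

    Hypothetical helper: accepts any iterable of strings, e.g. an open file.
    """
    trees = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            trees.append(Tree.fromstring(line))
    return trees


# Two made-up lines in the SST bracketed format
sample = io.StringIO(
    "(3 (2 It) (3 (2 's) (4 good)))\n"
    "(1 (2 Not) (1 (2 very) (1 fun)))\n"
)
trees = read_sst_trees(sample)
for t in trees:
    # root label = sentence-level sentiment, leaves = the tokens
    print(t.label(), " ".join(t.leaves()))
```

With a real file, pass the open file object instead of the StringIO, e.g. read_sst_trees(open(path, encoding="utf-8")), where path is a placeholder for one of the tree files.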

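There is also a similar package,
https://pypi.python.org/pypi/pytreebank/
but it did not seem to let you choose settings such as fine-grained or positive/negative labels, so I kept things simple: the reader just loads the data into nltk Trees, so that you can manipulate the tree data however you like.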
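As an example of that kind of manipulation, the sketch below derives both common SST label settings from the same nltk Tree: fine-grained (the five labels 0 to 4 kept as-is) and binary positive/negative (0 and 1 mapped to negative, 3 and 4 to positive, neutral 2 dropped, which is the usual SST convention). Tree.subtrees() also makes it easy to collect phrase-level examples from every node. The helper names fine_label, binary_label and examples are hypothetical and are not part of word2veckeras or pytreebank.

```python
from nltk import Tree


def fine_label(tree):
    """Fine-grained setting: keep the 5-class label (0-4)."""
    return int(tree.label())


def binary_label(tree):
    """Binary setting: 0/1 -> 0 (negative), 3/4 -> 1 (positive), 2 -> None (dropped)."""
    label = int(tree.label())
    if label < 2:
        return 0
    if label > 2:
        return 1
    return None  # neutral


def examples(tree, label_fn, phrases=False):
    """Return (text, label) pairs for the root, or for every subtree if phrases=True."""
    nodes = tree.subtrees() if phrases else [tree]
    pairs = []
    for node in nodes:
        label = label_fn(node)
        if label is not None:  # skip nodes the setting drops
            pairs.append((" ".join(node.leaves()), label))
    return pairs


# Made-up sample tree in the SST format
t = Tree.fromstring("(3 (2 It) (3 (2 's) (4 good)))")
print(examples(t, fine_label))
print(examples(t, binary_label, phrases=True))
```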