NLP4J [006-031] 100 Language Processing Knocks with NLP4J #31: Verbs

Last updated at 2020-01-20, posted at 2020-01-12

Let's give it a try.

31. Verbs

Extract all surface forms of verbs.

Maven

This uses the version currently under development.

<dependency>
	<groupId>org.nlp4j</groupId>
	<artifactId>nlp4j-core</artifactId>
	<version>1.1.1.0-SNAPSHOT</version>
</dependency>

Text Data

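The default morphological analyzer (the Yahoo! Japan Developer Network Japanese morphological analysis API) caps the request size at 900 KB and also limits the number of requests, so a small text file is used here.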
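Given the 900 KB request limit mentioned above, it can help to check the size of a text before sending it to the analyzer. The following is a minimal sketch, not part of NLP4J: the class name `RequestSizeCheck`, its methods, and the choice of 1 KB = 1024 bytes measured in UTF-8 are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

public class RequestSizeCheck {

	// 900 KB limit taken from the text above; 1 KB = 1024 bytes is an assumption
	public static final long MAX_REQUEST_BYTES = 900L * 1024L;

	/** Returns the number of bytes the text occupies when encoded as UTF-8. */
	public static long utf8ByteLength(String text) {
		return text.getBytes(StandardCharsets.UTF_8).length;
	}

	/** Returns true if the UTF-8 encoded text does not exceed the assumed limit. */
	public static boolean fitsInOneRequest(String text) {
		return utf8ByteLength(text) <= MAX_REQUEST_BYTES;
	}

	public static void main(String[] args) {
		String sample = "吾輩は猫である。";
		System.out.println("bytes=" + utf8ByteLength(sample) + " fits=" + fitsInOneRequest(sample));
	}
}
```

Japanese characters typically take three bytes each in UTF-8, so the byte count, rather than the character count, is what matters for a size limit of this kind.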

Java Code

package nlp4j.nokku.chap4;

import java.util.List;
import java.util.stream.Collectors;

import nlp4j.Document;
import nlp4j.DocumentAnnotator;
import nlp4j.DocumentAnnotatorPipeline;
import nlp4j.Keyword;
import nlp4j.crawler.Crawler;
import nlp4j.crawler.TextFileLineSeparatedCrawler;
import nlp4j.impl.DefaultDocumentAnnotatorPipeline;
import nlp4j.index.DocumentIndex;
import nlp4j.index.SimpleDocumentIndex;
import nlp4j.yhoo_jp.YJpMaAnnotator;

public class Nokku31 {
	public static void main(String[] args) throws Exception {
		// Use the text file crawler provided by NLP4J
		Crawler crawler = new TextFileLineSeparatedCrawler();
		crawler.setProperty("file", "src/test/resources/nlp4j.crawler/neko_short_utf8.txt");
		crawler.setProperty("encoding", "UTF-8");
		crawler.setProperty("target", "text");
		// Crawl the documents
		List<Document> docs = crawler.crawlDocuments();
		// Define the NLP pipeline (processing is done by chaining multiple steps as a pipeline)
		DocumentAnnotatorPipeline pipeline = new DefaultDocumentAnnotatorPipeline();
		{
			// Annotator that uses the Yahoo! Japan morphological analysis API
			DocumentAnnotator annotator = new YJpMaAnnotator();
			pipeline.add(annotator);
		}
		// Run the annotation
		pipeline.annotate(docs);
		// Use DocumentIndex to count keywords.
		SimpleDocumentIndex index = new SimpleDocumentIndex();
		// Add the documents
		index.addDocuments(docs);
		List<Keyword> kwds = index.getKeywords();
		kwds = kwds.stream() //
				.filter(o -> o.getFacet().equals("動詞")) // part of speech is verb (動詞)
				.collect(Collectors.toList());
		for (Keyword kwd : kwds) {
			System.err.println(kwd.getStr());
		}
	}
}
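The filtering step in the program above can be tried without network access or an API quota. The following is a minimal sketch of the same stream pattern using a hypothetical `Token` class in place of NLP4J's `Keyword`; the class names and the hand-assigned part-of-speech labels are assumptions for illustration, not actual analyzer output.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class VerbFilterSketch {

	/** Hypothetical stand-in for an annotated keyword: a part-of-speech facet plus a surface form. */
	public static final class Token {
		final String facet;
		final String surface;

		public Token(String facet, String surface) {
			this.facet = facet;
			this.surface = surface;
		}
	}

	/** Returns the surface forms of all tokens whose facet is "動詞" (verb), in input order. */
	public static List<String> verbSurfaces(List<Token> tokens) {
		return tokens.stream()
				.filter(t -> "動詞".equals(t.facet)) // constant-first comparison tolerates a null facet
				.map(t -> t.surface)
				.collect(Collectors.toList());
	}

	public static void main(String[] args) {
		// Hand-labeled sample tokens (an assumption, not real analyzer output)
		List<Token> tokens = Arrays.asList(
				new Token("名詞", "どこ"),
				new Token("助詞", "で"),
				new Token("動詞", "生れ"),
				new Token("助動詞", "た"),
				new Token("助詞", "か"));
		for (String surface : verbSurfaces(tokens)) {
			System.out.println(surface);
		}
	}
}
```

Writing the comparison as `"動詞".equals(...)` avoids a `NullPointerException` if a token has no facet; whether NLP4J's `getFacet()` can return null is not stated in the article.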

Results

生れ
つか
し
泣い
始め
いう
見
聞く
いう
いう
捕え
煮
食う
いう
思わ
載せ
持ち上げ
し
あっ
落ちつい
見
いう
思っ
残っ
もっ
し
逢っ
出会わ
し
吹く
咽せ
く
弱っ
飲む
いう
知っ
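Several surface forms appear more than once in the output above. If per-verb counts are wanted, the list can be tallied as in the following minimal sketch; `VerbCountSketch` and `countSurfaces` are hypothetical names, and the code works on plain strings rather than on NLP4J's index.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class VerbCountSketch {

	/** Counts occurrences of each surface form, keeping first-appearance order. */
	public static Map<String, Integer> countSurfaces(List<String> surfaces) {
		Map<String, Integer> counts = new LinkedHashMap<>();
		for (String surface : surfaces) {
			counts.merge(surface, 1, Integer::sum);
		}
		return counts;
	}

	public static void main(String[] args) {
		// The first ten lines of the output listed above
		List<String> surfaces = Arrays.asList(
				"生れ", "つか", "し", "泣い", "始め", "いう", "見", "聞く", "いう", "いう");
		countSurfaces(surfaces)
				.forEach((surface, n) -> System.out.println(surface + "\t" + n));
	}
}
```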

Summary

With NLP4J, natural language processing in Java is easy!

Project URL

https://www.nlp4j.org/
