
Korean Morphological Analysis in Java

Posted at 2024-10-29

Overview

There are several morphological analyzers for Korean, and "open-korean-text" looked like a good choice.

Maven

  <dependency>
    <groupId>org.openkoreantext</groupId>
    <artifactId>open-korean-text</artifactId>
    <version>2.1.0</version>
  </dependency>

Sample code

package hello;

import java.util.List;

import org.openkoreantext.processor.KoreanTokenJava;
import org.openkoreantext.processor.OpenKoreanTextProcessorJava;
import org.openkoreantext.processor.tokenizer.KoreanTokenizer;

import scala.collection.Seq;

public class HelloOpenKoreanTextMain2 {

  public static void main(String[] args) {

    // "Today the weather was nice, so I walked to school."
    String text = "오늘은 날씨가 좋아서 걸어서 학교에 갔다.";

    // Normalize the input text
    CharSequence normalized = OpenKoreanTextProcessorJava.normalize(text);

    // Tokenize; the result is a Scala Seq
    Seq<KoreanTokenizer.KoreanToken> tokens = OpenKoreanTextProcessorJava.tokenize(normalized);

    // Convert the Scala Seq into a java.util.List for easy iteration
    List<KoreanTokenJava> javaTokens = OpenKoreanTextProcessorJava.tokensToJavaKoreanTokenList(tokens);

    for (KoreanTokenJava token : javaTokens) {
      System.out.println("begin: " + token.getOffset());
      System.out.println("end: " + (token.getOffset() + token.getLength()));
      System.out.println("length: " + token.getLength());
      System.out.println("lex: " + token.getStem()); // LEX: base (dictionary) form
      System.out.println("str: " + token.getText()); // STR: surface form
      // POS examples: Josa (particle), Noun, Adjective, Verb, Punctuation
      System.out.println("pos: " + token.getPos().name());
      System.out.println("isUnknown: " + token.isUnknown()); // whether the token was flagged as an unknown word
      System.out.println("---");
    }
  }
}
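The loop reports `begin` as `getOffset()` and computes `end` as `getOffset() + getLength()`, so each token covers the half-open range [begin, end) of the analyzed string. The following is a minimal sketch of that relationship. It assumes the offsets are `char` indices into the normalized text, which for this sentence appears to be identical to the input. Under that assumption, slicing the input with `substring(begin, end)` gives back each token's `str`. The `Span` record is a hypothetical stand-in for the printed fields, so the sketch runs without open-korean-text. Its values are copied from the result listed in this article, and records need Java 16 or later.

```java
import java.util.List;

public class TokenSpanCheck {

    // Hypothetical stand-in for the offset/length/text fields printed by the sample;
    // this is not an open-korean-text type.
    public record Span(int offset, int length, String text) {
        public int end() {
            return offset + length; // same formula the sample uses for "end"
        }
    }

    // Returns true if, for every span, source[offset, end) equals the span's text.
    public static boolean spansMatch(String source, List<Span> spans) {
        for (Span s : spans) {
            if (!source.substring(s.offset(), s.end()).equals(s.text())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String text = "오늘은 날씨가 좋아서 걸어서 학교에 갔다.";
        // Offsets, lengths and surface forms copied from the analysis result.
        List<Span> spans = List.of(
            new Span(0, 2, "오늘"), new Span(2, 1, "은"),
            new Span(4, 2, "날씨"), new Span(6, 1, "가"),
            new Span(8, 3, "좋아서"), new Span(12, 3, "걸어서"),
            new Span(16, 2, "학교"), new Span(18, 1, "에"),
            new Span(20, 2, "갔다"), new Span(22, 1, "."));
        System.out.println(spansMatch(text, spans)); // true
    }
}
```

The spaces at indices 3, 7, 11, 15 and 19 belong to no token, which is why `begin` sometimes jumps by one past the previous `end`.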


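Result

The morphological analysis produces the output below. Like Japanese, Korean conjugates verbs and adjectives, and the analyzer returns their base forms: for example, 갔다 (went) yields the base form 가다 (to go), and 좋아서 yields 좋다 (to be good). Not every base form is right, however. In this sentence 걸어서 means "walking" (from 걷다, to walk), but it is lemmatized as 걸다, a different verb (to hang, to call).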

begin: 0
end: 2
length: 2
lex: 
str: 오늘
pos: Noun
isUnknown: false
---
begin: 2
end: 3
length: 1
lex: 
str: 은
pos: Josa
isUnknown: false
---
begin: 4
end: 6
length: 2
lex: 
str: 날씨
pos: Noun
isUnknown: false
---
begin: 6
end: 7
length: 1
lex: 
str: 가
pos: Josa
isUnknown: false
---
begin: 8
end: 11
length: 3
lex: 좋다
str: 좋아서
pos: Adjective
isUnknown: false
---
begin: 12
end: 15
length: 3
lex: 걸다
str: 걸어서
pos: Verb
isUnknown: false
---
begin: 16
end: 18
length: 2
lex: 
str: 학교
pos: Noun
isUnknown: false
---
begin: 18
end: 19
length: 1
lex: 
str: 에
pos: Josa
isUnknown: false
---
begin: 20
end: 22
length: 2
lex: 가다
str: 갔다
pos: Verb
isUnknown: false
---
begin: 22
end: 23
length: 1
lex: 
str: .
pos: Punctuation
isUnknown: false
---
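As the output shows, `getStem()` is empty for nouns and particles and is filled only for conjugated predicates. A common next step is therefore to build a list of base forms: use the stem when present, otherwise the surface text, and drop particles and punctuation. The following is a minimal sketch of that step. It uses a hypothetical `Token` record standing in for `KoreanTokenJava` so it runs without the library. With the real API you would read `getText()`, `getStem()` and `getPos().name()` instead. Which POS names count as content words is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class LemmaExtractor {

    // Hypothetical stand-in for KoreanTokenJava: surface text, stem ("" if none), POS name.
    public record Token(String text, String stem, String pos) {}

    // POS names (as printed by getPos().name()) treated as content words in this sketch.
    private static final Set<String> CONTENT_POS = Set.of("Noun", "Verb", "Adjective");

    // Keeps content words, returning the stem when available and the surface text otherwise.
    public static List<String> lemmas(List<Token> tokens) {
        List<String> result = new ArrayList<>();
        for (Token t : tokens) {
            if (!CONTENT_POS.contains(t.pos())) {
                continue; // skip Josa, Punctuation, etc.
            }
            result.add(t.stem().isEmpty() ? t.text() : t.stem());
        }
        return result;
    }

    public static void main(String[] args) {
        // Tokens copied from the analysis result above.
        List<Token> tokens = List.of(
            new Token("오늘", "", "Noun"), new Token("은", "", "Josa"),
            new Token("날씨", "", "Noun"), new Token("가", "", "Josa"),
            new Token("좋아서", "좋다", "Adjective"), new Token("걸어서", "걸다", "Verb"),
            new Token("학교", "", "Noun"), new Token("에", "", "Josa"),
            new Token("갔다", "가다", "Verb"), new Token(".", "", "Punctuation"));
        System.out.println(lemmas(tokens)); // [오늘, 날씨, 좋다, 걸다, 학교, 가다]
    }
}
```

Falling back to the surface text keeps nouns such as 학교 unchanged, while conjugated forms collapse to a single dictionary form, which is usually what you want for search indexing or word counts.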

That's all.
