Released modernbert-base-japanese-aozora-ud-goeswith, a dependency parsing model for NINJAL Long-Unit-Words

Last updated at 2025-01-05, posted at 2025-01-03

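Continuing from yesterday's article: I have implemented ModernBertForTokenClassification inside modernbert-base-japanese-aozora. At present it has no inputs_embeds, so it is a little awkward to work with, but I went ahead and built a prototype dependency parsing model, modernbert-base-japanese-aozora-ud-goeswith, based on the NINJAL Long-Unit-Word treebank UD_Japanese-GSDLUW. On Google Colaboratory (GPU runtime), a benchmark program using ja_gsdluw-ud-test.conllu looks like this.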

!pip install transformers triton
# Model(s) to benchmark
models=["KoichiYasuoka/modernbert-base-japanese-aozora-ud-goeswith"]
import os,sys,subprocess
from transformers import pipeline
# Fetch UD_Japanese-GSDLUW and copy its train/dev/test splits into the working directory
url="https://github.com/UniversalDependencies/UD_Japanese-GSDLUW"
d=os.path.basename(url)
os.system(f"test -d {d} || git clone --depth=1 {url}")
os.system("for F in train dev test ; do cp "+d+"/*-$F.conllu $F.conllu ; done")
# Fetch the CoNLL 2018 shared task evaluation script
url="https://universaldependencies.org/conll18/conll18_ud_eval.py"
c=os.path.basename(url)
os.system(f"test -f {c} || curl -LO {url}")
# Collect the raw sentences from the "# text = ..." comment lines of the test set
with open("test.conllu","r",encoding="utf-8") as r:
  s=[t[8:].strip() for t in r if t.startswith("# text =")]
# Parse every sentence with each model, then score result.conllu against test.conllu
for mdl in models:
  nlp=pipeline("universal-dependencies",mdl,trust_remote_code=True,aggregation_strategy="simple",device=0)
  with open("result.conllu","w",encoding="utf-8") as w:
    for t in s:
      w.write(nlp(t))
  p=subprocess.run([sys.executable,c,"-v","test.conllu","result.conllu"],encoding="utf-8",stdout=subprocess.PIPE,stderr=subprocess.STDOUT)
  print(f"\n*** {mdl}",p.stdout,sep="\n",flush=True)
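The benchmark writes whatever nlp(t) returns straight into result.conllu, so the pipeline output is CoNLL-U text. As a minimal sketch of how such text could be inspected by hand, the following reads it into (ID, FORM, UPOS, HEAD, DEPREL) tuples. The helper name read_conllu and the short sample are assumptions made for illustration; the sample is hand-written and is not output of the model.

```python
def read_conllu(text):
    """Split CoNLL-U text into sentences, each a list of
    (id, form, upos, head, deprel) tuples."""
    sentences, words = [], []
    for line in text.splitlines():
        if not line.strip():              # a blank line ends a sentence
            if words:
                sentences.append(words)
                words = []
            continue
        if line.startswith("#"):          # comment lines such as "# text = ..."
            continue
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue                      # skip ranges (1-2) and empty nodes (1.1)
        head = int(cols[6]) if cols[6].isdigit() else None
        words.append((int(cols[0]), cols[1], cols[3], head, cols[7]))
    if words:                             # last sentence without a trailing blank line
        sentences.append(words)
    return sentences

# Hand-written sample for illustration only (NOT output of the model)
SAMPLE = "\n".join([
    "# text = 雨が降る",
    "1\t雨\t_\tNOUN\t_\t_\t3\tnsubj\t_\tSpaceAfter=No",
    "2\tが\t_\tADP\t_\t_\t1\tcase\t_\tSpaceAfter=No",
    "3\t降る\t_\tVERB\t_\t_\t0\troot\t_\t_",
    "",
])

for sentence in read_conllu(SAMPLE):
    for idx, form, upos, head, deprel in sentence:
        print(idx, form, upos, head, deprel)
```

The same function could be pointed at open("result.conllu", encoding="utf-8").read() after the benchmark has run, for example to eyeball the sentences where HEAD or DEPREL differ from test.conllu.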

On my machine (Koichi Yasuoka), the following results were output.

*** KoichiYasuoka/modernbert-base-japanese-aozora-ud-goeswith
Metric     | Precision |    Recall |  F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens     |     96.83 |     96.65 |     96.74 |
Sentences  |    100.00 |    100.00 |    100.00 |
Words      |     96.83 |     96.65 |     96.74 |
UPOS       |     94.30 |     94.13 |     94.22 |     97.39
XPOS       |      0.00 |      0.00 |      0.00 |      0.00
UFeats     |     96.80 |     96.62 |     96.71 |     99.97
AllTags    |      0.00 |      0.00 |      0.00 |      0.00
Lemmas     |      0.00 |      0.00 |      0.00 |      0.00
UAS        |     88.49 |     88.33 |     88.41 |     91.39
LAS        |     87.04 |     86.88 |     86.96 |     89.89
CLAS       |     80.19 |     80.21 |     80.20 |     84.36
MLAS       |     75.87 |     75.88 |     75.88 |     79.81
BLEX       |      0.00 |      0.00 |      0.00 |      0.00
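To make the four columns concrete, here is a minimal sketch of how they relate, assuming the definitions that conll18_ud_eval.py uses as far as I understand them: precision divides the correct count by the system total, recall by the gold total, and AligndAcc by the number of words that could be aligned between the two files. The function name ud_scores and the counts passed to it are hypothetical and are not the counts behind the table above.

```python
def ud_scores(correct, gold_total, system_total, aligned_total):
    """Return (precision, recall, f1, aligned_accuracy) as percentages."""
    precision = correct / system_total if system_total else 0.0
    recall = correct / gold_total if gold_total else 0.0
    both = system_total + gold_total
    f1 = 2 * correct / both if both else 0.0   # equals the harmonic mean of P and R
    aligned = correct / aligned_total if aligned_total else 0.0
    return tuple(100 * x for x in (precision, recall, f1, aligned))

# Hypothetical counts, for illustration only
p, r, f1, acc = ud_scores(correct=870, gold_total=1000,
                          system_total=998, aligned_total=967)
print(f"{p:.2f} {r:.2f} {f1:.2f} {acc:.2f}")

# Cross-check on the LAS row: harmonic mean of the printed precision and recall
las_f1 = 2 * 87.04 * 86.88 / (87.04 + 86.88)
print(round(las_f1, 2))

# Cross-check on the UAS row, assuming the aligned total is the number of
# matched words: UAS precision / Words precision should approximate AligndAcc
uas_aligned = 100 * 88.49 / 96.83
print(round(uas_aligned, 2))
```

Because the inputs to the cross-checks are already rounded to two decimals, they can differ from the table in the last digit. The gap between F1 and AligndAcc reflects word segmentation: words that the model segments differently from the gold file cannot be aligned, so they count against F1 but are left out of AligndAcc.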

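UPOS/LAS/MLAS come out at 94.22/86.96/75.88, so compared with the "test (predict)" figures in Tables 3 to 5 of 『GPT系言語モデルによる国語研長単位係り受け解析』 (NINJAL Long-Unit-Word dependency parsing with GPT-style language models) and Tables 6 to 7 of 『青空文庫DeBERTaモデルによる国語研長単位係り受け解析』 (NINJAL Long-Unit-Word dependency parsing with Aozora Bunko DeBERTa models), the accuracy falls somewhat short. Hmm, is aozorabunko-clean alone simply too little data? Or does the fine-tuning need some extra trick?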
