Following up on the article from the day before yesterday, I have also built a prototype of "modernbert-large-english-ud-triangular", so I measured the English dependency-parsing "accuracy" of these models, the RoBERTa models included, against en_ewt-ud-test.conllu from UD_English-EWT. On Google Colaboratory (GPU), it goes like this.
!pip install transformers triton
models=[
  "KoichiYasuoka/modernbert-base-english-ud-triangular",
  "KoichiYasuoka/modernbert-large-english-ud-triangular",
  "KoichiYasuoka/roberta-base-english-ud-goeswith",
  "KoichiYasuoka/roberta-large-english-ud-goeswith"
]
import os,sys,subprocess
from transformers import pipeline
# fetch UD_English-EWT and copy its train/dev/test CoNLL-U files
url="https://github.com/UniversalDependencies/UD_English-EWT"
d=os.path.basename(url)
os.system(f"test -d {d} || git clone --depth=1 {url}")
os.system("for F in train dev test ; do cp "+d+"/*-$F.conllu $F.conllu ; done")
# fetch the official CoNLL 2018 shared-task evaluation script
url="https://universaldependencies.org/conll18/conll18_ud_eval.py"
c=os.path.basename(url)
os.system(f"test -f {c} || curl -LO {url}")
# raw sentences ("# text = ...") from the test set
with open("test.conllu","r",encoding="utf-8") as r:
  s=[t[8:].strip() for t in r if t.startswith("# text =")]
# parse every test sentence with each model and score the output
for mdl in models:
  nlp=pipeline("universal-dependencies",mdl,trust_remote_code=True,
    aggregation_strategy="simple",device=0)
  with open("result.conllu","w",encoding="utf-8") as w:
    for t in s:
      w.write(nlp(t).strip()+"\n\n")
  p=subprocess.run([sys.executable,c,"-v","test.conllu","result.conllu"],
    encoding="utf-8",stdout=subprocess.PIPE,stderr=subprocess.STDOUT)
  with open("result.txt","w",encoding="utf-8") as w:
    print(f"\n*** {mdl}",p.stdout,sep="\n",file=w)
  os.system(f"mkdir -p result/{mdl} ; mv result.conllu result.txt result/{mdl}")
!( cd result && cat `find {" ".join(models)} -name result.txt` )
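Incidentally, the custom "universal-dependencies" pipeline used above returns one CoNLL-U formatted string per input sentence, which is why the loop can simply strip() its output and join the sentences with blank lines. A minimal sanity check along those lines (the sample sentence is my own, not from the test set):

from transformers import pipeline
nlp=pipeline("universal-dependencies","KoichiYasuoka/modernbert-large-english-ud-triangular",
  trust_remote_code=True,aggregation_strategy="simple",device=0)
print(nlp("It don't matter now."))  # prints the parse as CoNLL-U lines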
On my (Koichi Yasuoka's) machine, this produced the following output.
*** KoichiYasuoka/modernbert-base-english-ud-triangular
Metric     | Precision |    Recall |  F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens     |     96.60 |     97.58 |     97.09 |
Sentences  |    100.00 |    100.00 |    100.00 |
Words      |     98.84 |     98.44 |     98.64 |
UPOS       |     96.35 |     95.96 |     96.16 |     97.48
XPOS       |      0.00 |      0.00 |      0.00 |      0.00
UFeats     |     96.03 |     95.64 |     95.84 |     97.16
AllTags    |      0.00 |      0.00 |      0.00 |      0.00
Lemmas     |      0.03 |      0.03 |      0.03 |      0.03
UAS        |     92.58 |     92.20 |     92.39 |     93.66
LAS        |     90.49 |     90.12 |     90.31 |     91.55
CLAS       |     88.08 |     87.94 |     88.01 |     88.85
MLAS       |     83.23 |     83.09 |     83.16 |     83.96
BLEX       |      0.00 |      0.00 |      0.00 |      0.00
*** KoichiYasuoka/modernbert-large-english-ud-triangular
Metric     | Precision |    Recall |  F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens     |     96.68 |     97.61 |     97.14 |
Sentences  |    100.00 |    100.00 |    100.00 |
Words      |     98.90 |     98.44 |     98.67 |
UPOS       |     96.65 |     96.20 |     96.43 |     97.73
XPOS       |      0.00 |      0.00 |      0.00 |      0.00
UFeats     |     96.35 |     95.91 |     96.13 |     97.43
AllTags    |      0.00 |      0.00 |      0.00 |      0.00
Lemmas     |      0.03 |      0.03 |      0.03 |      0.03
UAS        |     92.93 |     92.50 |     92.71 |     93.97
LAS        |     91.06 |     90.64 |     90.85 |     92.08
CLAS       |     88.93 |     88.73 |     88.83 |     89.63
MLAS       |     84.62 |     84.43 |     84.53 |     85.29
BLEX       |      0.00 |      0.00 |      0.00 |      0.00
*** KoichiYasuoka/roberta-base-english-ud-goeswith
Metric     | Precision |    Recall |  F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens     |     97.48 |     97.81 |     97.65 |
Sentences  |    100.00 |    100.00 |    100.00 |
Words      |     98.71 |     97.65 |     98.18 |
UPOS       |     95.41 |     94.39 |     94.89 |     96.65
XPOS       |      0.00 |      0.00 |      0.00 |      0.00
UFeats     |     91.26 |     90.28 |     90.77 |     92.45
AllTags    |      0.00 |      0.00 |      0.00 |      0.00
Lemmas     |      0.03 |      0.03 |      0.03 |      0.03
UAS        |     89.39 |     88.43 |     88.90 |     90.55
LAS        |     87.07 |     86.14 |     86.60 |     88.21
CLAS       |     87.47 |     86.09 |     86.77 |     87.55
MLAS       |     74.67 |     73.48 |     74.07 |     74.73
BLEX       |      0.00 |      0.00 |      0.00 |      0.00
*** KoichiYasuoka/roberta-large-english-ud-goeswith
Metric     | Precision |    Recall |  F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens     |     97.55 |     97.99 |     97.77 |
Sentences  |    100.00 |    100.00 |    100.00 |
Words      |     98.77 |     97.82 |     98.29 |
UPOS       |     96.06 |     95.14 |     95.60 |     97.26
XPOS       |      0.00 |      0.00 |      0.00 |      0.00
UFeats     |     95.43 |     94.51 |     94.96 |     96.61
AllTags    |      0.00 |      0.00 |      0.00 |      0.00
Lemmas     |      0.03 |      0.03 |      0.03 |      0.03
UAS        |     92.42 |     91.53 |     91.97 |     93.57
LAS        |     90.41 |     89.54 |     89.98 |     91.54
CLAS       |     88.49 |     87.62 |     88.06 |     88.96
MLAS       |     82.54 |     81.72 |     82.13 |     82.97
BLEX       |      0.00 |      0.00 |      0.00 |      0.00
As far as UPOS/LAS/MLAS are concerned, modernbert-large-english-ud-triangular is way out in front at 96.43/90.85/84.53. Given the published figures of 96.20/86.77/80.27 for Stanza, 96.57/89.14/82.66 for UDPipe2, and 96.65/89.40/83.45 for Trankit, it can be said to hold its own quite well. However, modernbert-large-english-ud-triangular does not support LEMMA, so BLEX comes out as 0.00. Now, what should I do about that?
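One stopgap that comes to mind (my own rough sketch, not something the *-ud-triangular models themselves provide) would be to post-fill the LEMMA column of result.conllu with an external lemmatizer, for example spaCy's en_core_web_sm, before handing the file to conll18_ud_eval.py. Something along these lines, assuming spacy and en_core_web_sm are installed:

import spacy
from spacy.tokens import Doc
# external lemmatizer; a workaround, not part of the models under test
ext=spacy.load("en_core_web_sm",exclude=["parser","ner"])

def flush(block,w):
  rows=[l.rstrip("\n").split("\t") for l in block]
  words=[f for f in rows if len(f)==10 and f[0].isdigit()]
  if words:
    # run spaCy's tagger+lemmatizer on the already tokenized words
    doc=Doc(ext.vocab,words=[f[1] for f in words])
    for _,proc in ext.pipeline:
      doc=proc(doc)
    for f,tok in zip(words,doc):
      f[2]=tok.lemma_ or "_"  # overwrite the LEMMA column
  for l,f in zip(block,rows):
    w.write("\t".join(f)+"\n" if len(f)==10 else l)

def fill_lemmas(src="result.conllu",dst="result-lemma.conllu"):
  with open(src,"r",encoding="utf-8") as r, open(dst,"w",encoding="utf-8") as w:
    block=[]
    for line in r:
      if line.strip()=="":
        flush(block,w)
        block=[]
        w.write(line)
      else:
        block.append(line)
    flush(block,w)

Scoring result-lemma.conllu instead of result.conllu would then give non-zero Lemmas/BLEX, though whether mixing in an external lemmatizer makes for a fair comparison is another question.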