Following up on the article from the day before yesterday, I have also built a prototype of "modernbert-large-english-ud-triangular", so I measured the English dependency-parsing "accuracy" of these models, the RoBERTa models included, against en_ewt-ud-test.conllu from UD_English-EWT. On Google Colaboratory (GPU), it goes like this.
!pip install transformers triton
models=[
  "KoichiYasuoka/modernbert-base-english-ud-triangular",
  "KoichiYasuoka/modernbert-large-english-ud-triangular",
  "KoichiYasuoka/roberta-base-english-ud-goeswith",
  "KoichiYasuoka/roberta-large-english-ud-goeswith"
]
import os,sys,subprocess
from transformers import pipeline
# fetch UD_English-EWT and copy its train/dev/test CoNLL-U files
url="https://github.com/UniversalDependencies/UD_English-EWT"
d=os.path.basename(url)
os.system(f"test -d {d} || git clone --depth=1 {url}")
os.system("for F in train dev test ; do cp "+d+"/*-$F.conllu $F.conllu ; done")
# fetch the official CoNLL 2018 shared-task evaluation script
url="https://universaldependencies.org/conll18/conll18_ud_eval.py"
c=os.path.basename(url)
os.system(f"test -f {c} || curl -LO {url}")
# raw sentences ("# text = ...") from the test set
with open("test.conllu","r",encoding="utf-8") as r:
  s=[t[8:].strip() for t in r if t.startswith("# text =")]
# parse every test sentence with each model and score the output
for mdl in models:
  nlp=pipeline("universal-dependencies",mdl,trust_remote_code=True,
    aggregation_strategy="simple",device=0)
  with open("result.conllu","w",encoding="utf-8") as w:
    for t in s:
      w.write(nlp(t).strip()+"\n\n")
  p=subprocess.run([sys.executable,c,"-v","test.conllu","result.conllu"],
    encoding="utf-8",stdout=subprocess.PIPE,stderr=subprocess.STDOUT)
  with open("result.txt","w",encoding="utf-8") as w:
    print(f"\n*** {mdl}",p.stdout,sep="\n",file=w)
  os.system(f"mkdir -p result/{mdl} ; mv result.conllu result.txt result/{mdl}")
!( cd result && cat `find {" ".join(models)} -name result.txt` )
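Incidentally, the custom "universal-dependencies" pipeline used above returns one CoNLL-U formatted string per input sentence, which is why the loop can simply strip() its output and join the sentences with blank lines. A minimal sanity check along those lines (the sample sentence is my own, not from the test set):

from transformers import pipeline
nlp=pipeline("universal-dependencies","KoichiYasuoka/modernbert-large-english-ud-triangular",
  trust_remote_code=True,aggregation_strategy="simple",device=0)
print(nlp("It don't matter now."))  # prints the parse as CoNLL-U lines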
On my (Koichi Yasuoka's) machine, this produced the following output.
*** KoichiYasuoka/modernbert-base-english-ud-triangular
Metric     | Precision |    Recall |  F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens     |     96.60 |     97.58 |     97.09 |
Sentences  |    100.00 |    100.00 |    100.00 |
Words      |     98.84 |     98.44 |     98.64 |
UPOS       |     96.35 |     95.96 |     96.16 |     97.48
XPOS       |      0.00 |      0.00 |      0.00 |      0.00
UFeats     |     96.03 |     95.64 |     95.84 |     97.16
AllTags    |      0.00 |      0.00 |      0.00 |      0.00
Lemmas     |      0.03 |      0.03 |      0.03 |      0.03
UAS        |     92.58 |     92.20 |     92.39 |     93.66
LAS        |     90.49 |     90.12 |     90.31 |     91.55
CLAS       |     88.08 |     87.94 |     88.01 |     88.85
MLAS       |     83.23 |     83.09 |     83.16 |     83.96
BLEX       |      0.00 |      0.00 |      0.00 |      0.00
*** KoichiYasuoka/modernbert-large-english-ud-triangular
Metric     | Precision |    Recall |  F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens     |     96.68 |     97.61 |     97.14 |
Sentences  |    100.00 |    100.00 |    100.00 |
Words      |     98.90 |     98.44 |     98.67 |
UPOS       |     96.65 |     96.20 |     96.43 |     97.73
XPOS       |      0.00 |      0.00 |      0.00 |      0.00
UFeats     |     96.35 |     95.91 |     96.13 |     97.43
AllTags    |      0.00 |      0.00 |      0.00 |      0.00
Lemmas     |      0.03 |      0.03 |      0.03 |      0.03
UAS        |     92.93 |     92.50 |     92.71 |     93.97
LAS        |     91.06 |     90.64 |     90.85 |     92.08
CLAS       |     88.93 |     88.73 |     88.83 |     89.63
MLAS       |     84.62 |     84.43 |     84.53 |     85.29
BLEX       |      0.00 |      0.00 |      0.00 |      0.00
*** KoichiYasuoka/roberta-base-english-ud-goeswith
Metric     | Precision |    Recall |  F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens     |     97.48 |     97.81 |     97.65 |
Sentences  |    100.00 |    100.00 |    100.00 |
Words      |     98.71 |     97.65 |     98.18 |
UPOS       |     95.41 |     94.39 |     94.89 |     96.65
XPOS       |      0.00 |      0.00 |      0.00 |      0.00
UFeats     |     91.26 |     90.28 |     90.77 |     92.45
AllTags    |      0.00 |      0.00 |      0.00 |      0.00
Lemmas     |      0.03 |      0.03 |      0.03 |      0.03
UAS        |     89.39 |     88.43 |     88.90 |     90.55
LAS        |     87.07 |     86.14 |     86.60 |     88.21
CLAS       |     87.47 |     86.09 |     86.77 |     87.55
MLAS       |     74.67 |     73.48 |     74.07 |     74.73
BLEX       |      0.00 |      0.00 |      0.00 |      0.00
*** KoichiYasuoka/roberta-large-english-ud-goeswith
Metric     | Precision |    Recall |  F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens     |     97.55 |     97.99 |     97.77 |
Sentences  |    100.00 |    100.00 |    100.00 |
Words      |     98.77 |     97.82 |     98.29 |
UPOS       |     96.06 |     95.14 |     95.60 |     97.26
XPOS       |      0.00 |      0.00 |      0.00 |      0.00
UFeats     |     95.43 |     94.51 |     94.96 |     96.61
AllTags    |      0.00 |      0.00 |      0.00 |      0.00
Lemmas     |      0.03 |      0.03 |      0.03 |      0.03
UAS        |     92.42 |     91.53 |     91.97 |     93.57
LAS        |     90.41 |     89.54 |     89.98 |     91.54
CLAS       |     88.49 |     87.62 |     88.06 |     88.96
MLAS       |     82.54 |     81.72 |     82.13 |     82.97
BLEX       |      0.00 |      0.00 |      0.00 |      0.00
As far as UPOS/LAS/MLAS are concerned, modernbert-large-english-ud-triangular is way out in front at 96.43/90.85/84.53. Given the published figures of 96.20/86.77/80.27 for Stanza, 96.57/89.14/82.66 for UDPipe2, and 96.65/89.40/83.45 for Trankit, it can be said to hold its own quite well. However, modernbert-large-english-ud-triangular does not support LEMMA, so BLEX comes out as 0.00. Now, what should I do about that?
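One stopgap that comes to mind (my own rough sketch, not something the *-ud-triangular models themselves provide) would be to post-fill the LEMMA column of result.conllu with an external lemmatizer, for example spaCy's en_core_web_sm, before handing the file to conll18_ud_eval.py. Something along these lines, assuming spacy and en_core_web_sm are installed:

import spacy
from spacy.tokens import Doc
# external lemmatizer; a workaround, not part of the models under test
ext=spacy.load("en_core_web_sm",exclude=["parser","ner"])

def flush(block,w):
  rows=[l.rstrip("\n").split("\t") for l in block]
  words=[f for f in rows if len(f)==10 and f[0].isdigit()]
  if words:
    # run spaCy's tagger+lemmatizer on the already tokenized words
    doc=Doc(ext.vocab,words=[f[1] for f in words])
    for _,proc in ext.pipeline:
      doc=proc(doc)
    for f,tok in zip(words,doc):
      f[2]=tok.lemma_ or "_"  # overwrite the LEMMA column
  for l,f in zip(block,rows):
    w.write("\t".join(f)+"\n" if len(f)==10 else l)

def fill_lemmas(src="result.conllu",dst="result-lemma.conllu"):
  with open(src,"r",encoding="utf-8") as r, open(dst,"w",encoding="utf-8") as w:
    block=[]
    for line in r:
      if line.strip()=="":
        flush(block,w)
        block=[]
        w.write(line)
      else:
        block.append(line)
    flush(block,w)

Scoring result-lemma.conllu instead of result.conllu would then give non-zero Lemmas/BLEX, though whether mixing in an external lemmatizer makes for a fair comparison is another question.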