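Following up on my August 3 article, I have also built prototypes of "modernbert-large-ukrainian-ud-goeswith" and "bert-large-ukrainian-ud-goeswith", so I measured the "accuracy" of Ukrainian dependency parsing, RoBERTa models included, on uk_iu-ud-test.conllu from UD_Ukrainian-IU and uk_parlamint-ud-test.conllu from UD_Ukrainian-ParlaMint. On Google Colaboratory, it goes like this.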
!pip install esupar transformers triton
models=[
  "KoichiYasuoka/roberta-base-ukrainian-upos",
  "KoichiYasuoka/roberta-base-ukrainian-ud-goeswith",
  "KoichiYasuoka/roberta-base-wechsel-ukrainian-ud-goeswith",
  "KoichiYasuoka/roberta-large-wechsel-ukrainian-ud-goeswith",
  "KoichiYasuoka/bert-large-ukrainian-ud-goeswith",
  "KoichiYasuoka/modernbert-large-ukrainian-ud-goeswith",
  "KoichiYasuoka/modernbert-large-ukrainian-ud-embeds"
]
import os,sys,subprocess
# Download the two gold-standard test sets unless they are already present
url="https://github.com/UniversalDependencies/UD_Ukrainian-"
tests=["IU","ParlaMint"]
for t in tests:
  u=url+t+"/raw/refs/heads/master/uk_"+t.lower()+"-ud-test.conllu"
  f=os.path.basename(u)
  os.system(f"test -f {f} || curl -LO {u}")
# Download the CoNLL 2018 evaluation script
url="https://universaldependencies.org/conll18/conll18_ud_eval.py"
c=os.path.basename(url)
os.system(f"test -f {c} || curl -LO {url}")
for mdl in models:
  # The "-upos" model is loaded through esupar; the others bring their own pipeline
  if mdl.endswith("-upos"):
    import esupar
    nlp=esupar.load(mdl)
  else:
    from transformers import pipeline
    nlp=pipeline("universal-dependencies",mdl,trust_remote_code=True,aggregation_strategy="simple")
  for f in tests:
    # Collect the raw sentences from the "# text =" comment lines
    with open(f"uk_{f.lower()}-ud-test.conllu","r",encoding="utf-8") as r:
      s=[t[8:].strip() for t in r if t.startswith("# text =")]
    # Parse each sentence and write the output to a file named after the test set
    with open(f,"w",encoding="utf-8") as w:
      for t in s:
        w.write(str(nlp(t)).strip()+"\n\n")
  os.system(f"mkdir -p result/{mdl}")
  with open(f"result/{mdl}/result.txt","w",encoding="utf-8") as w:
    for f in tests:
      # Score the system output against the gold file
      p=subprocess.run([sys.executable,c,"-v",f"uk_{f.lower()}-ud-test.conllu",f],encoding="utf-8",stdout=subprocess.PIPE,stderr=subprocess.STDOUT)
      print(f"\n*** {mdl} ({f})",p.stdout,sep="\n",file=w)
!( cd result && cat `find {" ".join(models)} -name result.txt` )
On my (Koichi Yasuoka's) machine, the following results were output.
*** KoichiYasuoka/roberta-base-ukrainian-upos (IU)
Metric | Precision | Recall | F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens | 99.70 | 99.63 | 99.67 |
Sentences | 100.00 | 100.00 | 100.00 |
Words | 99.69 | 99.61 | 99.65 |
UPOS | 96.90 | 96.81 | 96.85 | 97.20
XPOS | 0.00 | 0.00 | 0.00 | 0.00
UFeats | 91.59 | 91.51 | 91.55 | 91.88
AllTags | 0.00 | 0.00 | 0.00 | 0.00
Lemmas | 0.01 | 0.01 | 0.01 | 0.01
UAS | 88.97 | 88.89 | 88.93 | 89.24
LAS | 86.18 | 86.10 | 86.14 | 86.44
CLAS | 83.21 | 82.92 | 83.06 | 83.34
MLAS | 72.93 | 72.68 | 72.80 | 73.05
BLEX | 0.00 | 0.00 | 0.00 | 0.00
*** KoichiYasuoka/roberta-base-ukrainian-upos (ParlaMint)
Metric | Precision | Recall | F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens | 99.77 | 99.91 | 99.84 |
Sentences | 100.00 | 100.00 | 100.00 |
Words | 99.77 | 99.91 | 99.84 |
UPOS | 97.89 | 98.03 | 97.96 | 98.12
XPOS | 0.00 | 0.00 | 0.00 | 0.00
UFeats | 92.57 | 92.70 | 92.63 | 92.78
AllTags | 0.00 | 0.00 | 0.00 | 0.00
Lemmas | 0.00 | 0.00 | 0.00 | 0.00
UAS | 92.14 | 92.27 | 92.21 | 92.35
LAS | 89.13 | 89.26 | 89.19 | 89.34
CLAS | 86.54 | 86.53 | 86.54 | 86.63
MLAS | 76.98 | 76.97 | 76.97 | 77.05
BLEX | 0.00 | 0.00 | 0.00 | 0.00
*** KoichiYasuoka/roberta-base-ukrainian-ud-goeswith (IU)
Metric | Precision | Recall | F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens | 99.54 | 99.11 | 99.32 |
Sentences | 100.00 | 100.00 | 100.00 |
Words | 99.53 | 99.09 | 99.31 |
UPOS | 96.40 | 95.97 | 96.19 | 96.86
XPOS | 0.00 | 0.00 | 0.00 | 0.00
UFeats | 91.30 | 90.89 | 91.09 | 91.73
AllTags | 0.00 | 0.00 | 0.00 | 0.00
Lemmas | 0.01 | 0.01 | 0.01 | 0.01
UAS | 87.20 | 86.82 | 87.01 | 87.61
LAS | 83.38 | 83.02 | 83.20 | 83.78
CLAS | 80.15 | 79.24 | 79.70 | 80.08
MLAS | 72.01 | 71.19 | 71.60 | 71.95
BLEX | 0.00 | 0.00 | 0.00 | 0.00
*** KoichiYasuoka/roberta-base-ukrainian-ud-goeswith (ParlaMint)
Metric | Precision | Recall | F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens | 99.51 | 99.77 | 99.64 |
Sentences | 100.00 | 100.00 | 100.00 |
Words | 99.51 | 99.77 | 99.64 |
UPOS | 97.45 | 97.71 | 97.58 | 97.93
XPOS | 0.00 | 0.00 | 0.00 | 0.00
UFeats | 92.67 | 92.92 | 92.79 | 93.13
AllTags | 0.00 | 0.00 | 0.00 | 0.00
Lemmas | 0.00 | 0.00 | 0.00 | 0.00
UAS | 90.62 | 90.86 | 90.74 | 91.07
LAS | 87.04 | 87.27 | 87.15 | 87.47
CLAS | 84.25 | 84.09 | 84.17 | 84.38
MLAS | 76.71 | 76.57 | 76.64 | 76.83
BLEX | 0.00 | 0.00 | 0.00 | 0.00
*** KoichiYasuoka/roberta-base-wechsel-ukrainian-ud-goeswith (IU)
Metric | Precision | Recall | F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens | 98.58 | 97.29 | 97.93 |
Sentences | 100.00 | 100.00 | 100.00 |
Words | 98.57 | 97.27 | 97.92 |
UPOS | 96.13 | 94.86 | 95.49 | 97.52
XPOS | 0.00 | 0.00 | 0.00 | 0.00
UFeats | 91.28 | 90.08 | 90.68 | 92.61
AllTags | 0.00 | 0.00 | 0.00 | 0.00
Lemmas | 0.01 | 0.01 | 0.01 | 0.01
UAS | 92.14 | 90.93 | 91.53 | 93.48
LAS | 88.66 | 87.49 | 88.07 | 89.94
CLAS | 87.59 | 86.99 | 87.29 | 87.70
MLAS | 78.59 | 78.05 | 78.32 | 78.69
BLEX | 0.00 | 0.00 | 0.00 | 0.00
*** KoichiYasuoka/roberta-base-wechsel-ukrainian-ud-goeswith (ParlaMint)
Metric | Precision | Recall | F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens | 99.19 | 99.09 | 99.14 |
Sentences | 100.00 | 100.00 | 100.00 |
Words | 99.19 | 99.09 | 99.14 |
UPOS | 97.40 | 97.30 | 97.35 | 98.19
XPOS | 0.00 | 0.00 | 0.00 | 0.00
UFeats | 92.62 | 92.52 | 92.57 | 93.38
AllTags | 0.00 | 0.00 | 0.00 | 0.00
Lemmas | 0.00 | 0.00 | 0.00 | 0.00
UAS | 93.39 | 93.30 | 93.35 | 94.16
LAS | 90.05 | 89.96 | 90.00 | 90.79
CLAS | 88.14 | 88.20 | 88.17 | 88.41
MLAS | 79.65 | 79.70 | 79.68 | 79.90
BLEX | 0.00 | 0.00 | 0.00 | 0.00
*** KoichiYasuoka/roberta-large-wechsel-ukrainian-ud-goeswith (IU)
Metric | Precision | Recall | F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens | 98.45 | 97.25 | 97.85 |
Sentences | 100.00 | 100.00 | 100.00 |
Words | 98.44 | 97.23 | 97.83 |
UPOS | 96.29 | 95.11 | 95.70 | 97.82
XPOS | 0.00 | 0.00 | 0.00 | 0.00
UFeats | 91.84 | 90.72 | 91.28 | 93.30
AllTags | 0.00 | 0.00 | 0.00 | 0.00
Lemmas | 0.01 | 0.01 | 0.01 | 0.01
UAS | 91.44 | 90.32 | 90.88 | 92.90
LAS | 88.40 | 87.32 | 87.86 | 89.81
CLAS | 87.15 | 86.76 | 86.95 | 87.50
MLAS | 79.46 | 79.10 | 79.28 | 79.77
BLEX | 0.00 | 0.00 | 0.00 | 0.00
*** KoichiYasuoka/roberta-large-wechsel-ukrainian-ud-goeswith (ParlaMint)
Metric | Precision | Recall | F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens | 99.25 | 99.10 | 99.17 |
Sentences | 100.00 | 100.00 | 100.00 |
Words | 99.25 | 99.10 | 99.17 |
UPOS | 97.35 | 97.20 | 97.27 | 98.08
XPOS | 0.00 | 0.00 | 0.00 | 0.00
UFeats | 93.06 | 92.92 | 92.99 | 93.76
AllTags | 0.00 | 0.00 | 0.00 | 0.00
Lemmas | 0.00 | 0.00 | 0.00 | 0.00
UAS | 93.64 | 93.49 | 93.56 | 94.34
LAS | 90.62 | 90.48 | 90.55 | 91.30
CLAS | 88.88 | 88.95 | 88.92 | 89.16
MLAS | 81.06 | 81.13 | 81.09 | 81.31
BLEX | 0.00 | 0.00 | 0.00 | 0.00
*** KoichiYasuoka/bert-large-ukrainian-ud-goeswith (IU)
Metric | Precision | Recall | F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens | 99.71 | 99.55 | 99.63 |
Sentences | 100.00 | 100.00 | 100.00 |
Words | 99.70 | 99.52 | 99.61 |
UPOS | 97.49 | 97.32 | 97.41 | 97.79
XPOS | 0.00 | 0.00 | 0.00 | 0.00
UFeats | 93.54 | 93.38 | 93.46 | 93.83
AllTags | 0.00 | 0.00 | 0.00 | 0.00
Lemmas | 0.01 | 0.01 | 0.01 | 0.01
UAS | 93.54 | 93.38 | 93.46 | 93.83
LAS | 90.21 | 90.06 | 90.13 | 90.49
CLAS | 87.88 | 87.49 | 87.68 | 87.91
MLAS | 80.34 | 79.99 | 80.16 | 80.37
BLEX | 0.00 | 0.00 | 0.00 | 0.00
*** KoichiYasuoka/bert-large-ukrainian-ud-goeswith (ParlaMint)
Metric | Precision | Recall | F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens | 99.52 | 99.81 | 99.66 |
Sentences | 100.00 | 100.00 | 100.00 |
Words | 99.52 | 99.81 | 99.66 |
UPOS | 97.70 | 97.98 | 97.84 | 98.17
XPOS | 0.00 | 0.00 | 0.00 | 0.00
UFeats | 93.79 | 94.07 | 93.93 | 94.25
AllTags | 0.00 | 0.00 | 0.00 | 0.00
Lemmas | 0.00 | 0.00 | 0.00 | 0.00
UAS | 93.66 | 93.93 | 93.79 | 94.11
LAS | 90.84 | 91.11 | 90.98 | 91.28
CLAS | 88.79 | 88.85 | 88.82 | 89.11
MLAS | 81.57 | 81.63 | 81.60 | 81.86
BLEX | 0.00 | 0.00 | 0.00 | 0.00
*** KoichiYasuoka/modernbert-large-ukrainian-ud-goeswith (IU)
Metric | Precision | Recall | F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens | 99.74 | 99.41 | 99.58 |
Sentences | 100.00 | 100.00 | 100.00 |
Words | 99.73 | 99.38 | 99.56 |
UPOS | 97.70 | 97.36 | 97.53 | 97.97
XPOS | 0.00 | 0.00 | 0.00 | 0.00
UFeats | 94.51 | 94.18 | 94.34 | 94.76
AllTags | 0.00 | 0.00 | 0.00 | 0.00
Lemmas | 0.01 | 0.01 | 0.01 | 0.01
UAS | 94.57 | 94.24 | 94.41 | 94.83
LAS | 91.78 | 91.46 | 91.62 | 92.02
CLAS | 90.04 | 89.56 | 89.80 | 90.12
MLAS | 83.50 | 83.05 | 83.28 | 83.58
BLEX | 0.00 | 0.00 | 0.00 | 0.00
*** KoichiYasuoka/modernbert-large-ukrainian-ud-goeswith (ParlaMint)
Metric | Precision | Recall | F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens | 99.74 | 99.90 | 99.82 |
Sentences | 100.00 | 100.00 | 100.00 |
Words | 99.74 | 99.90 | 99.82 |
UPOS | 98.13 | 98.28 | 98.21 | 98.38
XPOS | 0.00 | 0.00 | 0.00 | 0.00
UFeats | 94.30 | 94.45 | 94.38 | 94.54
AllTags | 0.00 | 0.00 | 0.00 | 0.00
Lemmas | 0.00 | 0.00 | 0.00 | 0.00
UAS | 94.59 | 94.73 | 94.66 | 94.83
LAS | 91.91 | 92.05 | 91.98 | 92.14
CLAS | 90.26 | 90.44 | 90.35 | 90.58
MLAS | 82.90 | 83.07 | 82.98 | 83.20
BLEX | 0.00 | 0.00 | 0.00 | 0.00
*** KoichiYasuoka/modernbert-large-ukrainian-ud-embeds (IU)
Metric | Precision | Recall | F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens | 99.72 | 99.48 | 99.60 |
Sentences | 100.00 | 100.00 | 100.00 |
Words | 99.71 | 99.45 | 99.58 |
UPOS | 97.08 | 96.83 | 96.95 | 97.36
XPOS | 0.00 | 0.00 | 0.00 | 0.00
UFeats | 92.69 | 92.45 | 92.57 | 92.96
AllTags | 0.00 | 0.00 | 0.00 | 0.00
Lemmas | 0.01 | 0.01 | 0.01 | 0.01
UAS | 91.16 | 90.93 | 91.04 | 91.43
LAS | 87.82 | 87.59 | 87.71 | 88.07
CLAS | 85.69 | 85.04 | 85.36 | 85.57
MLAS | 77.82 | 77.22 | 77.52 | 77.70
BLEX | 0.00 | 0.00 | 0.00 | 0.00
*** KoichiYasuoka/modernbert-large-ukrainian-ud-embeds (ParlaMint)
Metric | Precision | Recall | F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens | 99.63 | 99.85 | 99.74 |
Sentences | 100.00 | 100.00 | 100.00 |
Words | 99.63 | 99.85 | 99.74 |
UPOS | 97.75 | 97.97 | 97.86 | 98.12
XPOS | 0.00 | 0.00 | 0.00 | 0.00
UFeats | 93.28 | 93.49 | 93.39 | 93.63
AllTags | 0.00 | 0.00 | 0.00 | 0.00
Lemmas | 0.00 | 0.00 | 0.00 | 0.00
UAS | 91.89 | 92.10 | 92.00 | 92.24
LAS | 88.62 | 88.83 | 88.73 | 88.96
CLAS | 86.92 | 87.07 | 87.00 | 87.26
MLAS | 79.33 | 79.46 | 79.39 | 79.63
BLEX | 0.00 | 0.00 | 0.00 | 0.00
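As a reading aid for the four columns above, here is a minimal sketch of how I understand conll18_ud_eval.py to derive them from raw counts: precision divides by the number of system words, recall by the number of gold words, F1 by their sum, and AligndAcc only by the words that could be aligned between the two files. The function name `score_row` and all counts are hypothetical, chosen just to show why AligndAcc comes out higher than F1 when tokenization differs.

```python
def score_row(correct, system_total, gold_total, aligned_total):
    """Return (Precision, Recall, F1 Score, AligndAcc) as percentages."""
    precision = correct / system_total            # share of system words that are right
    recall = correct / gold_total                 # share of gold words that were found
    f1 = 2 * correct / (system_total + gold_total)  # equals the harmonic mean of the two
    aligned_acc = correct / aligned_total         # ignores words lost to tokenization mismatch
    return tuple(round(100 * x, 2) for x in (precision, recall, f1, aligned_acc))

# Hypothetical counts, NOT taken from the Ukrainian test sets
row = score_row(correct=900, system_total=1000, gold_total=990, aligned_total=980)
print(row)
```

Because the aligned total can never exceed either word count, AligndAcc is always at least as large as precision and recall, which matches the pattern in every block above.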
Let's put the UPOS/LAS/MLAS F1 scores into a table.
Model | uk_iu-ud-test.conllu | uk_parlamint-ud-test.conllu |
---|---|---|
roberta-base-ukrainian-upos | 96.85/86.14/72.80 | 97.96/89.19/76.97 |
roberta-base-ukrainian-ud-goeswith | 96.19/83.20/71.60 | 97.58/87.15/76.64 |
roberta-base-wechsel-ukrainian-ud-goeswith | 95.49/88.07/78.32 | 97.35/90.00/79.68 |
roberta-large-wechsel-ukrainian-ud-goeswith | 95.70/87.86/79.28 | 97.27/90.55/81.09 |
bert-large-ukrainian-ud-goeswith | 97.41/90.13/80.16 | 97.84/90.98/81.60 |
modernbert-large-ukrainian-ud-goeswith | 97.53/91.62/83.28 | 98.21/91.98/82.98 |
modernbert-large-ukrainian-ud-embeds | 96.95/87.71/77.52 | 97.86/88.73/79.39 |
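A summary like the one above can also be generated mechanically rather than copied by hand. The following is only a sketch under my own assumptions: the helper name `summarize` is hypothetical, and it expects text in the same shape as the result.txt files written by the script (a `*** model (testset)` header followed by the evaluator's pipe-separated rows). It picks the F1 column for UPOS, LAS and MLAS.

```python
import re

def summarize(result_text, metrics=("UPOS", "LAS", "MLAS")):
    """Return {(model, testset): "UPOS/LAS/MLAS"} built from the F1 column."""
    summary, key, f1 = {}, None, {}
    for line in result_text.splitlines():
        m = re.match(r"\*\*\* (\S+) \((\w+)\)", line)
        if m:
            # New block: remember the model name (without the user prefix) and test set
            key = (m.group(1).split("/")[-1], m.group(2))
            f1 = {}
            continue
        cols = [c.strip() for c in line.split("|")]
        if key and len(cols) >= 4 and cols[0] in metrics:
            f1[cols[0]] = cols[3]  # columns: Metric, Precision, Recall, F1 Score, AligndAcc
            if all(k in f1 for k in metrics):
                summary[key] = "/".join(f1[k] for k in metrics)
    return summary

# A shortened excerpt of the output shown earlier
sample = """*** KoichiYasuoka/roberta-base-ukrainian-upos (IU)
Metric | Precision | Recall | F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
UPOS | 96.90 | 96.81 | 96.85 | 97.20
LAS | 86.18 | 86.10 | 86.14 | 86.44
MLAS | 72.93 | 72.68 | 72.80 | 73.05
"""
print(summarize(sample))
```

Feeding it the concatenated result.txt files should give one entry per model and test set, which can then be laid out as table rows.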
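As expected, ModernBERT gives the highest accuracy: modernbert-large-ukrainian-ud-goeswith has the best UPOS, LAS and MLAS on both test sets. However, modernbert-large-ukrainian-ud-embeds, which uses an upper triangular matrix, is not doing so well, trailing the goeswith variant by about 3 to 4 LAS points, so I probably need to think a bit more about the tuning method here.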