To try NLPre-ZH, the Traditional Chinese benchmark from "NLPre: A Revised Approach towards Language-centric Benchmarking of Natural Language Preprocessing Systems" by Martyna Wiącek, Piotr Rybak, Łukasz Pszenny, and Alina Wróblewska, I ran a quick benchmark of the RoBERTa model I built for "Sequence-Labeling RoBERTa Model for Dependency-Parsing in Classical Chinese and Its Application to Vietnamese and Thai" and the Erlangshen DeBERTa models I released in my diary entry of January 3, 2023. On Google Colaboratory (with GPU), it goes like this:
!pip install transformers
models=["KoichiYasuoka/roberta-base-chinese-ud-goeswith","KoichiYasuoka/deberta-base-chinese-ud-goeswith","KoichiYasuoka/deberta-base-chinese-erlangshen-ud-goeswith","KoichiYasuoka/deberta-large-chinese-erlangshen-ud-goeswith","KoichiYasuoka/deberta-xlarge-chinese-erlangshen-ud-goeswith"]
import os,sys,subprocess
from transformers import pipeline
# Fetch the UD_Chinese-GSD treebank and copy out the train/dev/test splits
url="https://github.com/UniversalDependencies/UD_Chinese-GSD"
d=os.path.basename(url)
os.system(f"test -d {d} || git clone --depth=1 {url}")
os.system("for F in train dev test ; do cp "+d+"/*-$F.conllu $F.conllu ; done")
# Fetch the official CoNLL 2018 shared-task evaluation script
url="https://universaldependencies.org/conll18/conll18_ud_eval.py"
c=os.path.basename(url)
os.system(f"test -f {c} || curl -LO {url}")
# Collect the raw sentences from the "# text =" comment lines of test.conllu
with open("test.conllu","r",encoding="utf-8") as r:
  s=[t[8:].strip() for t in r if t.startswith("# text =")]
for mdl in models:
  nlp=pipeline("universal-dependencies",mdl,trust_remote_code=True,aggregation_strategy="simple",device=0)
  # Parse every sentence and write out the CoNLL-U result
  with open("result.conllu","w",encoding="utf-8") as w:
    for t in s:
      w.write(nlp(t))
  # Score the result against the gold test set
  p=subprocess.run([sys.executable,c,"-v","test.conllu","result.conllu"],encoding="utf-8",stdout=subprocess.PIPE,stderr=subprocess.STDOUT)
  with open("result.txt","w",encoding="utf-8") as w:
    print(f"\n*** {mdl}",p.stdout,sep="\n",file=w)
  # Stash the per-model output in a directory named after the model
  os.system(f"mkdir -p {mdl} ; mv result.conllu result.txt {mdl}")
!cat `find . -name result.txt`
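The sentence list s above comes from the "# text = " comment lines of the CoNLL-U file, with t[8:] dropping the "# text =" prefix. A minimal standalone illustration of that slice, run on an inline CoNLL-U fragment instead of test.conllu:

```python
# Standalone sketch of the "# text =" extraction used in the benchmark script,
# applied to a small inline CoNLL-U fragment (token columns abbreviated).
sample = """# sent_id = 1
# text = 他叫湯姆去拿外衣。
1\t他\t他\tPRON\t_\t_\t2\tnsubj\t_\t_
# sent_id = 2
# text = 香港是一個不夜城。
1\t香港\t香港\tPROPN\t_\t_\t5\tnsubj\t_\t_
"""
s = [t[8:].strip() for t in sample.splitlines() if t.startswith("# text =")]
print(s)  # ['他叫湯姆去拿外衣。', '香港是一個不夜城。']
```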
On my machine, I (Koichi Yasuoka) got the following output.
*** KoichiYasuoka/roberta-base-chinese-ud-goeswith
Metric | Precision | Recall | F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens | 96.90 | 96.99 | 96.94 |
Sentences | 100.00 | 100.00 | 100.00 |
Words | 96.90 | 96.99 | 96.94 |
UPOS | 89.10 | 89.18 | 89.14 | 91.95
XPOS | 0.00 | 0.00 | 0.00 | 0.00
UFeats | 95.74 | 95.83 | 95.79 | 98.81
AllTags | 0.00 | 0.00 | 0.00 | 0.00
Lemmas | 96.15 | 96.24 | 96.19 | 99.23
UAS | 74.91 | 74.98 | 74.95 | 77.31
LAS | 70.95 | 71.01 | 70.98 | 73.22
CLAS | 74.08 | 75.03 | 74.55 | 78.20
MLAS | 62.85 | 63.66 | 63.25 | 66.35
BLEX | 73.55 | 74.49 | 74.02 | 77.63
*** KoichiYasuoka/deberta-base-chinese-ud-goeswith
Metric | Precision | Recall | F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens | 97.53 | 97.59 | 97.56 |
Sentences | 100.00 | 100.00 | 100.00 |
Words | 97.53 | 97.59 | 97.56 |
UPOS | 90.20 | 90.26 | 90.23 | 92.48
XPOS | 0.00 | 0.00 | 0.00 | 0.00
UFeats | 96.67 | 96.74 | 96.70 | 99.12
AllTags | 0.00 | 0.00 | 0.00 | 0.00
Lemmas | 96.78 | 96.84 | 96.81 | 99.23
UAS | 76.14 | 76.19 | 76.17 | 78.07
LAS | 72.22 | 72.27 | 72.25 | 74.05
CLAS | 75.35 | 76.42 | 75.88 | 78.99
MLAS | 65.04 | 65.97 | 65.51 | 68.19
BLEX | 74.81 | 75.88 | 75.34 | 78.43
*** KoichiYasuoka/deberta-base-chinese-erlangshen-ud-goeswith
Metric | Precision | Recall | F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens | 97.50 | 97.49 | 97.50 |
Sentences | 100.00 | 100.00 | 100.00 |
Words | 97.50 | 97.49 | 97.50 |
UPOS | 90.12 | 90.11 | 90.11 | 92.43
XPOS | 0.00 | 0.00 | 0.00 | 0.00
UFeats | 96.79 | 96.79 | 96.79 | 99.27
AllTags | 0.00 | 0.00 | 0.00 | 0.00
Lemmas | 96.75 | 96.74 | 96.75 | 99.23
UAS | 77.38 | 77.37 | 77.38 | 79.36
LAS | 73.54 | 73.53 | 73.54 | 75.42
CLAS | 77.00 | 77.97 | 77.48 | 80.70
MLAS | 66.31 | 67.15 | 66.73 | 69.50
BLEX | 76.40 | 77.37 | 76.88 | 80.08
*** KoichiYasuoka/deberta-large-chinese-erlangshen-ud-goeswith
Metric | Precision | Recall | F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens | 96.48 | 96.67 | 96.57 |
Sentences | 100.00 | 100.00 | 100.00 |
Words | 96.48 | 96.67 | 96.57 |
UPOS | 88.44 | 88.62 | 88.53 | 91.67
XPOS | 0.00 | 0.00 | 0.00 | 0.00
UFeats | 95.59 | 95.78 | 95.68 | 99.08
AllTags | 0.00 | 0.00 | 0.00 | 0.00
Lemmas | 95.75 | 95.94 | 95.84 | 99.24
UAS | 74.40 | 74.55 | 74.48 | 77.12
LAS | 70.45 | 70.59 | 70.52 | 73.02
CLAS | 73.00 | 74.17 | 73.58 | 77.74
MLAS | 62.73 | 63.74 | 63.23 | 66.80
BLEX | 72.42 | 73.58 | 72.99 | 77.12
*** KoichiYasuoka/deberta-xlarge-chinese-erlangshen-ud-goeswith
Metric | Precision | Recall | F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens | 96.07 | 96.22 | 96.14 |
Sentences | 100.00 | 100.00 | 100.00 |
Words | 96.07 | 96.22 | 96.14 |
UPOS | 88.11 | 88.25 | 88.18 | 91.71
XPOS | 0.00 | 0.00 | 0.00 | 0.00
UFeats | 95.13 | 95.28 | 95.20 | 99.02
AllTags | 0.00 | 0.00 | 0.00 | 0.00
Lemmas | 95.35 | 95.50 | 95.43 | 99.26
UAS | 73.90 | 74.02 | 73.96 | 76.93
LAS | 69.80 | 69.91 | 69.86 | 72.66
CLAS | 72.31 | 73.36 | 72.83 | 77.38
MLAS | 61.93 | 62.82 | 62.37 | 66.26
BLEX | 71.73 | 72.77 | 72.25 | 76.76
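Collecting the LAS/MLAS/BLEX F1 columns from the five tables above into one place makes the comparison easier; a quick check of which model leads on LAS (scores copied verbatim from the output):

```python
# LAS / MLAS / BLEX F1 scores, copied from the evaluation output above,
# keyed by short model name; pick the best model by LAS.
scores = {
    "roberta-base-chinese-ud-goeswith":              (70.98, 63.25, 74.02),
    "deberta-base-chinese-ud-goeswith":              (72.25, 65.51, 75.34),
    "deberta-base-chinese-erlangshen-ud-goeswith":   (73.54, 66.73, 76.88),
    "deberta-large-chinese-erlangshen-ud-goeswith":  (70.52, 63.23, 72.99),
    "deberta-xlarge-chinese-erlangshen-ud-goeswith": (69.86, 62.37, 72.25),
}
best = max(scores, key=lambda m: scores[m][0])
print(best)  # deberta-base-chinese-erlangshen-ud-goeswith
```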
At a glance, deberta-base-chinese-erlangshen-ud-goeswith, with LAS/MLAS/BLEX of 73.54/66.73/76.88, looks the best of the bunch. So I renamed its output file to ud/gsd/test.conllu, packed it into a zip, and submitted it to NLPre-ZH. On the NLPre-ZH "Leaderboard - UD Tagset", however, LAS/MLAS/BLEX somehow came out as 83.05/78.04/81.47. No way it should be that high; could it be that NLPre-ZH is still using Universal Dependencies 2.9?
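The repackaging step can be sketched as follows; this is only an illustration, assuming the per-model directory layout produced by the benchmark script above, and the zip file name nlpre-zh-submission.zip is my own placeholder (the stub-file fallback just lets the sketch run anywhere):

```python
import os, zipfile

# Repackage the chosen model's parse of the test set as ud/gsd/test.conllu,
# the archive layout used for the NLPre-ZH submission described above.
# Assumption: result.conllu was moved into a per-model directory earlier.
src = "KoichiYasuoka/deberta-base-chinese-erlangshen-ud-goeswith/result.conllu"
if not os.path.isfile(src):
    # Stub file so this sketch also runs without the benchmark output present
    os.makedirs(os.path.dirname(src), exist_ok=True)
    open(src, "w", encoding="utf-8").close()
with zipfile.ZipFile("nlpre-zh-submission.zip", "w", zipfile.ZIP_DEFLATED) as z:
    z.write(src, arcname="ud/gsd/test.conllu")
print(zipfile.ZipFile("nlpre-zh-submission.zip").namelist())  # ['ud/gsd/test.conllu']
```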