
Measuring the "accuracy" of Thai dependency-parsing models on the test set of the Thai Universal Dependency Treebank


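Continuing from yesterday's article, I measured the "accuracy" of the esupar models for Thai dependency parsing on test.conllu from "Thai Universal Dependency Treebank" by Panyut Sriwirote, Wei Qi Leong, Charin Polpanumas, Santhawat Thanyawong, William Chandra Tjhi, Wirote Aroonmanakun, and Attapol T. Rutherford. Each model parses the raw text of every test sentence, and the output is scored with conll18_ud_eval.py. On Google Colaboratory, it looks like this.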

!pip install esupar
# esupar models to evaluate
models=["KoichiYasuoka/camembert-thai-base-upos","KoichiYasuoka/deberta-base-thai-upos","KoichiYasuoka/roberta-base-thai-spm-upos","KoichiYasuoka/roberta-base-thai-syllable-upos","KoichiYasuoka/roberta-base-thai-char-upos","KoichiYasuoka/bert-base-thai-upos"]
import os,sys,subprocess,esupar
# clone the treebank unless TUD/TUD/test.conllu is already present
url="https://github.com/nlp-chula/TUD"
f=os.path.join(os.path.basename(url),"TUD","test.conllu")
os.system(f"test -f {f} || git clone --depth=1 {url}")
# download the CoNLL 2018 evaluation script unless it is already present
url="https://universaldependencies.org/conll18/conll18_ud_eval.py"
c=os.path.basename(url)
os.system(f"test -f {c} || curl -LO {url}")
# read the gold-standard test set once
with open(f,"r",encoding="utf-8") as r:
  s=r.read()
for mdl in models:
  nlp=esupar.load(mdl)
  # parse the raw text of each sentence (the "# text = " comment lines)
  with open("result.conllu","w",encoding="utf-8") as w:
    for t in s.split("\n"):
      if t.startswith("# text = "):
        print(nlp(t[9:]),file=w)
  # compare the model output against the gold file and print the verbose table
  p=subprocess.run([sys.executable,c,"-v",f,"result.conllu"],encoding="utf-8",stdout=subprocess.PIPE,stderr=subprocess.STDOUT)
  print(f"\n*** {mdl}",p.stdout,sep="\n",flush=True)

On my (Koichi Yasuoka's) machine, it printed the following results.

*** KoichiYasuoka/camembert-thai-base-upos
Metric     | Precision |    Recall |  F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens     |     92.58 |     89.80 |     91.17 |
Sentences  |    100.00 |    100.00 |    100.00 |
Words      |     92.58 |     89.80 |     91.17 |
UPOS       |     80.92 |     78.48 |     79.68 |     87.40
XPOS       |     92.58 |     89.80 |     91.17 |    100.00
UFeats     |     89.18 |     86.50 |     87.82 |     96.33
AllTags    |     77.76 |     75.43 |     76.58 |     84.00
Lemmas     |     92.58 |     89.80 |     91.17 |    100.00
UAS        |     74.56 |     72.32 |     73.42 |     80.53
LAS        |     63.53 |     61.62 |     62.56 |     68.62
CLAS       |     60.47 |     56.63 |     58.49 |     64.39
MLAS       |     48.82 |     45.73 |     47.22 |     51.99
BLEX       |     60.47 |     56.63 |     58.49 |     64.39

*** KoichiYasuoka/deberta-base-thai-upos
Metric     | Precision |    Recall |  F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens     |     91.07 |     88.81 |     89.92 |
Sentences  |    100.00 |    100.00 |    100.00 |
Words      |     91.07 |     88.81 |     89.92 |
UPOS       |     75.80 |     73.92 |     74.85 |     83.23
XPOS       |     91.07 |     88.81 |     89.92 |    100.00
UFeats     |     87.75 |     85.57 |     86.64 |     96.35
AllTags    |     72.70 |     70.90 |     71.79 |     79.83
Lemmas     |     91.07 |     88.81 |     89.92 |    100.00
UAS        |     71.06 |     69.30 |     70.17 |     78.03
LAS        |     60.16 |     58.66 |     59.40 |     66.06
CLAS       |     56.75 |     53.51 |     55.08 |     61.86
MLAS       |     43.62 |     41.13 |     42.34 |     47.54
BLEX       |     56.75 |     53.51 |     55.08 |     61.86

*** KoichiYasuoka/roberta-base-thai-spm-upos
Metric     | Precision |    Recall |  F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens     |     90.69 |     88.49 |     89.58 |
Sentences  |    100.00 |    100.00 |    100.00 |
Words      |     90.69 |     88.49 |     89.58 |
UPOS       |     74.54 |     72.73 |     73.62 |     82.19
XPOS       |     90.69 |     88.49 |     89.58 |    100.00
UFeats     |     87.39 |     85.28 |     86.32 |     96.37
AllTags    |     71.51 |     69.78 |     70.63 |     78.85
Lemmas     |     90.69 |     88.49 |     89.58 |    100.00
UAS        |     69.88 |     68.19 |     69.03 |     77.06
LAS        |     59.44 |     58.00 |     58.71 |     65.54
CLAS       |     55.94 |     52.56 |     54.20 |     61.07
MLAS       |     41.79 |     39.26 |     40.48 |     45.62
BLEX       |     55.94 |     52.56 |     54.20 |     61.07

*** KoichiYasuoka/roberta-base-thai-syllable-upos
Metric     | Precision |    Recall |  F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens     |     89.01 |     86.80 |     87.89 |
Sentences  |    100.00 |    100.00 |    100.00 |
Words      |     89.01 |     86.80 |     87.89 |
UPOS       |     69.57 |     67.84 |     68.69 |     78.15
XPOS       |     89.01 |     86.80 |     87.89 |    100.00
UFeats     |     85.84 |     83.70 |     84.76 |     96.43
AllTags    |     66.70 |     65.04 |     65.86 |     74.93
Lemmas     |     89.01 |     86.80 |     87.89 |    100.00
UAS        |     67.16 |     65.50 |     66.32 |     75.45
LAS        |     57.23 |     55.81 |     56.51 |     64.30
CLAS       |     53.45 |     50.29 |     51.82 |     59.71
MLAS       |     37.08 |     34.89 |     35.95 |     41.42
BLEX       |     53.45 |     50.29 |     51.82 |     59.71

*** KoichiYasuoka/roberta-base-thai-char-upos
Metric     | Precision |    Recall |  F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens     |     88.44 |     86.65 |     87.53 |
Sentences  |    100.00 |    100.00 |    100.00 |
Words      |     88.44 |     86.65 |     87.53 |
UPOS       |     70.49 |     69.06 |     69.77 |     79.71
XPOS       |     88.44 |     86.65 |     87.53 |    100.00
UFeats     |     85.29 |     83.56 |     84.42 |     96.44
AllTags    |     67.66 |     66.29 |     66.97 |     76.51
Lemmas     |     88.44 |     86.65 |     87.53 |    100.00
UAS        |     66.52 |     65.17 |     65.84 |     75.21
LAS        |     56.33 |     55.19 |     55.75 |     63.69
CLAS       |     52.60 |     49.63 |     51.07 |     59.18
MLAS       |     37.81 |     35.67 |     36.71 |     42.54
BLEX       |     52.60 |     49.63 |     51.07 |     59.18

*** KoichiYasuoka/bert-base-thai-upos
Metric     | Precision |    Recall |  F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens     |     79.12 |     69.92 |     74.23 |
Sentences  |    100.00 |    100.00 |    100.00 |
Words      |     79.12 |     69.92 |     74.23 |
UPOS       |     51.69 |     45.69 |     48.50 |     65.34
XPOS       |     79.12 |     69.92 |     74.23 |    100.00
UFeats     |     76.13 |     67.28 |     71.43 |     96.22
AllTags    |     49.44 |     43.69 |     46.39 |     62.49
Lemmas     |     79.12 |     69.92 |     74.23 |    100.00
UAS        |     44.54 |     39.36 |     41.79 |     56.29
LAS        |     32.11 |     28.37 |     30.13 |     40.58
CLAS       |     26.43 |     22.72 |     24.43 |     34.66
MLAS       |     14.95 |     12.85 |     13.82 |     19.60
BLEX       |     26.43 |     22.72 |     24.43 |     34.66
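As a quick sanity check on how the columns in these tables relate, F1 is the harmonic mean of precision and recall. The sketch below recomputes F1 from the rounded precision and recall printed for camembert-thai-base-upos. The function name `f1_score` is only an illustrative choice, and since the inputs are already rounded to two decimals, agreement should only be expected at about that precision.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as percentages)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision/recall values copied from the camembert-thai-base-upos table.
tokens_f1 = f1_score(92.58, 89.80)
las_f1 = f1_score(63.53, 61.62)

print(f"Tokens F1: {tokens_f1:.2f}")
print(f"LAS F1:    {las_f1:.2f}")
```

Rounded to two decimals, both values agree with the F1 Score column for that model (91.17 for Tokens and 62.56 for LAS).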

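For camembert-thai-base-upos, which scored highest of the six models, the LAS/MLAS/BLEX F1 scores are 62.56/47.22/58.49, so there still seems to be plenty of room for improvement. Now, what approaches might there be?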
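To compare LAS/MLAS/BLEX across models without reading the tables by eye, the text printed by the evaluator can be parsed into a dictionary. The following is a minimal sketch, assuming the table layout shown above (cells separated by `|`, with F1 Score in the fourth column). `parse_eval_table` is a hypothetical helper, not part of esupar or conll18_ud_eval.py, and the sample text is an excerpt of the camembert-thai-base-upos result.

```python
def parse_eval_table(text):
    """Map each metric name to its F1 score, reading the '|'-separated table."""
    scores = {}
    for line in text.splitlines():
        cells = [c.strip() for c in line.split("|")]
        # Skip the header, the separator row, and anything that is not a data row.
        if len(cells) < 4 or cells[0] in ("", "Metric"):
            continue
        try:
            scores[cells[0]] = float(cells[3])  # fourth column: F1 Score
        except ValueError:
            continue
    return scores

# Excerpt of the output for camembert-thai-base-upos.
sample = """\
Metric     | Precision |    Recall |  F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
LAS        |     63.53 |     61.62 |     62.56 |     68.62
MLAS       |     48.82 |     45.73 |     47.22 |     51.99
BLEX       |     60.47 |     56.63 |     58.49 |     64.39
"""

f1 = parse_eval_table(sample)
print("/".join(f"{f1[m]:.2f}" for m in ("LAS", "MLAS", "BLEX")))
```

In the evaluation loop, the same function could be applied to `p.stdout` for each model and the results collected in a dictionary keyed by model name.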
