Release of Xunzi-Qwen2-1.5B-ud-causal, a Classical Chinese Dependency Parsing Model

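Starting from Xunzi-Qwen2-1.5B, XunziALLM's generative model for Classical Chinese (Kanbun), I used UD_Classical_Chinese-Kyoto and Qwen2ForTokenClassification to build a Classical Chinese dependency parsing model, Xunzi-Qwen2-1.5B-ud-causal. Adapting the benchmark program from yesterday's article to lzh_kyoto-ud-test.conllu on Google Colaboratory (GPU runtime) gives something like this.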

# Install Hugging Face Transformers in the Colab runtime
!pip install transformers
models=["KoichiYasuoka/Xunzi-Qwen2-1.5B-ud-causal"]
import os,sys,subprocess
from transformers import pipeline
# Fetch the UD_Classical_Chinese-Kyoto treebank and copy its splits to train/dev/test.conllu
url="https://github.com/UniversalDependencies/UD_Classical_Chinese-Kyoto"
d=os.path.basename(url)
!test -d {d} || git clone --depth=1 {url}
!for F in train dev test ; do cp {d}/*-$$F.conllu $$F.conllu ; done
# Fetch the CoNLL 2018 shared task evaluation script
url="https://universaldependencies.org/conll18/conll18_ud_eval.py"
c=os.path.basename(url)
!test -f {c} || curl -LO {url}
for mdl in models:
  # trust_remote_code=True lets the model repository supply its own pipeline code
  nlp=pipeline("universal-dependencies",mdl,trust_remote_code=True,device=0)
  with open("test.conllu","r",encoding="utf-8") as r:
    s=r.read()
  with open("result-test.conllu","w",encoding="utf-8") as w:
    # Parse the raw sentence stored on each "# text = " comment line
    for t in s.split("\n"):
      if t.startswith("# text = "):
        w.write(nlp(t[9:]))
  print("\n***",mdl)
  # Score the parser output against the gold test set
  print(subprocess.run([sys.executable,c,"--verbose","test.conllu","result-test.conllu"],capture_output=True,text=True).stdout)
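
The inner loop above feeds the parser only the raw sentence stored on each `# text = ` comment line, so the model has to redo tokenization itself. The following is a minimal sketch of that extraction step in plain Python, with no model or download needed. The helper name `extract_texts` and the two-sentence sample are made up for illustration and are not taken from the treebank files.

```python
def extract_texts(conllu: str) -> list[str]:
    """Return the raw sentence from every '# text = ' comment line."""
    prefix = "# text = "  # 9 characters, matching the t[9:] slice in the benchmark
    return [line[len(prefix):] for line in conllu.split("\n") if line.startswith(prefix)]

# Hypothetical CoNLL-U fragment; token lines are reduced to placeholders.
sample = (
    "# sent_id = demo-1\n"
    "# text = 學而時習之\n"
    "1\t學\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "\n"
    "# sent_id = demo-2\n"
    "# text = 不亦說乎\n"
    "1\t不\t_\t_\t_\t_\t_\t_\t_\t_\n"
)

texts = extract_texts(sample)
print(texts)
```

Only the comment lines contribute; the gold token, tag, and head columns are never shown to the parser, which is why the Tokens row can fall below 100.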

On my (Koichi Yasuoka's) end, the following results were output.

*** KoichiYasuoka/Xunzi-Qwen2-1.5B-ud-causal
Metric     | Precision |    Recall |  F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens     |     97.00 |     97.98 |     97.49 |
Sentences  |    100.00 |    100.00 |    100.00 |
Words      |     97.00 |     97.98 |     97.49 |
UPOS       |     87.91 |     88.79 |     88.35 |     90.62
XPOS       |      0.00 |      0.00 |      0.00 |      0.00
UFeats     |     90.21 |     91.12 |     90.66 |     93.00
AllTags    |      0.00 |      0.00 |      0.00 |      0.00
Lemmas     |     95.31 |     96.27 |     95.78 |     98.25
UAS        |     74.07 |     74.81 |     74.44 |     76.35
LAS        |     68.78 |     69.47 |     69.12 |     70.90
CLAS       |     67.84 |     68.52 |     68.18 |     70.07
MLAS       |     64.74 |     65.39 |     65.07 |     66.88
BLEX       |     67.08 |     67.75 |     67.41 |     69.29
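
As a reading aid for the table, the F1 Score column is the harmonic mean of the Precision and Recall columns. The sketch below recomputes it from the rounded percentages printed above; the function name `f1` is my own, and because the evaluation script works from raw counts rather than rounded percentages, the last digit can differ slightly (the MLAS row is one such case).

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

# (precision, recall, reported F1) copied from the table above
rows = {
    "Tokens": (97.00, 97.98, 97.49),
    "UAS": (74.07, 74.81, 74.44),
    "LAS": (68.78, 69.47, 69.12),
    "MLAS": (64.74, 65.39, 65.07),
    "BLEX": (67.08, 67.75, 67.41),
}

for name, (p, r, reported) in rows.items():
    print(f"{name:6s} recomputed={f1(p, r):.2f} reported={reported:.2f}")
```

Precision is measured against the number of words the system produced and recall against the number of gold words, so Precision below Recall in every row indicates that the parser output contains more words than the gold file.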

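LAS/MLAS/BLEX come out at 69.12/65.07/67.41. Hmm, disappointing. The tokenizer in the XunziALLM series is not very good, so perhaps these models really are not well suited to dependency parsing.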
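
To make the link between tokenization and attachment scores concrete, here is a deliberately simplified sketch of how UAS and LAS style scores can be computed. It is not the logic of conll18_ud_eval.py: the function `attachment_scores`, the tuple layout, and the toy numbers are all assumptions for illustration. The point it shows is that only words aligned between gold and system output can count as hits, so every segmentation mismatch lowers the ceiling for UAS and LAS.

```python
def attachment_scores(aligned, n_gold, n_system):
    """aligned: (gold_head, gold_deprel, sys_head, sys_deprel) for each word
    whose segmentation matches in both files. Simplified illustration only."""
    uas_hits = sum(1 for gh, gd, sh, sd in aligned if gh == sh)
    las_hits = sum(1 for gh, gd, sh, sd in aligned if gh == sh and gd == sd)

    def prf(hits):
        p = hits / n_system          # precision: hits over system words
        r = hits / n_gold            # recall: hits over gold words
        f = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f

    return {"UAS": prf(uas_hits), "LAS": prf(las_hits)}

# Toy case (invented): 4 gold words, 5 system words, only 3 of them aligned.
toy_aligned = [
    (2, "nsubj", 2, "nsubj"),  # head and label both correct
    (0, "root", 0, "root"),    # head and label both correct
    (2, "obj", 2, "obl"),      # head correct, label wrong
]
scores = attachment_scores(toy_aligned, n_gold=4, n_system=5)
for metric, (p, r, f) in scores.items():
    print(f"{metric}: P={p:.2%} R={r:.2%} F1={f:.2%}")
```

In this toy case every aligned head is correct, yet UAS precision is still only 60% because two system words have no gold counterpart.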
