Measuring the "accuracy" of the XunziALLM tokenizer on the UD_Classical_Chinese-Kyoto test set

Posted at 2024-08-27

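This is a follow-up to my June 10 article. For five of the XunziALLM Classical Chinese (漢文) generation models, namely Xunzi-Qwen1.5-4B, Xunzi-Qwen1.5-7B, Xunzi-Qwen1.5-14B, Xunzi-Qwen2-1.5B, and Xunzi-Qwen2-7B, I measured the tokenizer "accuracy" against lzh_kyoto-ud-test.conllu from UD_Classical_Chinese-Kyoto. On Google Colaboratory, it goes like this.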

!pip install transformers modelscope sentencepiece spacy-alignments
models=["Xunzillm4cc/Xunzi-Qwen1.5-4B","Xunzillm4cc/Xunzi-Qwen1.5-7B","Xunzillm4cc/Xunzi-Qwen1.5-14B","Xunzillm4cc/Xunzi-Qwen2-1.5B","Xunzillm4cc/Xunzi-Qwen2-7B"]
ud="UD_Classical_Chinese-Kyoto"
!test -d $ud || git clone --depth=1 https://github.com/universaldependencies/$ud
!cp $ud/*-test.conllu test.conllu
from modelscope import AutoTokenizer
from spacy_alignments import get_alignments
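for mdl in models:
  tkz=AutoTokenizer.from_pretrained(mdl,revision="master")
  # running totals: gold words, system tokens, spans that match exactly
  gold=system=correct=0
  with open("test.conllu","r",encoding="utf-8") as r:
    for k in r:
      if k.startswith("# text ="):
        # start of a sentence: keep the raw text and reset the word list
        txt=k[8:].strip()
        frm=[]
      elif k.strip()=="":
        # end of a sentence: gold word spans as (start,end) character offsets
        g=[(t[0],t[-1]+1) for t in get_alignments(list(txt),frm)[1]]
        # system token spans from the tokenizer, dropping empty spans
        s=[t for t in tkz(txt,return_offsets_mapping=True)["offset_mapping"] if t[0]<t[1]]
        gold+=len(g)
        system+=len(s)
        # walk both sorted span lists in step; a span is correct when start and end both agree
        i=j=0
        while i<len(g) and j<len(s):
          if s[j][0]<g[i][0]:
            j+=1
          elif g[i][0]<s[j][0]:
            i+=1
          else:
            correct+=g[i][1]==s[j][1]
            i+=1
            j+=1
      else:
        # word line: 10 tab-separated columns with an integer ID; column 2 is FORM
        t=k.split("\t")
        if len(t)==10 and t[0].isdecimal():
          frm.append(t[1])
  print("\n***",mdl)
  print("Precision",correct/system if system else 0.0)
  print("Recall   ",correct/gold if gold else 0.0)
  print("F1 Score ",2*correct/(system+gold) if system+gold else 0.0)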
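
To make the span matching in the loop above easier to follow, here is a minimal standalone sketch of the same idea on a toy sentence. The helper names `spans_from_words` and `count_correct` are my own for illustration, and both segmentations are hypothetical (they are not actual XunziALLM output). It also assumes the words concatenate to the sentence exactly, so it builds offsets by simple accumulation instead of calling `get_alignments`.

```python
def spans_from_words(words):
    """Turn a list of words into (start, end) character offsets.

    Assumes the words concatenate to the sentence with nothing in between.
    """
    spans, pos = [], 0
    for w in words:
        spans.append((pos, pos + len(w)))
        pos += len(w)
    return spans


def count_correct(gold, system):
    """Count spans whose start and end both agree, walking two sorted lists in step."""
    correct = i = j = 0
    while i < len(gold) and j < len(system):
        if system[j][0] < gold[i][0]:
            j += 1
        elif gold[i][0] < system[j][0]:
            i += 1
        else:
            # same start offset: correct only if the end offset matches too
            correct += gold[i][1] == system[j][1]
            i += 1
            j += 1
    return correct


# Hypothetical segmentations of one sentence, for illustration only
gold = spans_from_words(["子", "曰", "學", "而", "時", "習", "之"])
system = spans_from_words(["子曰", "學", "而", "時習", "之"])

correct = count_correct(gold, system)
precision = correct / len(system)
recall = correct / len(gold)
f1 = 2 * correct / (len(system) + len(gold))
print(correct, precision, recall, f1)
```

Here only 學, 而, and 之 line up on both ends, so 3 of the 5 system tokens and 3 of the 7 gold words count as correct.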

On my machine (I am Koichi Yasuoka), the following results were printed.

*** Xunzillm4cc/Xunzi-Qwen1.5-4B
Precision 0.8602375465343024
Recall    0.88017848073714
F1 Score  0.8700937763353714

*** Xunzillm4cc/Xunzi-Qwen1.5-7B
Precision 0.8602375465343024
Recall    0.88017848073714
F1 Score  0.8700937763353714

*** Xunzillm4cc/Xunzi-Qwen1.5-14B
Precision 0.8602375465343024
Recall    0.88017848073714
F1 Score  0.8700937763353714

*** Xunzillm4cc/Xunzi-Qwen2-1.5B
Precision 0.8602375465343024
Recall    0.88017848073714
F1 Score  0.8700937763353714

*** Xunzillm4cc/Xunzi-Qwen2-7B
Precision 0.8602375465343024
Recall    0.88017848073714
F1 Score  0.8700937763353714
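
Identical scores across five models are suspicious enough that a direct check seems worthwhile. A rough sketch of one such check is to compare the vocabularies returned by `get_vocab()`, which Hugging Face tokenizers provide. The function `share_vocab` and the `StubTokenizer` class are hypothetical names of mine; the stub only exists so the sketch runs without downloading any model. Note that equal vocabularies are a strong hint but do not by themselves prove that the merge rules are identical too.

```python
def share_vocab(tokenizers):
    """Return True if every tokenizer exposes the same token-to-id mapping."""
    vocabs = [t.get_vocab() for t in tokenizers]
    return all(v == vocabs[0] for v in vocabs[1:])


class StubTokenizer:
    """Hypothetical stand-in with a get_vocab() method, so the sketch runs offline."""

    def __init__(self, vocab):
        self._vocab = dict(vocab)

    def get_vocab(self):
        return dict(self._vocab)


a = StubTokenizer({"子": 0, "曰": 1, "子曰": 2})
b = StubTokenizer({"子": 0, "曰": 1, "子曰": 2})
c = StubTokenizer({"子": 0, "曰": 1})

same_ab = share_vocab([a, b])
same_ac = share_vocab([a, c])
print(same_ab, same_ac)
```

In the Colab session one would presumably pass the real tokenizers instead, for example a list built with `AutoTokenizer.from_pretrained(m, revision="master")` for each `m` in `models`.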

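Apparently all five models use the same tokenizer. A Recall of 88.02% is "almost there, but not quite" for a tokenizer, so in 『GPT系モデルの系列ラベリングによる品詞付与』 (part-of-speech tagging by sequence labeling with GPT-style models) I take on the challenge of converting the tokenizer to single characters. Still, I would much rather it had been designed as a single-character tokenizer from the start.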
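
One way to see why a single-character tokenizer is attractive for sequence labeling: a token that straddles a gold word boundary cannot be split by any label assigned to it, whereas single-character tokens never straddle a boundary. The sketch below illustrates this on hypothetical spans; `crossing_tokens` is a name I made up, and the `subword` segmentation is not actual XunziALLM output.

```python
def crossing_tokens(gold, system):
    """Return system spans that contain a gold word boundary strictly inside them."""
    boundaries = {end for _, end in gold}
    return [(s, e) for s, e in system if any(s < b < e for b in boundaries)]


# Hypothetical 7-character sentence in which every gold word is one character
gold = [(i, i + 1) for i in range(7)]
# Hypothetical subword tokenization that merges characters 0-1 and 4-5
subword = [(0, 2), (2, 3), (3, 4), (4, 6), (6, 7)]
# Single-character tokenization of the same sentence
single = [(i, i + 1) for i in range(7)]

bad_subword = crossing_tokens(gold, subword)
bad_single = crossing_tokens(gold, single)
print(bad_subword, bad_single)
```

With single characters the list of boundary-crossing tokens is always empty, so multi-character words can still be recovered afterwards by labeling, while the reverse repair is not possible.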
