PaddleOCR-VL-1.6-Ainu Released


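PaddleOCR-VL-1.6 has been released. As in my May 10 article, I fine-tuned it further on Ainu written in katakana and built a prototype, PaddleOCR-VL-1.6-Ainu. Let's compare it against Table 2 of "Development of Katakana Ainu OCR with Qwen3.5" (『Qwen3.5によるカタカナアイヌ語OCRの開発』). On Google Colaboratory, the comparison looks like this: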

!pip install 'transformers>=5.9.0' accelerate jinja2
from transformers import pipeline
from transformers.utils import cached_file
import re
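# base models and their katakana-Ainu fine-tuned counterparts
models=["PaddlePaddle/PaddleOCR-VL-1.5","KoichiYasuoka/PaddleOCR-VL-1.5-Ainu","PaddlePaddle/PaddleOCR-VL-1.6","KoichiYasuoka/PaddleOCR-VL-1.6-Ainu"]
# test image and its reference transcription, both taken from the Qwen3.5-2B-AinuOCR repository
img=cached_file("KoichiYasuoka/Qwen3.5-2B-AinuOCR","uwerankarap/orig/2026-01-31.jpg")
txt=cached_file("KoichiYasuoka/Qwen3.5-2B-AinuOCR","uwerankarap/orig/2026-01-31.txt")
# normalize the reference: collapse whitespace runs into one space, and fold ㇷ゚ (two code points) into one private-use character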
with open(txt,"r",encoding="utf-8") as r:
  a=re.sub("[ \n ]+"," ",r.read()).replace("ㇷ゚","\ue1f7").strip()
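# treat full-width parentheses （ ） as equal to ASCII ( ) when comparing characters
v={"\uff08":"(","\uff09":")"}
for mdl in models:
  nlp=pipeline("image-text-to-text",mdl,max_new_tokens=2048)
  doc=nlp([{"role":"user","content":[{"type":"image","image":img},{"type":"text","text":"OCR"}]}])
  # generated_text is the whole conversation; element [1] is the model's reply, normalized like the reference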
  b=re.sub("[ \n ]+"," ",doc[0]["generated_text"][1]["content"]).replace("ㇷ゚","\ue1f7").strip()
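  # Levenshtein DP table: d[i][j] is the distance between a[:i+1] and b[:j+1]; index -1 stands for the empty prefix
  d={i:{-1:i+1} for i in range(len(a))}
  d[-1]={j-1:j for j in range(len(b)+1)}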
  for i in range(len(a)):
    for j in range(len(b)):
      c=0 if v.get(a[i],a[i])==v.get(b[j],b[j]) else 1
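      d[i][j]=min(d[i-1][j]+1,d[i][j-1]+1,d[i-1][j-1]+c)
  # final distance; also defined (= len(a)) when the model returns empty output
  w=d[len(a)-1][len(b)-1]
  print("\n***",mdl)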
  print("Левенштейн",w)
  print("error rate",w/len(a))
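The scoring loop in the script is a standard Levenshtein dynamic program. Row and column -1 of the dict `d` stand for the empty prefix. As a rough sketch, the same normalization and distance can be pulled out into functions that can be checked on their own. The names `normalize`, `levenshtein`, `error_rate` and `PAREN_MAP` are hypothetical. The sketch keeps only two rows of the table instead of the full dict, and it assumes the second space in the script's character class `[ \n ]` is an ideographic space (U+3000).

```python
import re

# Hypothetical name: full-width parentheses compare equal to ASCII ones,
# mirroring the dictionary `v` in the script above.
PAREN_MAP = {"\uff08": "(", "\uff09": ")"}


def normalize(text: str) -> str:
    """Collapse runs of spaces, newlines and (assumed) ideographic spaces
    into one space, and fold katakana ㇷ゚ (U+31F7 + U+309A, two code points)
    into the single private-use character U+E1F7 so it counts as one."""
    text = re.sub("[ \n\u3000]+", " ", text)  # assumption: U+3000 in the class
    return text.replace("\u31f7\u309a", "\ue1f7").strip()


def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance (insertion, deletion, substitution all cost 1),
    treating characters as equal after PAREN_MAP folding."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)     # cur[0]: delete all of ref[:i]
        rc = PAREN_MAP.get(r, r)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if rc == PAREN_MAP.get(h, h) else 1
            cur[j] = min(prev[j] + 1,         # delete r
                         cur[j - 1] + 1,      # insert h
                         prev[j - 1] + cost)  # match or substitute
        prev = cur
    return prev[-1]


def error_rate(ref: str, hyp: str) -> float:
    """Simple error rate: distance divided by the normalized reference length."""
    a, b = normalize(ref), normalize(hyp)
    return levenshtein(a, b) / len(a)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))            # → 3
    print(levenshtein("\uff08ア\uff09", "(ア)"))        # → 0, parentheses folded
    print(len(normalize("\u31f7\u309a")))              # → 1
```

Two rows are enough because each cell depends only on the current and previous rows. Memory then grows with the length of the OCR output rather than with the product of both lengths. The distance itself is the same as the one the dict-based table computes.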

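On my machine (I am Koichi Yasuoka), I got the following Levenshtein distances and simple error rates. The error rate is the distance divided by the length of the normalized reference text.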

*** PaddlePaddle/PaddleOCR-VL-1.5
Левенштейн 288
error rate 0.25645592163846836

*** KoichiYasuoka/PaddleOCR-VL-1.5-Ainu
Левенштейн 51
error rate 0.04541406945681211

*** PaddlePaddle/PaddleOCR-VL-1.6
Левенштейн 112
error rate 0.09973285841495994

*** KoichiYasuoka/PaddleOCR-VL-1.6-Ainu
Левенштейн 62
error rate 0.05520926090828139
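Every rate printed above is `w/len(a)` with the same reference `a`, so all four share one denominator. Working backwards from the reported figures gives a normalized reference of 1123 characters. The sketch below uses only the distances reported above, and its variable and function names are illustrative. It recomputes the rates and the relative reduction in edits that fine-tuning brought to each base model.

```python
# Levenshtein distances copied from the output above
distances = {
    "PaddlePaddle/PaddleOCR-VL-1.5": 288,
    "KoichiYasuoka/PaddleOCR-VL-1.5-Ainu": 51,
    "PaddlePaddle/PaddleOCR-VL-1.6": 112,
    "KoichiYasuoka/PaddleOCR-VL-1.6-Ainu": 62,
}

# All rates are w / len(a) with the same reference a, so the reference length
# can be recovered from any (distance, rate) pair: 288 / 0.2564... ≈ 1123
REF_LEN = round(288 / 0.25645592163846836)


def reduction(base: str, tuned: str) -> float:
    """Fraction of the base model's edits removed by fine-tuning."""
    return 1 - distances[tuned] / distances[base]


for model, dist in distances.items():
    print(f"{model}: error rate {dist / REF_LEN:.4f}")

pairs = [
    ("PaddlePaddle/PaddleOCR-VL-1.5", "KoichiYasuoka/PaddleOCR-VL-1.5-Ainu"),
    ("PaddlePaddle/PaddleOCR-VL-1.6", "KoichiYasuoka/PaddleOCR-VL-1.6-Ainu"),
]
for base, tuned in pairs:
    print(f"{tuned}: {reduction(base, tuned):.1%} fewer edits than {base}")
```

This prints reductions of 82.3% for the 1.5 pair and 44.6% for the 1.6 pair. Fine-tuning removed a much larger share of the errors from the weaker 1.5 base.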

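Judging from these results, PaddleOCR-VL-1.5-Ainu is clearly the best, with a Levenshtein distance of 51 (a simple error rate of 4.5%). I had expected PaddleOCR-VL-1.6-Ainu to do better. The base PaddleOCR-VL-1.6 is far ahead of the base PaddleOCR-VL-1.5 (distance 112 vs. 288), yet after fine-tuning the 1.6 version falls behind (62 vs. 51). Hmm, this is harder than I thought.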
