PaddleOCR-VL-1.6-Ainu Released

Posted at 2026-05-29

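PaddleOCR-VL-1.6 has been released, so, as in my May 10 article, I fine-tuned it on Ainu written in katakana and built a prototype, PaddleOCR-VL-1.6-Ainu. Let's compare it against Table 2 of 『Qwen3.5によるカタカナアイヌ語OCRの開発』 ("Development of a Katakana Ainu OCR with Qwen3.5"). On Google Colaboratory, it looks like this.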

!pip install 'transformers>=5.9.0' accelerate jinja2
from transformers import pipeline
from transformers.utils import cached_file
import re
models=["PaddlePaddle/PaddleOCR-VL-1.5","KoichiYasuoka/PaddleOCR-VL-1.5-Ainu","PaddlePaddle/PaddleOCR-VL-1.6","KoichiYasuoka/PaddleOCR-VL-1.6-Ainu"]
img=cached_file("KoichiYasuoka/Qwen3.5-2B-AinuOCR","uwerankarap/orig/2026-01-31.jpg")
txt=cached_file("KoichiYasuoka/Qwen3.5-2B-AinuOCR","uwerankarap/orig/2026-01-31.txt")
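# Reference text: collapse whitespace runs and map ㇷ゚ (base letter plus combining mark) to one private-use character
with open(txt,"r",encoding="utf-8") as r: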
  a=re.sub("[ \n　]+"," ",r.read()).replace("ㇷ゚","\ue1f7").strip()
# Full-width parentheses count as equal to their ASCII counterparts
v={"（":"(","）":")"}
for mdl in models:
  nlp=pipeline("image-text-to-text",mdl,max_new_tokens=2048)
  doc=nlp([{"role":"user","content":[{"type":"image","image":img},{"type":"text","text":"OCR"}]}])
  b=re.sub("[ \n　]+"," ",doc[0]["generated_text"][1]["content"]).replace("ㇷ゚","\ue1f7").strip()
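  # d[i][j] is the edit distance between a[:i+1] and b[:j+1]; index -1 stands for the empty prefix
  d={i:{-1:i+1} for i in range(len(a))}
  d[-1]={j-1:j for j in range(len(b)+1)}
  for i in range(len(a)):
    for j in range(len(b)):
      # substitution cost: 0 if the characters match after folding full-width parentheses
      c=0 if v.get(a[i],a[i])==v.get(b[j],b[j]) else 1
      w=d[i][j]=min(d[i-1][j]+1,d[i][j-1]+1,d[i-1][j-1]+c)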
  print("\n***",mdl)
  print("Левенштейн",w)
  print("error rate",w/len(a))
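The scoring step in the script can be tried without downloading any model. The following is a minimal sketch of the same character-level Levenshtein computation as standalone functions. The names `normalize`, `levenshtein`, `error_rate`, `PUA_PU`, and `FOLD` are hypothetical helpers introduced here, and the sketch assumes that "ㇷ゚" is the two-code-point sequence U+31F7 U+309A. It keeps only one row of the table at a time instead of the nested dictionaries used in the script.

```python
import re

# Assumption: same private-use stand-in as the "\ue1f7" used in the script.
PUA_PU = "\ue1f7"
# Full-width parentheses are folded to ASCII before comparison, like the dict `v`.
FOLD = {"（": "(", "）": ")"}

def normalize(text):
    """Collapse runs of spaces, newlines and ideographic spaces into one space,
    and map small katakana pu (assumed U+31F7 U+309A) to a single character."""
    text = re.sub("[ \n\u3000]+", " ", text)
    return text.replace("\u31f7\u309a", PUA_PU).strip()

def levenshtein(ref, hyp):
    """Character-level edit distance with unit costs for insert, delete, substitute."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, rc in enumerate(ref, 1):
        cur = [i]  # distance from ref[:i] to the empty hypothesis prefix
        for j, hc in enumerate(hyp, 1):
            cost = 0 if FOLD.get(rc, rc) == FOLD.get(hc, hc) else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]

def error_rate(ref, hyp):
    """Edit distance divided by the length of the normalized reference."""
    ref, hyp = normalize(ref), normalize(hyp)
    return levenshtein(ref, hyp) / len(ref)
```

For example, `error_rate("abcd", "abxd")` is one substitution over four reference characters. The reference must be non-empty after normalization, since its length is the denominator.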

Running the script on my machine (Koichi Yasuoka) gave the following Levenshtein distances and simple error rates.

*** PaddlePaddle/PaddleOCR-VL-1.5
Левенштейн 288
error rate 0.25645592163846836

*** KoichiYasuoka/PaddleOCR-VL-1.5-Ainu
Левенштейн 51
error rate 0.04541406945681211

*** PaddlePaddle/PaddleOCR-VL-1.6
Левенштейн 112
error rate 0.09973285841495994

*** KoichiYasuoka/PaddleOCR-VL-1.6-Ainu
Левенштейн 62
error rate 0.05520926090828139

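Judging from these results, PaddleOCR-VL-1.5-Ainu is by far the best, with a Levenshtein distance of 51 (simple error rate 4.5%). Hmm, I had expected PaddleOCR-VL-1.6-Ainu to do better than that, but it turns out to be harder than I thought.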
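As a quick arithmetic check, the four error rates can be recomputed from the printed distances. The reference length of 1123 characters is not stated in the article. It is inferred here from 288 / 0.25645592163846836, so treat `REF_LEN` as an assumption.

```python
# Levenshtein distances as printed in the results.
distances = {
    "PaddlePaddle/PaddleOCR-VL-1.5": 288,
    "KoichiYasuoka/PaddleOCR-VL-1.5-Ainu": 51,
    "PaddlePaddle/PaddleOCR-VL-1.6": 112,
    "KoichiYasuoka/PaddleOCR-VL-1.6-Ainu": 62,
}
# Inferred, not stated in the article: the common denominator of all four rates.
REF_LEN = 1123

# Simple error rate = edit distance / number of reference characters.
rates = {name: dist / REF_LEN for name, dist in distances.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```

Under that assumption, fine-tuning removes about 82% of the errors of PaddleOCR-VL-1.5 (288 to 51) but only about 45% of those of PaddleOCR-VL-1.6 (112 to 62), even though the 1.6 base model starts from a much lower error count.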
