Can Qwen3-VL-2B-Instruct be used for Kanbun OCR?

Posted at 2025-12-26

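I applied the method from yesterday's article to Qwen3-VL-2B-Instruct to see whether it can serve as an OCR for Kanbun (classical Chinese text). On Google Colaboratory (GPU runtime), it goes like this: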

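# Colab shell command; jinja2 is used to render the chat template
!pip install transformers jinja2
img="http://kanji.zinbun.kyoto-u.ac.jp/db-machine/toho/L/B0010001.jpg"
from transformers import pipeline
mdl="Qwen/Qwen3-VL-2B-Instruct"
nlp=pipeline("image-text-to-text",mdl,max_new_tokens=2048)
# Prompt in Classical Chinese: "Read the vertical classical Chinese text and output it line by line."
d=nlp([{"role":"user","content":[{"type":"image","image":img},
  {"type":"text","text":"閱讀垂直文言文,逐行輸出。"}]}])
# generated_text holds the chat: [0] is the user turn, [1] is the model's reply
print(d[0]["generated_text"][1]["content"])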
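The code above picks the reply with the fixed index `[1]`. As a hedged sketch, assuming the pipeline returns the chat transcript as a list of `{"role", "content"}` dicts whose reply `content` is a plain string (which is what that indexing relies on), a small helper can instead look for the last assistant message. `assistant_reply` is a hypothetical name, and `sample` is mock data shaped like the pipeline result, not real model output.

```python
def assistant_reply(output: list[dict]) -> str:
    """Return the content of the last assistant message in a pipeline result.

    Assumes output[0]["generated_text"] is a list of chat messages,
    each a dict with "role" and "content" keys.
    """
    messages = output[0]["generated_text"]
    # Walk backwards so the most recent assistant turn wins
    for message in reversed(messages):
        if message.get("role") == "assistant":
            return message["content"]
    raise ValueError("no assistant message in generated_text")


# Mock result with the same shape as the pipeline output (hypothetical data)
sample = [{"generated_text": [
    {"role": "user", "content": [{"type": "text", "text": "閱讀垂直文言文,逐行輸出。"}]},
    {"role": "assistant", "content": "因章事舉直言極諫並見郎從官展盡其"},
]}]
print(assistant_reply(sample))  # → 因章事舉直言極諫並見郎從官展盡其
```

In the Colab session this would be called as `print(assistant_reply(d))`.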

When I had it read 『漢書零片』 (a fragment of the Hanshu), I (Koichi Yasuoka) got the following output on my machine:

因章事舉直言極諫並見郎從官展盡其
意加於往前以明示四方使天下咸知主
上聖明不以言罪下也若此則流言消釋
疑惑著明鳳白行其策欽之補過將美皆
此類也師古曰優游不仕以壽終欽子及
昆弟支屬至二千石者且十人欽兄緩前
免太常以列矦奉朝請成帝時乃薨子業
嗣業有材能以列矦選復為太常數言
失不事權貴與丞相翟方進術定吏民
淳于長不平後業坐法免官復為函谷關
都尉會定陵矦長有罪當就國長舅紅
矦立與業書曰誠哀老姊垂白隨無杖子
出關師古曰垂白者白髮願勿復用前事相
侵定陵矦既出關伏罪復發後書也語在外
傳下雒陽獄丞相史搜得紅陽矦書奏業
聽請不咎服虔曰受立屬請爲不折
方進薨業上書言方進本與長深結厚夏
相稱薦師古曰更長陷大惡獨得不坐苟
音工衡反

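The small-character interlinear notes (夾註) are not read well; for example, the note 「師古曰更音工衡反」 has been torn in two. Apart from that, it seems to read the text more or less correctly, but perhaps interlinear notes are still difficult.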
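The split can be checked mechanically. A minimal sketch, using the last three lines of the output above verbatim: once line breaks are removed, the note should appear as one contiguous run if it was read intact. The helper names `note_is_intact` and `lines_containing` are hypothetical, and a plain substring test is only a rough diagnostic, not a proper OCR evaluation.

```python
# Last three lines of the OCR output shown above
ocr_lines = [
    "方進薨業上書言方進本與長深結厚夏",
    "相稱薦師古曰更長陷大惡獨得不坐苟",
    "音工衡反",
]


def note_is_intact(note: str, lines: list[str]) -> bool:
    """True if `note` occurs as one contiguous run once line breaks are removed."""
    return note in "".join(lines)


def lines_containing(fragment: str, lines: list[str]) -> list[int]:
    """0-based indices of the OCR lines that contain `fragment`."""
    return [i for i, line in enumerate(lines) if fragment in line]


note = "師古曰更音工衡反"
print(note_is_intact(note, ocr_lines))          # → False
print(lines_containing("師古曰更", ocr_lines))  # → [1]
print(lines_containing("音工衡反", ocr_lines))  # → [2]
```

The head of the note lands in the middle of one line, followed by main text (長陷大惡獨得不坐苟), while its tail is pushed to the next line.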
