
Notes on Japanese OCR with Tesseract

Posted at 2019-08-22

####Download
Get the Windows installer from https://github.com/UB-Mannheim/tesseract/wiki. The one used here is the 64-bit build `tesseract-ocr-w64-setup-v5.0.0-alpha.20190708.exe`.

####Installation
Under Additional Language, check the Japanese-related entries, then click Next through the remaining pages to finish.

####Adding the environment variable
Add the following directory to PATH:
C:\Program Files\Tesseract-OCR
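A command prompt opened before the PATH change will not see it, so open a new one before checking. The check can also be done from Python. The sketch below assumes the default install directory above, and `ensure_tesseract_on_path` is a hypothetical helper name. If `tesseract` is not found, it prepends the directory to `PATH` for the running Python process only, leaving the system setting unchanged.

```python
import os
import shutil
from typing import Optional

# Assumed default install directory (from the installer step); adjust if you chose another one.
TESSERACT_DIR = r"C:\Program Files\Tesseract-OCR"


def ensure_tesseract_on_path(install_dir: str = TESSERACT_DIR) -> Optional[str]:
    """Return the full path of the tesseract executable, or None if it cannot be found.

    If tesseract is not on PATH yet, install_dir is prepended to PATH for the
    current process only; the system-wide PATH is not modified.
    """
    exe = shutil.which("tesseract")
    if exe is None and os.path.isdir(install_dir):
        os.environ["PATH"] = install_dir + os.pathsep + os.environ.get("PATH", "")
        exe = shutil.which("tesseract")
    return exe


if __name__ == "__main__":
    found = ensure_tesseract_on_path()
    print(found if found else "tesseract not found; check the PATH setting")
```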

####Verify the installation
```
tesseract -v
```

```
tesseract v5.0.0-alpha.20190708
leptonica-1.78.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found AVX2
Found AVX
Found SSE
Found libarchive 3.3.2 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5
```
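The same check can be scripted. This is a minimal sketch that assumes the banner format shown above, where the first line is `tesseract v<version>`. Because the banner may go to stdout or stderr depending on the build, both streams are searched. The function names are illustrative and not part of any library.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_tesseract_version(output: str) -> Optional[str]:
    """Extract the version from a line like 'tesseract v5.0.0-alpha.20190708'."""
    match = re.search(r"^tesseract\s+v?(\S+)", output, re.MULTILINE)
    return match.group(1) if match else None


def get_tesseract_version() -> Optional[str]:
    """Run `tesseract -v` and return its version, or None if tesseract is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    proc = subprocess.run([exe, "-v"], capture_output=True, text=True)
    # The banner may appear on stdout or stderr depending on the build, so search both.
    return parse_tesseract_version(proc.stdout + "\n" + proc.stderr)


# First lines of the output shown above
sample = "tesseract v5.0.0-alpha.20190708\nleptonica-1.78.0\n"
print(parse_tesseract_version(sample))  # 5.0.0-alpha.20190708
```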

####Check the available languages
```
tesseract --list-langs
```

```
List of available languages (7):
eng
jav
jpn
jpn_vert
osd
script/Japanese
script/Japanese_vert
```
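`jpn` and `jpn_vert` in this list mean the Japanese data was installed. Below is a sketch for checking this from a script, assuming the one-code-per-line format shown above; `parse_list_langs` and `installed_langs` are hypothetical names.

```python
import shutil
import subprocess
from typing import List


def parse_list_langs(output: str) -> List[str]:
    """Turn `tesseract --list-langs` output into a list of language codes."""
    langs = []
    for line in output.splitlines():
        code = line.strip()
        # Codes such as eng, jpn or script/Japanese contain no spaces, so the
        # "List of available languages ..." header (and any warning text) is skipped.
        if code and " " not in code:
            langs.append(code)
    return langs


def installed_langs() -> List[str]:
    """Run `tesseract --list-langs`; return an empty list if tesseract is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return []
    proc = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Read both streams in case a build prints the list to stderr.
    return parse_list_langs(proc.stdout + "\n" + proc.stderr)


if __name__ == "__main__":
    langs = installed_langs()
    print("jpn available" if "jpn" in langs else "jpn not found; rerun the installer with Japanese checked")
```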

####PyOCR
```
pip install pyocr
```
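pyocr runs the `tesseract` command it finds on PATH, so the environment-variable step above matters here too. To make sure the engine pyocr picks up actually has Japanese data, the hypothetical helper below returns the first tool that reports a given language. It relies only on `get_available_languages()`, which pyocr's tool modules provide, and takes the tool list as an argument so it can be tried without an OCR engine installed.

```python
from typing import Any, Iterable, Optional


def find_tool_with_lang(tools: Iterable[Any], lang: str = "jpn") -> Optional[Any]:
    """Return the first OCR tool whose available languages include `lang`, or None.

    `tools` is expected to be the list returned by pyocr.get_available_tools();
    each element only needs a get_available_languages() method.
    """
    for tool in tools:
        if lang in tool.get_available_languages():
            return tool
    return None
```

With pyocr installed, `find_tool_with_lang(pyocr.get_available_tools())` returns a tool that has `jpn`, and passing `"jpn_vert"` as the second argument looks for vertical Japanese instead.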

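####Try it out

```python
import sys

import pyocr
import pyocr.builders
from PIL import Image

# Get the OCR engines pyocr can find (here, Tesseract)
tools = pyocr.get_available_tools()
if len(tools) == 0:
    print("No OCR tool found")
    sys.exit(1)

# Use the first engine found
tool = tools[0]

# Specify the language and options
txt = tool.image_to_string(
    Image.open('test.jpg'),
    lang='jpn',  # Japanese (horizontal text)
    builder=pyocr.builders.TextBuilder()  # return the result as plain text
)
print(txt)
```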
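Tesseract also accepts several languages joined with `+`, such as `jpn+eng` for Japanese text mixed with English. Here is a sketch of building that `lang` value while failing early if one of the codes is not installed. `build_lang` is a hypothetical helper, and the `installed` list is copied from the `--list-langs` output above.

```python
from typing import Iterable, Sequence


def build_lang(requested: Sequence[str], available: Iterable[str]) -> str:
    """Join language codes with '+' (Tesseract's syntax for combining languages),
    raising ValueError if any requested code is not installed."""
    available_set = set(available)
    missing = [code for code in requested if code not in available_set]
    if missing:
        raise ValueError(f"language data not installed: {', '.join(missing)}")
    return "+".join(requested)


installed = ["eng", "jav", "jpn", "jpn_vert", "osd", "script/Japanese", "script/Japanese_vert"]
print(build_lang(["jpn", "eng"], installed))  # jpn+eng
```

The result can be passed as `lang=` to `tool.image_to_string` in the script above, for example `lang=build_lang(["jpn_vert"], tool.get_available_languages())` for vertical text.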