Extracting Text from an Image

Posted at 2024-06-27

This time, I'll extract the text from this image.

What is Pillow?

A fork of the Python Imaging Library (PIL) that provides a wide range of image processing features.
It is a library for opening, manipulating, and saving image files in many formats.

pip install pillow
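
To make "open, manipulate, save" a bit more concrete, here is a minimal sketch. The function name `save_grayscale_half` and the file paths are made up for this example; only `Image.open`, `convert`, `resize`, and `save` come from Pillow itself.

```python
from PIL import Image


def save_grayscale_half(src_path, dst_path):
    """Open an image, convert it to grayscale at half size, and save it."""
    with Image.open(src_path) as image:
        gray = image.convert("L")  # "L" is Pillow's 8-bit grayscale mode
        # Halve each side, but never go below 1 pixel.
        new_size = (max(1, gray.width // 2), max(1, gray.height // 2))
        small = gray.resize(new_size)
        small.save(dst_path)  # the output format is inferred from the extension
    return small.size
```

Calling `save_grayscale_half("image.png", "image_small.png")` would write a smaller grayscale copy next to the original, assuming `image.png` exists.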

What is pytesseract?

An optical character recognition (OCR) tool for Python. In other words, it recognizes and "reads" the text embedded in images.

pip install pytesseract
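
One thing worth knowing: pytesseract is a wrapper around the Tesseract OCR engine, and pip installs only the Python side, so the `tesseract` executable has to be installed separately. A quick way to check whether it is reachable might look like the sketch below. The helper name `find_tesseract` is my own, and it assumes the executable is called `tesseract` and lives on your PATH.

```python
import shutil


def find_tesseract():
    """Return the path of the tesseract executable on PATH, or None."""
    return shutil.which("tesseract")


path = find_tesseract()
if path is None:
    print("tesseract was not found on PATH; install the Tesseract OCR engine first")
else:
    print(f"tesseract found at {path}")
```

If the engine is installed somewhere that is not on PATH, pytesseract also lets you point at it by setting `pytesseract.pytesseract.tesseract_cmd` to the full path.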

Script

import pytesseract
from PIL import Image

# Open the image file
image = Image.open('image.png')

# Extract the text with pytesseract
text = pytesseract.image_to_string(image)
print(text)

Reference sites

Also, according to the site below, it looks like the accuracy can be improved.
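
I haven't verified what that site actually recommends, so take the following only as a sketch of commonly used preprocessing steps (grayscale, upscaling, and binarization) done with Pillow before handing the image to OCR. The function name `preprocess_for_ocr` and the default `scale` and `threshold` values are assumptions chosen for illustration, and they may need tuning per image.

```python
from PIL import Image, ImageOps


def preprocess_for_ocr(image, scale=2, threshold=128):
    """Return a grayscale, enlarged, black-and-white copy of a PIL image."""
    # Drop color information; OCR mostly cares about contrast.
    gray = ImageOps.grayscale(image)
    # Enlarge the image so that small characters cover more pixels.
    enlarged = gray.resize(
        (gray.width * scale, gray.height * scale), Image.LANCZOS
    )
    # Binarize: pixels at or above the threshold become white, the rest black.
    return enlarged.point(lambda p: 255 if p >= threshold else 0)
```

The result can be passed to `pytesseract.image_to_string` in place of the original image. Whether it actually helps depends on the image, so it is worth comparing the output with and without it.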

In closing

It seems that text can't be extracted from every image, so you need to keep that in mind when using this.

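This time, I simply wanted to try out a library I came across at work.
Python can do so many things, and there is still a lot I don't know,
but I'm enjoying learning it little by little♪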
