Rubyで画像からテキストを認識する

Last updated at 2021-12-20Posted at 2021-12-19

はじめに

テキストでいっぱいの画像があり、わからなくても意味はまだ知りたいと想像してください。その時にどうすれば良いですか。翻訳ツールを利用すると思い付きますね。翻訳したいのにどのように書かれているのかわからないので、その際に画像からテキストを取得する方法が必要になります。
それで今日はRubyで画像から文字を認識するを紹介したいと思います。

実装

画像からテキストを認識するため、ImageMagickとTesseractを使います。
ImageMagick: Rubyに画像処理ツールです。

# インストール(MacOS)
brew install imagemagick

Tesseract: 光学式文字認識を使用してテキスト認識ツールです。現在、多くの言語をサポートしています。

# インストール(MacOS)
brew install tesseract
# training dataはどこで保存しているか確認するコマンド
brew list tesseract

training dataフォルダで画像からテキスト認識する言語の保存が必要です。今回は英語(eng.traineddata)を使います。他の言語を利用する場合はサポート言語を参考してください。

画像からテキストを認識する手順

仮に以下の画像からテキストを取得したいなら、

1. 画像を白黒に変換（グレースケール）

2段階と3段階に利用するために、まずは画像を白黒に変換するのが必要です。

require 'pathname'
require 'open3'
require 'mini_magick'

INPUT = '/Users/duong.nguyen/Desktop/input'.freeze    # 元の画像を保存するフォルダ
OUTPUT = '/Users/duong.nguyen/Desktop/output'.freeze  # 処理した画像を保存するフォルダ

Pathname.new(INPUT).children.each do |f|
  src_path = f.realpath
  tmp_path = "#{OUTPUT}/#{f.basename}"

  img = MiniMagick::Image.open(src_path)
  img.colorspace('Gray')
  img.write(tmp_path)
end

上記のコードを実行すると、元の画像から白黒にしました。
処理した画像：

2. 画像のクリーニング

不要なものを消すため画像を黒色にします。これは、Tesseractがテキストをもっとよく認識するのに役立ちます。

# ...
Pathname.new(INPUT).children.each do |f|
  # ...
  MiniMagick::Tool::Convert.new do |magick|
    magick << tmp_path
    magick.negate
    magick.threshold('007%')
    magick.negate
    magick << tmp_path
  end
end

処理した画像：

3. テキストの認識

tesseractを使用してテキストを認識し、open3を使用して返されたテキストをcaptureします。

# ...
Pathname.new(INPUT).children.each do |f|
  # ...
  text, _xxx, _yyy = Open3.capture3("tesseract #{tmp_path} stdout -l eng --oem 0 --psm 3")
  puts text.strip
end

認識したテキスト

We would all be
living there together.

全てのコード

require 'pathname'
require 'open3'
require 'mini_magick'

INPUT = '/Users/duong.nguyen/Desktop/input'.freeze
OUTPUT = '/Users/duong.nguyen/Desktop/output'.freeze
Pathname.new(INPUT).children.each do |f|
  # Step 1
  src_path = f.realpath
  tmp_path = "#{OUTPUT}/#{f.basename}"
  img = MiniMagick::Image.open(src_path)
  img.colorspace('Gray')
  img.write(tmp_path)
  # Step 2
  MiniMagick::Tool::Convert.new do |magick|
    magick << tmp_path
    magick.negate
    magick.threshold('007%')
    magick.negate
    magick << tmp_path
  end
  # Step 3
  text, _xxx, _yyy = Open3.capture3("tesseract #{tmp_path} stdout -l eng --oem 0 --psm 3")
  puts text.strip
end

終わりに

テキスト認識の正確的は画像の種類によるとか、画像のクリーニングが良いかどうかなどと思います。それで、いくつかのケースで上の方法で文字を認識できない可能性があるため、テキスト認識のレベルが完全に正確ではないですね。
上記は、Rubyで基本的で画像からテキストを認識する手順です。
画像関連の処理が必要な場合に役立つと願っています。

参考：https://aonemd.me/posts/extracting-text-from-images-using-ruby/

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up