Let's read text from a screen region with Tesseract OCR!

Last updated at 2025-08-21 / Posted at 2025-08-20

Introduction

Misaka here, and this time I thought: let's build an app that can OCR text with a Snipping Tool-style selection!
The motivation was the following workflow:

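Take a screen capture of a specific page shown in the browser (it contains information that can only be obtained at that moment)
After the task is done, several specific strings inside the captured image have to be picked up as text data (e.g. 20 characters × 5 places)
Not grabbing it by saving "(filename).mhtml" (strictly speaking you can, but that's no fun)
Not saving the values shown in the browser to a text file (you could, but that's no fun)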

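In this situation, the smoothest approach, and one that stays reusable later, is to literally OCR the text in the image and turn it into text data, right? That's how this project came about.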

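The details of this source code are explained in another article, "Tesseract OCR × OpenCV で文字認識精度が（多分）向上する一つの方法" (One way to (probably) improve character recognition accuracy with Tesseract OCR × OpenCV). Please take a look at that one too if you like.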

Required functionality

This can be summed up in one sentence: "Turn the text in the displayed image into plain ASCII data." So let's get building.

Installing the libraries

This app needs the following four packages. As for versions, I simply installed the latest ones for now.

Tesseract ver.5.2.0
OpenCvSharp4 ver.4.11.0.20250507
OpenCvSharp4.Extensions ver.4.11.0.20250507
OpenCvSharp4.runtime.win ver.4.11.0.20250507

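The first one, Tesseract, is the OCR engine itself, the part that does the actual recognition.
The rest, OpenCvSharp4 and its companion packages, can be thought of as a library that performs all kinds of image processing.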

Adding the extra files

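Tesseract does the OCR work nicely, but with only the package installed you get an error in C#.
The reason is that installing it from NuGet does not include the so-called "dictionary files" (the .traineddata language data). So you download these files manually from the Tesseract OCR repository on GitHub.
This time I'm focusing on English, so all you need is [ eng.traineddata ]. If you also want to OCR Japanese, add [ jpn.traineddata ] as well, and so on: prepare the file for each source language you need.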

C:\Users\tempuser\source\repos\textSnipping\bin\Debug\net8.0-windows\tessdata>dir
 Volume in drive C is Windows
 Volume Serial Number is xxxx-xxxx

 Directory of C:\Users\tempuser\source\repos\textSnipping\bin\Debug\net8.0-windows\tessdata

2025/08/18  13:50    <DIR>          .
2025/08/20  13:07    <DIR>          ..
2025/08/18  13:44        23,466,654 eng.traineddata
2025/08/18  13:48        35,659,159 jpn.traineddata
               2 File(s)     59,125,813 bytes
               2 Dir(s)  99,752,140,800 bytes free

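Since I'm still at the debugging stage, I've put them in the project's build output folder for now (the bin\Debug path shown above). For the actual release build, place them in the folder specified in the source (see below).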
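As mentioned above, the engine errors out when the traineddata file is missing, so it can help to check for it before creating the engine. Below is a minimal sketch that is not part of the original app: `FindMissingTrainedData` is a hypothetical helper, and placing tessdata next to the executable (resolved via `AppContext.BaseDirectory`) is an assumption. Tesseract accepts several languages joined with `+` (for example `eng+jpn`), so the helper checks one file per language code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not in the original app): returns the language codes in a
// Tesseract language string such as "eng" or "eng+jpn" whose
// <code>.traineddata file is missing from tessdataDir.
static List<string> FindMissingTrainedData(string tessdataDir, string languages)
{
    var missing = new List<string>();
    var codes = languages.Split('+', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
    foreach (string code in codes)
    {
        string path = Path.Combine(tessdataDir, code + ".traineddata");
        if (!File.Exists(path))
            missing.Add(code);
    }
    return missing;
}

// Assumption: tessdata sits next to the executable,
// e.g. bin\Debug\net8.0-windows\tessdata during development.
string tessdataDir = Path.Combine(AppContext.BaseDirectory, "tessdata");
List<string> missingCodes = FindMissingTrainedData(tessdataDir, "eng+jpn");

Console.WriteLine(missingCodes.Count == 0
    ? "All traineddata files found."
    : "Missing traineddata: " + string.Join(", ", missingCodes));
```

Calling this before `new TesseractEngine(...)` lets the app show a readable message instead of failing inside the engine constructor.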

Let's build it (starting with the design)

With the required files ready, it's time to write the actual source. But before that! Here is the design of Form1.cs.
Only the bare minimum of controls is placed on it.

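There is one tableLayoutPanel at the top and one at the bottom.
The top one, tableLayoutPanel1, holds two buttons and a check box; the bottom one, tableLayoutPanel2, holds three textBoxes.
Both the source code and the designer settings are tuned so that the layout resizes nicely.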

Now for the source code.

Form1.cs

using System.Text.RegularExpressions;
using OpenCvSharp;
using OpenCvSharp.Extensions;
using Tesseract;

namespace textSnipping
{

    public partial class Form1 : Form
    {

        public Form1()
        {

            InitializeComponent();
            this.MinimumSize = new System.Drawing.Size(900, 350);
            checkBox1.Visible = false;

        }

        private void Form1_Load(object sender, EventArgs e)
        {

            // Set the minimum window size
            this.MinimumSize = new System.Drawing.Size(900, 350);

            // Top panel (fixed height, docked to the top)
            tableLayoutPanel1.Dock = DockStyle.Top;
            tableLayoutPanel1.MinimumSize = new System.Drawing.Size(0, 40);
            tableLayoutPanel1.Height = 40;
            tableLayoutPanel1.Margin = new Padding(0);

            // Bottom panel (fills the remaining space)
            tableLayoutPanel2.Dock = DockStyle.Fill;
            tableLayoutPanel2.Margin = new Padding(0);
            tableLayoutPanel2.RowStyles.Clear();
            tableLayoutPanel2.RowStyles.Add(new RowStyle(SizeType.Percent, 100F));

            // Order of addition (bottom panel first, then top panel) so docking lays them out correctly
            this.Controls.Add(tableLayoutPanel2);   // 下
            this.Controls.Add(tableLayoutPanel1);   // 上

        }

        private void button1_Click(object sender, EventArgs e)
        {

            ProcessScreenCapture(invert: false);

        }

        private void button2_Click(object sender, EventArgs e)
        {

            ProcessScreenCapture(invert: true);

        }

        private void ProcessScreenCapture(bool invert)
        {

            using (var selector = new ScreenCaptureSelector())
            {

                if (selector.ShowDialog() == DialogResult.OK)
                {

                    Rectangle rect = selector.SelectedRectangle;

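                    // Skip if the width or height is 0 (e.g. a click without dragging)
                    if (rect.Width == 0 || rect.Height == 0) return;

                    // Copy the selected screen area into a bitmap
                    using Bitmap bmp = new Bitmap(rect.Width, rect.Height);
                    using (Graphics g = Graphics.FromImage(bmp))
                        g.CopyFromScreen(rect.Location, System.Drawing.Point.Empty, rect.Size);

                    // Grayscale + 5x upscale, optional color inversion, then OpenCV binarization
                    Bitmap grayResized = PreprocessImage(bmp, scale: 5);
                    if (invert)
                    {
                        Bitmap inverted = InvertBitmap(grayResized);
                        grayResized.Dispose(); // release the intermediate bitmap's GDI+ resources
                        grayResized = inverted;
                    }
                    using Bitmap processed = OpenCvPreprocess(grayResized);
                    grayResized.Dispose();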

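                    string[] results = new string[3]; // Results of the 3 OCR runs
                    // Resolve tessdata next to the executable instead of the current working directory
                    string tessPath = Path.Combine(AppContext.BaseDirectory, "tessdata");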

                    using (var engine = new TesseractEngine(tessPath, "eng", EngineMode.Default))
                    {

                        engine.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.,!?\"' ");
                        for (int i = 0; i < 3; i++)
                        {

                            using (var ms = new MemoryStream())
                            {

                                processed.Save(ms, System.Drawing.Imaging.ImageFormat.Bmp);
                                ms.Position = 0;

                                using (var pix = Pix.LoadFromMemory(ms.ToArray()))
                                using (var page = engine.Process(pix, PageSegMode.Auto))
                                {

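                                    string text = page.GetText();
                                    // Collapse runs of spaces/tabs into one space (line breaks are kept), then trim
                                    text = Regex.Replace(text, @"[^\S\r\n]+", " ").Trim();
                                    // Convert LF to CRLF so the multi-line TextBox shows the line breaks
                                    text = text.Replace("\n", "\r\n");

                                    // "AI" correction (in practice, simple rule-based character replacements)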
                                    text = AiCorrectEnglishText(text);

                                    results[i] = text;

                                }

                            }

                        }

                    }

                    // Show each result in its TextBox
                    textBox1.Text = results[0];
                    textBox2.Text = results[1];
                    textBox3.Text = results[2];
                    textBox3.SelectionStart = textBox3.Text.Length;
                    textBox3.ScrollToCaret();

                }

            }

        }

        private string AiCorrectEnglishText(string raw)
        {

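            // Replace characters that OCR commonly confuses with letters.
            // Note: each Replace applies to every occurrence, so an ordinary
            // lowercase 'l' (e.g. in "hello") also becomes 'I'.
            string text = raw
                .Replace("0", "O")
                .Replace("1", "I")
                .Replace("l", "I")
                .Replace("|", "I")
                .Replace("@", "a");

            // Simple correction for runs of uppercase letters
            // (currently a placeholder: each match is returned unchanged)
            text = Regex.Replace(text, @"([A-Z]{2,})", m =>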
            {

                string s = m.Value;
                return s;

            });

            return text;

        }

        private Bitmap InvertBitmap(Bitmap bmp)
        {

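            // Invert each pixel's RGB values (light text on dark <-> dark text on light).
            // GetPixel/SetPixel is simple but slow on large images.
            Bitmap inverted = new Bitmap(bmp.Width, bmp.Height);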
            for (int y = 0; y < bmp.Height; y++)
                for (int x = 0; x < bmp.Width; x++)
                {

                    Color px = bmp.GetPixel(x, y);
                    inverted.SetPixel(x, y, Color.FromArgb(255 - px.R, 255 - px.G, 255 - px.B));

                }

            return inverted;

        }

        private Bitmap PreprocessImage(Bitmap bmp, int scale = 3)
        {

            // Grayscale: every output channel = 0.3*R + 0.59*G + 0.11*B
            using Bitmap grayBmp = new Bitmap(bmp.Width, bmp.Height);
            using (Graphics g = Graphics.FromImage(grayBmp))
            using (var ia = new System.Drawing.Imaging.ImageAttributes())
            {

                var cm = new System.Drawing.Imaging.ColorMatrix(new float[][]
                {

                    new float[]{0.3f,0.3f,0.3f,0,0},
                    new float[]{0.59f,0.59f,0.59f,0,0},
                    new float[]{0.11f,0.11f,0.11f,0,0},
                    new float[]{0,0,0,1,0},
                    new float[]{0,0,0,0,1}

                });
                ia.SetColorMatrix(cm);
                g.DrawImage(bmp, new Rectangle(0, 0, bmp.Width, bmp.Height),
                    0, 0, bmp.Width, bmp.Height, GraphicsUnit.Pixel, ia);

            }

            // Enlarge by `scale` so small on-screen text becomes larger for OCR
            return new Bitmap(grayBmp, new System.Drawing.Size(grayBmp.Width * scale, grayBmp.Height * scale));

        }

        private Bitmap OpenCvPreprocess(Bitmap bmp)
        {

            using (Mat src = BitmapConverter.ToMat(bmp))
            using (Mat gray = new Mat())
            using (Mat bin = new Mat())
            {

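                // Convert to single-channel grayscale
                Cv2.CvtColor(src, gray, ColorConversionCodes.BGR2GRAY);
                // Adaptive threshold: each pixel is compared with the Gaussian-weighted mean of its
                // blockSize x blockSize neighborhood minus C (= 3); blockSize must be odd
                int blockSize = 21; if (blockSize % 2 == 0) blockSize += 1;
                Cv2.AdaptiveThreshold(gray, bin, 255,
                    AdaptiveThresholdTypes.GaussianC,
                    ThresholdTypes.Binary,
                    blockSize, 3);
                // Dilate with a 4x3 (width x height) rectangle: grows white areas, which thins dark strokes
                using Mat kernel = Cv2.GetStructuringElement(MorphShapes.Rect, new OpenCvSharp.Size(4, 3));
                Cv2.MorphologyEx(bin, bin, MorphTypes.Dilate, kernel);
                return BitmapConverter.ToBitmap(bin);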

            }

        }

    }

}
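The text cleanup after `page.GetText()` can be tried on its own. The sketch below reuses the same regex as the code above, and adds a hypothetical `FixConfusionsInsideWords` as one possible, more conservative alternative to the global `Replace` calls in `AiCorrectEnglishText`: those calls touch every occurrence, so for example "hello" becomes "heIIo". The alternative only substitutes `0`, `1` and `|` when they sit between two letters; it is an illustration, not the author's method.

```csharp
using System;
using System.Text.RegularExpressions;

// Same cleanup as in ProcessScreenCapture: collapse runs of horizontal whitespace
// (spaces, tabs) into one space while keeping line breaks, trim both ends, then
// turn LF into CRLF so a multi-line TextBox shows the breaks.
static string NormalizeOcrWhitespace(string text)
{
    text = Regex.Replace(text, @"[^\S\r\n]+", " ").Trim();
    return text.Replace("\n", "\r\n");
}

// Hypothetical, more conservative alternative to the global Replace calls:
// '0' becomes 'O' and '1' or '|' becomes 'I' only when the character is
// directly between two ASCII letters, so real digits and lowercase 'l' stay as-is.
static string FixConfusionsInsideWords(string text) =>
    Regex.Replace(text, @"(?<=[A-Za-z])[01|](?=[A-Za-z])",
        m => m.Value == "0" ? "O" : "I");

string raw = "F1LE   C0DE\tOK\nline  2";
string cleaned = FixConfusionsInsideWords(NormalizeOcrWhitespace(raw));
Console.WriteLine(cleaned); // prints "FILE CODE OK" and "line 2" on two lines
```

The lookbehind/lookahead pair keeps the surrounding letters out of the match, so only the suspicious character itself is replaced.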

The basic flow, much like the real Snipping Tool, is: show a semi-transparent full-screen layer, let the user select an area on it, then apply various image processing to that area and run OCR on it.
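To make the "various image processing" step concrete, here is a sketch of the same three ideas on a plain byte array: grayscale with the 0.3/0.59/0.11 weights from `PreprocessImage`, inversion as in `InvertBitmap`, and a local (adaptive) threshold. The threshold is a simplified stand-in: it uses a plain box mean (the idea behind OpenCV's `MeanC`) instead of the Gaussian-weighted mean (`GaussianC`) that `OpenCvPreprocess` uses, and it clips the neighborhood at the image border. The function names are hypothetical.

```csharp
using System;

// Grayscale with the same weights as the ColorMatrix in PreprocessImage.
static byte ToGray(byte r, byte g, byte b) =>
    (byte)Math.Round(0.3 * r + 0.59 * g + 0.11 * b);

// Per-value version of InvertBitmap: 0 <-> 255.
static byte Invert(byte v) => (byte)(255 - v);

// Simplified adaptive threshold (box mean instead of OpenCV's Gaussian-weighted mean):
// a pixel becomes 255 if it is brighter than (local mean - c), otherwise 0.
// The neighborhood is clipped at the image border; blockSize must be odd.
static byte[,] AdaptiveThresholdMean(byte[,] gray, int blockSize, double c)
{
    if (blockSize < 3 || blockSize % 2 == 0)
        throw new ArgumentException("blockSize must be an odd number >= 3", nameof(blockSize));

    int h = gray.GetLength(0), w = gray.GetLength(1), radius = blockSize / 2;
    var result = new byte[h, w];
    for (int y = 0; y < h; y++)
    {
        for (int x = 0; x < w; x++)
        {
            double sum = 0;
            int count = 0;
            for (int yy = Math.Max(0, y - radius); yy <= Math.Min(h - 1, y + radius); yy++)
                for (int xx = Math.Max(0, x - radius); xx <= Math.Min(w - 1, x + radius); xx++)
                {
                    sum += gray[yy, xx];
                    count++;
                }
            result[y, x] = gray[y, x] > sum / count - c ? (byte)255 : (byte)0;
        }
    }
    return result;
}

// Dark "text" pixels (40) on a bright background (200).
byte[,] sample =
{
    { 200, 200, 200, 200, 200 },
    { 200,  40,  40,  40, 200 },
    { 200, 200, 200, 200, 200 },
};
byte[,] binary = AdaptiveThresholdMean(sample, blockSize: 3, c: 3);

// Prints:
// 255 255 255 255 255
// 255 0 0 0 255
// 255 255 255 255 255
for (int row = 0; row < binary.GetLength(0); row++)
{
    var cells = new string[binary.GetLength(1)];
    for (int col = 0; col < cells.Length; col++)
        cells[col] = binary[row, col].ToString();
    Console.WriteLine(string.Join(" ", cells));
}
Console.WriteLine(ToGray(255, 255, 255)); // 255
Console.WriteLine(Invert(200));           // 55
```

Because the threshold is computed per neighborhood, dark text can stay black even where the background brightness varies across the selection, which a single global threshold handles less well.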
So what does the side that displays that layer look like? It isn't a designer form like Form2.cs; it's written entirely in code. Here it is.

ScreenCaptureSelector.cs

using System.Runtime.InteropServices;

public class ScreenCaptureSelector : Form
{

    private Point startPoint;
    private Rectangle selectionRect;
    private bool isSelecting = false;

    public Rectangle SelectedRectangle { get; private set; }

    public ScreenCaptureSelector()
    {

        this.FormBorderStyle = FormBorderStyle.None;
        this.WindowState = FormWindowState.Maximized;
        this.TopMost = true;
        this.DoubleBuffered = true;
        this.BackColor = Color.Black;
        this.Opacity = 0.3;
        this.Cursor = Cursors.Cross;

        // Lets the window be brought to the front even via taskbar operations
        this.Load += (s, e) =>
        {

            int exStyle = (int)GetWindowLong(this.Handle, GWL_EXSTYLE);
            exStyle &= ~WS_EX_NOACTIVATE; // Clear WS_EX_NOACTIVATE so a click can activate the window
            SetWindowLong(this.Handle, GWL_EXSTYLE, (IntPtr)exStyle);

        };

    }

    protected override void OnMouseDown(MouseEventArgs e)
    {

        isSelecting = true;
        startPoint = e.Location;
        selectionRect = new Rectangle(e.Location, new Size(0, 0));
        Invalidate();

    }

    protected override void OnMouseMove(MouseEventArgs e)
    {

        if (isSelecting)
        {

            selectionRect = new Rectangle(
                Math.Min(startPoint.X, e.X),
                Math.Min(startPoint.Y, e.Y),
                Math.Abs(startPoint.X - e.X),
                Math.Abs(startPoint.Y - e.Y)
            );
            Invalidate();

        }

    }

    protected override void OnMouseUp(MouseEventArgs e)
    {
        isSelecting = false;
        SelectedRectangle = selectionRect;
        DialogResult = DialogResult.OK;
        Close();
    }

    protected override void OnPaint(PaintEventArgs e)
    {

        base.OnPaint(e);
        if (isSelecting)
        {

            using (Brush blackBrush = new SolidBrush(Color.FromArgb(100, Color.Black)))
                e.Graphics.FillRectangle(blackBrush, this.ClientRectangle);

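            // Fill everything outside the selection (the clip excludes selectionRect).
            // Note: the brush below is fully transparent (alpha 0), so under the default
            // SourceOver compositing this fill has no visible effect.
            Region originalRegion = e.Graphics.Clip;
            e.Graphics.SetClip(selectionRect, System.Drawing.Drawing2D.CombineMode.Exclude);

            using (Brush brush = new SolidBrush(Color.FromArgb(0, Color.Black)))
                e.Graphics.FillRectangle(brush, this.ClientRectangle);

            // Restore the original clip; Graphics.Clip returns a copy, so dispose it afterwards
            e.Graphics.SetClip(originalRegion, System.Drawing.Drawing2D.CombineMode.Replace);
            originalRegion.Dispose();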

            using (Brush brush = new SolidBrush(Color.FromArgb(200, Color.White)))
                e.Graphics.FillRectangle(brush, selectionRect);

            using (Pen pen = new Pen(Color.FromArgb(255, 255, 0, 0), 2))
                e.Graphics.DrawRectangle(pen, selectionRect);

        }

    }

    // Win32 API
    const int GWL_EXSTYLE = -20;
    const int WS_EX_NOACTIVATE = 0x08000000;

    [DllImport("user32.dll")]
    static extern IntPtr GetWindowLong(IntPtr hWnd, int nIndex);

    [DllImport("user32.dll")]
    static extern IntPtr SetWindowLong(IntPtr hWnd, int nIndex, IntPtr dwNewLong);

}

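That's all the source code.
I had OpenAI's AI fill in a fair amount of it, so there are parts even I'm not entirely sure about, but I think it's fine as long as you use it at your own risk and only for yourself.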
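For reference, the selection logic in `ScreenCaptureSelector.OnMouseMove` boils down to a small piece of arithmetic: the top-left corner is the smaller of the two coordinates and the size is the absolute difference, so dragging in any direction gives a non-negative width and height. A minimal sketch with hypothetical names, using a tuple instead of `System.Drawing.Rectangle` so it runs without WinForms:

```csharp
using System;

// Hypothetical helper mirroring ScreenCaptureSelector.OnMouseMove: the top-left corner
// is the smaller coordinate, the size is the absolute difference.
static (int X, int Y, int Width, int Height) NormalizeSelection(int startX, int startY, int endX, int endY) =>
    (Math.Min(startX, endX), Math.Min(startY, endY),
     Math.Abs(startX - endX), Math.Abs(startY - endY));

// Mirrors the guard in ProcessScreenCapture: a click without dragging gives width or height 0.
static bool IsUsable((int X, int Y, int Width, int Height) rect) =>
    rect.Width > 0 && rect.Height > 0;

// Dragging from bottom-right (300, 200) to top-left (100, 50).
var selection = NormalizeSelection(300, 200, 100, 50);
Console.WriteLine(selection);                                     // (100, 50, 200, 150)
Console.WriteLine(IsUsable(selection));                           // True
Console.WriteLine(IsUsable(NormalizeSelection(10, 10, 10, 40)));  // False (zero width)
```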

So how good is the recognition accuracy, really?

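What you probably care about is the actual recognition accuracy. I tried it on various data (from about 10 characters up to about 200 characters), and I'd say it got roughly 90% right.
Honestly, if you're picturing the OCR quality of Google or ChatGPT, that's asking too much.
Still, for really simple text, I personally think it's good enough.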
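The "roughly 90%" above is a hands-on impression. If you want to put a number on it, one common measure is character accuracy based on edit distance: 1 - (edit distance / length of the correct text). The sketch below is one way to compute it; it is not how the figure above was obtained, and the function names are hypothetical.

```csharp
using System;

// Levenshtein edit distance: the minimum number of single-character
// insertions, deletions and substitutions that turn `a` into `b`.
static int EditDistance(string a, string b)
{
    var prev = new int[b.Length + 1];
    var curr = new int[b.Length + 1];
    for (int j = 0; j <= b.Length; j++) prev[j] = j;
    for (int i = 1; i <= a.Length; i++)
    {
        curr[0] = i;
        for (int j = 1; j <= b.Length; j++)
        {
            int cost = a[i - 1] == b[j - 1] ? 0 : 1;
            curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
        }
        (prev, curr) = (curr, prev); // the row just computed becomes the previous row
    }
    return prev[b.Length];
}

// Character accuracy = 1 - (edit distance / length of the expected text), floored at 0.
static double CharacterAccuracy(string expected, string ocr)
{
    if (expected.Length == 0) return ocr.Length == 0 ? 1.0 : 0.0;
    return Math.Max(0.0, 1.0 - (double)EditDistance(expected, ocr) / expected.Length);
}

string expectedText = "HELLO WORLD";
string ocrText = "HELL0 W0RLD"; // two 'O' read as '0'
double accuracy = CharacterAccuracy(expectedText, ocrText);
Console.WriteLine($"Character accuracy: {accuracy:P1}");
```

Here two substitutions out of 11 characters give 9/11. Put the other way round, 90% character accuracy on a 200-character text still means about 20 character edits to fix by hand.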

Next steps

While writing this, I found an OCR API called ocr.space, so I'm thinking of integrating it next to push the accuracy up a bit more.

Aside

Needless to say, I take no responsibility for any damage caused by using this.

That's all for now. Brought to you by Misaka (御坂よう).
