
Lazy advent calendar 2019

Last updated at 2019-12-25, posted at 2019-12-24

A self-indulgent OCR implementation

Plans

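First, detect the individual characters (using Haar cascades or YOLO?).
Cover each character with a bounding rectangle and apply the following processing.
Within each rectangle, plot the brightness values as a 3D surface, build a 3D model of that surface, and use its projections under rotation to recognize each character accurately.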
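The third step of the plan can be sketched roughly as follows. This is only a minimal illustration, assuming the "3D model" is a plain point cloud of (x, y, brightness) and the "projection" is orthographic; the function names `luminance_point_cloud`, `rotate_about_y`, and `project_to_xy` are hypothetical and do not appear in the script below.

```python
import numpy as np


def luminance_point_cloud(patch):
    """Turn a 2-D grayscale patch into an (N, 3) array of (x, y, brightness)."""
    patch = np.asarray(patch, dtype=float)
    ys, xs = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
    return np.column_stack([xs.ravel(), ys.ravel(), patch.ravel()])


def rotate_about_y(points, angle_rad):
    """Rotate the cloud about the y-axis passing through its centroid."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rotation = np.array([[c, 0.0, s],
                         [0.0, 1.0, 0.0],
                         [-s, 0.0, c]])
    centroid = points.mean(axis=0)
    return (points - centroid) @ rotation.T + centroid


def project_to_xy(points):
    """Orthographic projection (an assumption): drop the depth coordinate."""
    return points[:, :2]
```

Calling `project_to_xy(rotate_about_y(luminance_point_cloud(patch), angle))` for several angles would give one 2D view per angle; how those views are then fed to a recognizer is left open here, as it is in the plan.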

# Modules
from pathlib import Path
from skimage import io
import matplotlib.pyplot as plt
import cv2
import numpy as np

# Put some .jpg image files of any documents under ../dataset

p = Path("../dataset")
paths = list(p.glob("**/*.jpg"))
data1 = io.imread(paths[0])

# Using tile strategy to evaluate the recognition accuracy.

mini = data1[1200:1400, 800:1000, 0]
plt.imshow(mini)
print("Showing data to be processed...")
plt.show()

Switch1

Crop the paper region out of the whole image with an affine transform or similar, then estimate the pixel width of a single character.
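One way the width estimate could work is sketched below. It assumes the crop is already deskewed, contains a single line of text, and that characters are separated by at least one ink-free column; the threshold of 200 mirrors the mask used in the script, while `run_lengths` and `estimate_char_width` are hypothetical helper names.

```python
import numpy as np


def run_lengths(flags):
    """Lengths of consecutive runs of True in a 1-D boolean array."""
    padded = np.concatenate([[False], np.asarray(flags, dtype=bool), [False]])
    changes = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(changes == 1)
    ends = np.flatnonzero(changes == -1)
    return ends - starts


def estimate_char_width(gray, ink_threshold=200):
    """Median width in pixels of the column runs that contain ink."""
    ink = np.asarray(gray) < ink_threshold    # dark pixels count as ink
    columns_with_ink = ink.any(axis=0)        # vertical projection profile
    widths = run_lengths(columns_with_ink)
    if widths.size == 0:
        return None                           # blank crop: nothing to measure
    return float(np.median(widths))
```

The median is used rather than the mean so that a few touching characters, which show up as one long run, do not dominate the estimate.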

# Highlighting the target

## Lazy normalization

if mini.max() > 255:
    # Values above 255 imply a deeper bit depth (e.g. 16-bit); scale down to 8-bit.
    subject = np.true_divide(mini, 256).astype("uint8")
else:
    subject = mini.astype("uint8")

## Creating a mask to remove noise.

mask = (subject < 200)
                
masked = mask * subject

## Distance transform

distmap = cv2.distanceTransform(masked, cv2.DIST_L1, 3)
                
## Creating an all-zero matrix with the same shape and dtype as distmap

featuremap = np.zeros_like(distmap)
                
## Deciding kernel size to convolve.

ksize = 20
                
## Detecting edge with convolution...

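for x in range(ksize, distmap.shape[0] - ksize * 2):
    for y in range(ksize, distmap.shape[1] - ksize * 2):

        ### Write 1 into featuremap at each coordinate that holds the maximum value within its kernel window...
        ### This is close to max-pooling: max-pooling and normalization are performed at the same time.

        if distmap[x, y] > 0 and distmap[x, y] == np.max(distmap[x - ksize:x + ksize, y - ksize:y + ksize]):
            featuremap[x, y] = 1

### defining feature_dilated for imshow
### (runs once, after the loops, with an explicit 50x50 structuring element)

feature_dilated = cv2.dilate(featuremap, np.ones((50, 50), np.uint8))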

print("Masked image is ... : ")
plt.imshow(masked)
plt.show()

plt.imshow(feature_dilated)
print("Feature shape is ... : " + str(feature_dilated.shape))
plt.show()

Feature_mask = (feature_dilated > 0)

Cropped = masked * Feature_mask
plt.imshow(Cropped)
plt.show()
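The double loop over `distmap` is slow in pure Python. A vectorized sketch of the same local-maximum test is shown below, assuming NumPy 1.20 or later for `sliding_window_view`; the function name `local_maxima` is hypothetical. It uses the same window slice as the loop but also scans the border band that the loop's `ksize*2` bound skips, so the two can differ near the bottom and right edges.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_maxima(distmap, ksize):
    """Mark pixels that are positive and equal to the maximum of their window.

    The window for pixel (x, y) is distmap[x-ksize:x+ksize, y-ksize:y+ksize].
    """
    distmap = np.asarray(distmap)
    featuremap = np.zeros_like(distmap)
    size = 2 * ksize
    if distmap.shape[0] < size or distmap.shape[1] < size:
        return featuremap  # no full window fits in the image
    windows = sliding_window_view(distmap, (size, size))
    window_max = windows.max(axis=(2, 3))
    # window_max[i, j] belongs to the pixel at (i + ksize, j + ksize).
    rows, cols = window_max.shape
    centre = distmap[ksize:ksize + rows, ksize:ksize + cols]
    peaks = (centre > 0) & (centre == window_max)
    featuremap[ksize:ksize + rows, ksize:ksize + cols] = peaks
    return featuremap
```

With the names from the script, `featuremap = local_maxima(distmap, ksize)` would stand in for the two `for` loops.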

Switch2

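Have a public OCR server process the image files cropped with the tile strategy.
Have a self-built, MNIST-like server for single-character recognition process the cropped image files. (After one-hot encoding, crop just one character at that coordinate.)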
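Either option needs the page cut into tiles first. A minimal sketch of that step is below, assuming square tiles and that partial tiles at the right and bottom edges are simply dropped; `iter_tiles` is a hypothetical helper, and the call to the OCR server itself is deliberately left out because its interface is not specified here.

```python
import numpy as np


def iter_tiles(image, tile_size, stride=None):
    """Yield (top, left, tile) for every full tile that fits in the image."""
    stride = tile_size if stride is None else stride
    height, width = image.shape[:2]
    for top in range(0, height - tile_size + 1, stride):
        for left in range(0, width - tile_size + 1, stride):
            yield top, left, image[top:top + tile_size, left:left + tile_size]
```

Keeping `top` and `left` with each tile makes it possible to map a recognized character back to its position on the page. A stride smaller than `tile_size` gives overlapping tiles, which may help when a character straddles a tile border.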

Future work

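Implement YOLO with the currently trendy (?) spiking neural network and complete a highly interpretable single-character cropping process.
Research (study) projection matrices that are effective for finding the eigenvalues of three-dimensional real matrices.
Dimensionality reduction from full scratch.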
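As a starting point for the last item, a from-scratch dimensionality reduction could look like the PCA sketch below. PCA is only one candidate and is an assumption here, as is the helper name `pca_project`; the only library routine used is `np.linalg.eigh`, which suits the symmetric covariance matrix.

```python
import numpy as np


def pca_project(data, n_components):
    """Project the rows of `data` onto the top principal axes.

    Returns (projected, axes, eigenvalues), eigenvalues in descending order.
    """
    data = np.asarray(data, dtype=float)
    centred = data - data.mean(axis=0)
    covariance = centred.T @ centred / (len(data) - 1)
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)  # ascending order
    order = np.argsort(eigenvalues)[::-1][:n_components]
    axes = eigenvectors[:, order]
    return centred @ axes, axes, eigenvalues[order]
```

The columns of `axes` span the subspace that keeps the most variance, and `centred @ axes` is the projection of the data onto that subspace.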

References
