Detecting font differences with semantic segmentation

Last updated at 2018-09-06, posted at 2018-09-05

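(Addendum, 2018/09/07)

A reader pointed out that, for example in the third result image, it is very hard to believe that the base font and the font of 「告」 are actually different, so I compared that character across the fonts.
From left to right, the fonts are as follows.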
'Yu Gothic', 'Yu Mincho', 'MS Gothic', 'MS Mincho', 'Kosugi', 'Kosugi Maru', 'M PLUS 1p', 'M PLUS Rounded 1c', 'Sawarabi Gothic', 'Sawarabi Mincho'

Looking closely, each one does seem to differ slightly, although they are almost impossible to tell apart...

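Background

A few months ago there was an uproar over whether an official document had been falsified, on the grounds that the font differed in one specific part of the text.
I could not see any font difference at all when I looked at that document, but apparently some people can.

I wondered whether deep learning could detect such font differences, so I tried it.

I used semantic segmentation, reusing the SegNet code below more or less as is.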
https://qiita.com/summer4an/items/d5b3164f2484d8fef6e1

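Creating the input data

There are probably many ways to do this, but I chose to write a variety of characters in a variety of fonts using HTML.

The fonts used are listed below. Some of them are web fonts.
'Yu Gothic', 'Yu Mincho', 'MS Gothic', 'MS Mincho', 'Kosugi', 'Kosugi Maru', 'M PLUS 1p', 'M PLUS Rounded 1c', 'Sawarabi Gothic', 'Sawarabi Mincho'

Some fonts apparently do not provide every character, so I restricted the characters to the jōyō kanji.

The code that generates the HTML automatically is below.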

f001_html_generator.py

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# Generate the HTML pages that are rendered into the input image data.

import random
import subprocess
import time
import os,sys

# Jōyō kanji used as the character pool. Adjacent string literals inside the
# parentheses are concatenated, so the list can span two lines.
texts = (
    "亜哀挨愛曖悪握圧扱宛嵐安案暗以衣位囲医依委威為畏胃尉異移萎偉椅彙意違維慰遺緯域育一壱逸茨芋引印因咽姻員院淫陰飲隠韻右宇羽雨唄鬱畝浦運雲永泳英映栄営詠影鋭衛易疫益液駅悦越謁閲円延沿炎怨宴媛援園煙猿遠鉛塩演縁艶汚王凹央応往押旺欧殴桜翁奥横岡屋億憶臆虞乙俺卸音恩温穏下化火加可仮何花佳価果河苛科架夏家荷華菓貨渦過嫁暇禍靴寡歌箇稼課蚊牙瓦我画芽賀雅餓介回灰会快戒改怪拐悔海界皆械絵開階塊楷解潰壊懐諧貝外劾害崖涯街慨蓋該概骸垣柿各角拡革格核殻郭覚較隔閣確獲嚇穫学岳楽額顎掛潟括活喝渇割葛滑褐轄且株釜鎌刈干刊甘汗缶完肝官冠巻看陥乾勘患貫寒喚堪換敢棺款間閑勧寛幹感漢慣管関歓監緩憾還館環簡観韓艦鑑丸含岸岩玩眼頑顔願企伎危机気岐希忌汽奇祈季紀軌既記起飢鬼帰基寄規亀喜幾揮期棋貴棄毀旗器畿輝機騎技宜偽欺義疑儀戯擬犠議菊吉喫詰却客脚逆虐九久及弓丘旧休吸朽臼求究泣急級糾宮救球給嗅窮牛去巨居拒拠挙虚許距魚御漁凶共叫狂京享供協況峡挟狭恐恭胸脅強教郷境橋矯鏡競響驚仰暁業凝曲局極玉巾斤均近金菌勤琴筋僅禁緊錦謹襟吟銀区句苦駆具惧愚空偶遇隅串屈掘窟熊繰君訓勲薫軍郡群兄刑形系径茎係型契計恵啓掲渓経蛍敬景軽傾携継詣慶憬稽憩警鶏芸迎鯨隙劇撃激桁欠穴血決結傑潔月犬件見券肩建研県倹兼剣拳軒健険圏堅検嫌献絹遣権憲賢謙鍵繭顕験懸元幻玄言弦限原現舷減源厳己戸古呼固股虎孤弧故枯個庫湖雇誇鼓錮顧五互午呉後娯悟碁語誤護口工公勾孔功巧広甲交光向后好江考行坑孝抗攻更効幸拘肯侯厚恒洪皇紅荒郊香候校耕航貢降高康控梗黄喉慌港硬絞項溝鉱構綱酵稿興衡鋼講購乞号合拷剛傲豪克告谷刻国黒穀酷獄骨駒込頃今困昆恨根婚混痕紺魂墾懇左佐沙査砂唆差詐鎖座挫才再災妻采砕宰栽彩採済祭斎細菜最裁債催塞歳載際埼在材剤財罪崎作削昨柵索策酢搾錯咲冊札刷刹拶殺察撮擦雑皿三山参桟蚕惨産傘散算酸賛残斬暫士子支止氏仕史司四市矢旨死糸至伺志私使刺始姉枝祉肢姿思指施師恣紙脂視紫詞歯嗣試詩資飼誌雌摯賜諮示字寺次耳自似児事侍治持時滋慈辞磁餌璽鹿式識軸七𠮟失室疾執湿嫉漆質実芝写社車舎者射捨赦斜煮遮謝邪蛇勺尺借酌釈爵若弱寂手主守朱取狩首殊珠酒腫種趣寿受呪授需儒樹収囚州舟秀周宗拾秋臭修袖終羞習週就衆集愁酬醜蹴襲十汁充住柔重従渋銃獣縦叔祝宿淑粛縮塾熟出述術俊春瞬旬巡盾准殉純循順準潤遵処初所書庶暑署緒諸女如助序叙徐除小升少召匠床抄肖尚招承昇松沼昭宵将消症祥称笑唱商渉章紹訟勝掌晶焼焦硝粧詔証象傷奨照詳彰障憧衝賞償礁鐘上丈冗条状乗城浄剰常情場畳蒸縄壌嬢錠譲醸色拭食植殖飾触嘱織職辱尻心申伸臣芯身辛侵信津神唇娠振浸真針深紳進森診寝慎新審震薪親人刃仁尽迅甚陣尋腎須図水吹垂炊帥粋衰推酔遂睡穂錘随髄枢崇数据杉裾寸瀬是井世正生成西声制姓征性青斉政星牲省凄逝清盛婿晴勢聖誠精製誓静請整醒税夕斥石赤昔析席脊隻惜戚責跡積績籍切折拙窃接設雪摂節説舌絶千川仙占先宣専泉浅洗染扇栓旋船戦煎羨腺詮践箋銭銑潜線遷選薦繊鮮全前善然禅漸膳繕狙阻祖租素措粗組疎訴塑遡礎双壮早争走奏相荘草送倉捜挿桑巣掃曹曽爽窓創喪痩葬装僧想層総遭槽踪操燥霜騒藻造像増憎蔵贈臓即束足促則息捉速側測俗族属賊続卒率存村孫尊損遜他多汰打妥唾堕惰駄太対体耐待怠胎退帯泰堆袋逮替貸隊滞態戴大代台第題滝宅択沢卓拓託濯諾濁但達脱奪棚誰丹旦担単炭胆探淡短嘆端綻誕鍛団男段断弾暖談壇地池知値恥致遅痴稚置緻竹畜逐蓄築秩窒茶着嫡中仲虫沖宙忠抽注昼柱衷酎鋳駐著貯丁弔庁兆町長挑帳張彫眺釣頂鳥朝脹貼超腸跳徴嘲潮澄調聴懲直勅捗沈珍朕陳賃鎮追椎墜通痛塚漬坪爪鶴低呈廷弟定底抵邸亭貞帝訂庭逓停偵堤提程艇締諦泥的笛摘滴適敵溺迭哲鉄徹撤天典店点展添転塡田伝殿電斗吐妬徒途都渡塗賭土奴努度怒刀冬灯当投豆東到逃倒凍唐島桃討透党悼盗陶塔搭棟湯痘登答等筒統稲踏糖頭謄藤闘騰同洞胴動堂童道働銅導瞳峠匿特得督徳篤毒独読栃凸突届屯豚頓貪鈍曇丼那奈内梨謎鍋南軟難二尼弐匂肉虹日入乳尿任妊忍認寧熱年念捻粘燃悩納能脳農濃把波派破覇馬婆罵拝杯背肺俳配排敗廃輩売倍梅培陪媒買賠白伯拍泊迫剝舶博薄麦漠縛爆箱箸畑肌八鉢発髪伐抜罰閥反半氾犯帆汎伴判坂阪板版班畔般販斑飯搬煩頒範繁藩晩番蛮盤比皮妃否批彼披肥非卑飛疲秘被悲扉費碑罷避尾眉美備微鼻膝肘匹必泌筆姫百氷表俵票評漂標苗秒病描猫品浜貧賓頻敏瓶不夫父付布扶府怖阜附訃負赴浮婦符富普腐敷膚賦譜侮武部舞封風伏服副幅復福腹複覆払沸仏物粉紛雰噴墳憤奮分文聞丙平兵併並柄陛閉塀幣弊蔽餅米壁璧癖別蔑片辺返変偏遍編弁便勉歩保哺捕補舗母募墓慕暮簿方包芳邦奉宝抱放法泡胞俸倣峰砲崩訪報蜂豊飽褒縫亡乏忙坊妨忘防房肪某冒剖紡望傍帽棒貿貌暴膨謀頰北木朴牧睦僕墨撲没勃堀本奔翻凡盆麻摩磨魔毎妹枚昧埋幕膜枕又末抹万満慢漫未味魅岬密蜜脈妙民眠矛務無夢霧娘名命明迷冥盟銘鳴滅免面綿麺茂模毛妄盲耗猛網目黙門紋問匁冶夜野弥厄役約訳薬躍闇由油喩愉諭輸癒唯友有勇幽悠郵湧猶裕遊雄誘憂融優与"
    "予余誉預幼用羊妖洋要容庸揚揺葉陽溶腰様瘍踊窯養擁謡曜抑沃浴欲翌翼拉裸羅来雷頼絡落酪辣乱卵覧濫藍欄吏利里理痢裏履璃離陸立律慄略柳流留竜粒隆硫侶旅虜慮了両良料涼猟陵量僚領寮療瞭糧力緑林厘倫輪隣臨瑠涙累塁類令礼冷励戻例鈴零霊隷齢麗暦歴列劣烈裂恋連廉練錬呂炉賂路露老労弄郎朗浪廊楼漏籠六録麓論和話賄脇惑枠湾腕"
)
texts_num = len(texts)

fonts = [
    'Yu Gothic',
    'Yu Mincho',
    'MS Gothic',
    'MS Mincho',
    'Kosugi',
    'Kosugi Maru',
    'M PLUS 1p',
    'M PLUS Rounded 1c',
    'Sawarabi Gothic',
    'Sawarabi Mincho',
]
fonts_num = len(fonts)

# Use line buffering, because output from cat and this script's own print calls can otherwise appear out of order.
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', buffering=1)
sys.stderr = os.fdopen(sys.stderr.fileno(), 'w', buffering=1)
sys.stdin = os.fdopen(sys.stdin.fileno(), 'r', buffering=1)

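# At least the mode and the file count are required; checking for fewer than
# three arguments avoids an IndexError on sys.argv[2] below.
if len(sys.argv) < 3:
    print('error. too few arguments. please exec like below.')
    print('  python3 ./f001_html_generator.py train (generate file kosuu)')
    print('  python3 ./f001_html_generator.py train 10')
    print('  python3 ./f001_html_generator.py test (generate file kosuu) (normal font #) (normal font ratio)')
    for i in range(len(fonts)):
        print('  python3 ./f001_html_generator.py test 1 {} 0.9'.format(i))
    sys.exit(1)
exec_mode = sys.argv[1]
print('exec_mode is {}'.format(exec_mode))
# "kosuu" means "count": the number of HTML file pairs to generate.
generate_file_kosuu = int(sys.argv[2])
print('generate file kosuu is {}'.format(generate_file_kosuu))
if exec_mode == "train":
    if len(sys.argv) != 3:
        print('error. wrong number of arguments.')
        sys.exit(1)
elif exec_mode == "test":
    if len(sys.argv) != 5:
        print('error. wrong number of arguments.')
        sys.exit(1)
    normal_font_index = int(sys.argv[3])
    normal_font_ratio = float(sys.argv[4])
    # A ratio of 1.0 would divide by zero when the font is selected later.
    if not 0 <= normal_font_index < fonts_num or not 0 <= normal_font_ratio < 1:
        print('error. font # must be 0..{} and ratio must be in [0, 1).'.format(fonts_num - 1))
        sys.exit(1)
else:
    print('error. unknown exec_mode.')
    sys.exit(1)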

for kosuu in range(generate_file_kosuu):
    print()
    if exec_mode=="train" :
        output_filename_font = 'gomi_output_for_{}/gomi_big_output_{:010}_font.html'.format(exec_mode, kosuu)
        output_filename_label = 'gomi_output_for_{}/gomi_big_output_{:010}_label.html'.format(exec_mode, kosuu)
        random.seed(777+kosuu)
    else : #"test"
        output_filename_font = 'gomi_output_for_{}/gomi_big_output_normal{}_{:010}_font.html'.format(exec_mode, normal_font_index, kosuu)
        output_filename_label = 'gomi_output_for_{}/gomi_big_output_normal{}_{:010}_label.html'.format(exec_mode, normal_font_index, kosuu)
        random.seed(888 + normal_font_index + kosuu)
    print(output_filename_font, output_filename_label)

    output_text1=""
    output_text2=""
    for i in range(30*150):
        text_index = random.randrange(texts_num)
        if exec_mode=="train" :
            font_index = random.randrange(fonts_num)
        else : #"test"
            temp = int(round(10*normal_font_ratio/(1-normal_font_ratio)))
            font_index = random.randrange(fonts_num+temp)
            if len(fonts) <= font_index :
                font_index = normal_font_index
        text = texts[text_index]
        font = fonts[font_index]
        output_text1 += '<span style="font-family: \'{}\';">{}</span>'.format(font,text)
        color_text = '#{:02x}{:02x}{:02x}'.format(font_index+1,font_index+1,font_index+1)
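        # Even when the text color equals the background color, the glyph was for some reason still drawn faintly in another color, so a full-width space is used instead.
        # This does not happen when the HTML is displayed normally and captured with PrintScreen, but it does with Chrome's full-page capture.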
        output_text2 += '<span style="font-family: \'{}\'; background-color: {}; color: {}">{}</span>'.format(font,color_text,color_text,'　')
        if i%30==29:
            output_text1 += '<br>\n'
            output_text2 += '<br>\n'
    print('text generate done.')

    #import code; vvv = globals(); vvv.update(locals()); code.InteractiveConsole(vvv).interact()

    # Write as UTF-8 explicitly so the output does not depend on the platform's default encoding.
    with open(output_filename_font, "w", encoding="utf-8") as f:
        temp1 = subprocess.check_output(['cat','sample_html_1_head.html'])
        temp2 = subprocess.check_output(['cat','sample_html_1_tail.html'])
        f.write(temp1.decode('utf-8')+output_text1+temp2.decode('utf-8'))

    with open(output_filename_label, "w", encoding="utf-8") as f:
        temp1 = subprocess.check_output(['cat','sample_html_2_head.html'])
        temp2 = subprocess.check_output(['cat','sample_html_2_tail.html'])
        f.write(temp1.decode('utf-8')+output_text2+temp2.decode('utf-8'))
    print('file generate done.')

sample_html_1_head.html

<html>
<head>
<style type="text/css">
<!--
@charset "utf-8";
* {
    margin:0; padding:0;
    -webkit-margin-before: 0px;
    -webkit-margin-after: 0px;
    -webkit-margin-start: 0px;
    -webkit-margin-end: 0px;
    font-size : 40px;
    line-height: 40px;
}
-->
</style>
<link href="https://fonts.googleapis.com/css?family=Kosugi" rel="stylesheet">
<link href="https://fonts.googleapis.com/css?family=Kosugi+Maru" rel="stylesheet">
<link href="https://fonts.googleapis.com/css?family=M+PLUS+1p" rel="stylesheet">
<link href="https://fonts.googleapis.com/css?family=M+PLUS+Rounded+1c" rel="stylesheet">
<link href="https://fonts.googleapis.com/css?family=Sawarabi+Gothic" rel="stylesheet">
<link href="https://fonts.googleapis.com/css?family=Sawarabi+Mincho" rel="stylesheet">
</head>
<body>

sample_html_1_tail.html

</body>
</html>

sample_html_2_head.html

<html>
<head>
<style type="text/css">
<!--
@charset "utf-8";
* {
    margin:0; padding:0;
    -webkit-margin-before: 0px;
    -webkit-margin-after: 0px;
    -webkit-margin-start: 0px;
    -webkit-margin-end: 0px;
    font-size : 40px;
    line-height: 40px;
    background-color: black;
}
-->
</style>
<link href="https://fonts.googleapis.com/css?family=Kosugi" rel="stylesheet">
<link href="https://fonts.googleapis.com/css?family=Kosugi+Maru" rel="stylesheet">
<link href="https://fonts.googleapis.com/css?family=M+PLUS+1p" rel="stylesheet">
<link href="https://fonts.googleapis.com/css?family=M+PLUS+Rounded+1c" rel="stylesheet">
<link href="https://fonts.googleapis.com/css?family=Sawarabi+Gothic" rel="stylesheet">
<link href="https://fonts.googleapis.com/css?family=Sawarabi+Mincho" rel="stylesheet">
</head>
<body>

sample_html_2_tail.html

</body>
</html>

Run it as follows.
For training, the font of each character is chosen completely at random.

python f001_html_generator.py train 10
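
As a minimal sketch of what the generator emits per character, the snippet below builds the paired spans: a visible glyph for the input page, and a full-width space whose background encodes the font index as the gray level `font_index + 1` (0 stays reserved for the black page background). The helper name `make_span_pair` is hypothetical and is not part of the script above.

```python
def make_span_pair(char, font, font_index):
    """Return (input_span, label_span) for one character.

    Hypothetical helper for illustration; it mirrors the string formatting
    used in f001_html_generator.py.
    """
    # Visible glyph rendered in the chosen font.
    input_span = '<span style="font-family: \'{}\';">{}</span>'.format(font, char)
    # Gray level font_index + 1, so label 0 stays reserved for the background.
    level = font_index + 1
    color = '#{:02x}{:02x}{:02x}'.format(level, level, level)
    # A full-width space (U+3000) in the same font keeps the cell geometry aligned.
    label_span = (
        '<span style="font-family: \'{}\'; background-color: {}; color: {}">{}</span>'
        .format(font, color, color, '\u3000')
    )
    return input_span, label_span


input_span, label_span = make_span_pair('告', 'MS Mincho', 3)
print(input_span)
print(label_span)
```

Because both pages use the same font for each cell, the label rectangle should cover the same area as the glyph it describes once both pages are captured at the same size.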

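Open the generated HTML in Chrome.
The page does not fit on one screen, so PrintScreen cannot capture it. Instead, open the developer tools with Ctrl+Shift+J, click Toggle Device Toolbar at the lower left so that the viewport size can be changed, and set the size to 1600x8000.
Then choose Capture full size screenshot from More Options (the three vertical dots at the upper right) to take the capture.
Although the font size is specified as 40px, each character comes out at roughly 80x80 for some reason. I did not worry about it.
Do this for both the input pages and the ground-truth label pages.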

For training, random 480x360 crops are then cut out of the captures using the script below.

f101_sample_kirinuki.py

#!/usr/bin/env python3
# -*- coding: utf-8 -*-


# Cut a large number of random crops out of each big image.
# Turn the results into file lists with the following commands.
#  find ../f001_sample_generator/gomi_output_for_train/ -name 'gomi_small_output_*.png' | sort | awk 'NR%3==1{temp=$0} NR%3==2{print temp, $0}' | sort -R > filelist_all.txt
#  split -l9000 filelist_all.txt; mv xaa filelist_train.txt; mv xab filelist_val.txt

kirinuki_size_w = 480
kirinuki_size_h = 360
generate_num_per_1big_pic = 1000

import glob
import random
import sys

import numpy as np
from PIL import Image


if len(sys.argv)!=2:
    print('error. wrong number of arguments. please exec like below.')
    print('  python3 ./f101_sample_kirinuki.py train')
    print('  python3 ./f101_sample_kirinuki.py test')
    exit()
exec_mode = sys.argv[1]
print('exec_mode is {}'.format(exec_mode))


random_seed = 1
random.seed(random_seed)


filelist = glob.glob("gomi_output_for_{}/gomi_big_*_font*.png".format(exec_mode))

for file1 in filelist:
    file2 = file1.replace('_font', '_label')
    if file1 == file2 :
        print('illegal error.')
        exit()
    print('target is', file1, file2)
    im1 = np.array(Image.open(file1).convert('L'))
    im2 = np.array(Image.open(file2).convert('L'))

    koko = 99999
    for i in range(im1.shape[1]-1,0-1,-1):
        temp = np.unique(im1[:,i]).shape[0]
        if temp != 1 :
            koko = i
            break
    im1 = im1[:,:koko+1]
    im2 = im2[:,:koko+1]
    koko = 99999
    for i in range(im1.shape[0]-1,0-1,-1):
        temp = np.unique(im1[i,:]).shape[0]
        if temp != 1 :
            koko = i
            break
    im1 = im1[:koko+1,:]
    im2 = im2[:koko+1,:]

    im1 = Image.fromarray(np.uint8(im1))
    im2 = Image.fromarray(np.uint8(im2))
    w, h = im1.size

    for i in range(generate_num_per_1big_pic):
        w_koko = random.randrange(w-kirinuki_size_w)
        h_koko = random.randrange(h-kirinuki_size_h)
        temp1 = im1.crop((w_koko,h_koko,w_koko+kirinuki_size_w,h_koko+kirinuki_size_h))
        temp2 = im2.crop((w_koko,h_koko,w_koko+kirinuki_size_w,h_koko+kirinuki_size_h))

        #print(np.unique(np.array(temp2)))
        #print(np.unique(np.array(temp2)).shape)
        if np.unique(np.array(temp2)).shape[0]==1 : # Only one label value is present (nothing but the base font), so skip this crop.
            print("tansyoku")
            continue

        temp1.save("{}_{:010}_1.png".format(file1.replace('.png','').replace('_big','_small'), i))
        temp2.save("{}_{:010}_2.png".format(file1.replace('.png','').replace('_big','_small'), i))

        # Recolor the label image so that it is easier to see.
        temp3 = np.array(temp2) # Convert to a NumPy array.
        #print('temp3 uniq',np.unique(temp3))
        Black = [30,30,30]; Sky = [128,128,128]; Building = [128,0,0]; Pole = [192,192,128]; Road_marking = [255,69,0]; Road = [128,64,128]; Pavement = [60,40,222]; Tree = [128,128,0]; SignSymbol = [192,128,128]; Fence = [64,64,128]; Car = [64,0,128];
        label_colours = np.array([Black, Sky, Building, Pole, Road_marking, Road, Pavement, Tree, SignSymbol, Fence, Car])
        r = temp3.copy()
        g = temp3.copy()
        b = temp3.copy()
        for l in range(0,len(label_colours)):
            r[temp3==l] = label_colours[l,0]
            g[temp3==l] = label_colours[l,1]
            b[temp3==l] = label_colours[l,2]
        rgb = np.zeros((temp3.shape[0], temp3.shape[1], 3))
        rgb[:,:,0] = r/1.0
        rgb[:,:,1] = g/1.0
        rgb[:,:,2] = b/1.0
        #print(np.unique(rgb))
        temp3 = Image.fromarray(np.uint8(rgb))
        temp3.save("{}_{:010}_2_color.png".format(file1.replace('.png','').replace('_big','_small'), i))

Run it as follows.

python f101_sample_kirinuki.py train
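
The key point of the cropping step is that the input and the label are cut with the same window, so they stay pixel-aligned. A minimal sketch of that idea on NumPy arrays follows; `paired_random_crop` is a hypothetical name, the arrays are synthetic stand-ins for the two screenshots, and unlike the script it allows a window flush with the right or bottom edge.

```python
import random

import numpy as np


def paired_random_crop(image, label, crop_w, crop_h, rng):
    """Cut the same randomly placed window out of an image and its label map."""
    h, w = image.shape[:2]
    # +1 so that a window flush with the right/bottom edge is also possible.
    x = rng.randrange(w - crop_w + 1)
    y = rng.randrange(h - crop_h + 1)
    return (image[y:y + crop_h, x:x + crop_w],
            label[y:y + crop_h, x:x + crop_w])


rng = random.Random(1)
# Synthetic stand-ins for the grayscale screenshot and its label screenshot.
image = np.arange(100 * 120, dtype=np.uint32).reshape(100, 120)
label = image.copy()
crop_img, crop_lab = paired_random_crop(image, label, crop_w=48, crop_h=36, rng=rng)
print(crop_img.shape, crop_lab.shape)
```

Since the synthetic image and label are identical here, the two crops being equal confirms that both were taken from the same position.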

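This generates 10,000 training samples of the following kind.
The ground-truth label images use colors such as RGB=0x010101 and are hard to see, so they have been recolored arbitrarily.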
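
The recoloring can also be written as a single palette lookup. The sketch below is an alternative formulation, not the script's own loop: the palette is truncated to the first four colors from the script, `colorize` is a hypothetical name, and out-of-range label values are clipped to the last palette entry (the script instead leaves them as gray).

```python
import numpy as np

# Row i is the RGB color shown for label value i (first four colors only).
PALETTE = np.array([
    [30, 30, 30],     # 0: background
    [128, 128, 128],  # 1
    [128, 0, 0],      # 2
    [192, 192, 128],  # 3
], dtype=np.uint8)


def colorize(label_map, palette):
    """Map an (H, W) array of label values to an (H, W, 3) RGB array."""
    # Clip so that unexpected values reuse the last entry instead of raising.
    safe = np.clip(label_map, 0, len(palette) - 1)
    # Indexing with an integer array looks up one palette row per pixel.
    return palette[safe]


label_map = np.array([[0, 1], [2, 3]], dtype=np.uint8)
rgb = colorize(label_map, PALETTE)
print(rgb.shape)
```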

Prepare filelist_train.txt and filelist_val.txt in the following format.

(input image filename) (label image filename)
gomi_output_for_train/gomi_small_output_0000000001_font.html_0000000549_1.png gomi_output_for_train/gomi_small_output_0000000001_font.html_0000000549_2.png
gomi_output_for_train/gomi_small_output_0000000002_font.html_0000000588_1.png gomi_output_for_train/gomi_small_output_0000000002_font.html_0000000588_2.png
gomi_output_for_train/gomi_small_output_0000000003_font.html_0000000580_1.png gomi_output_for_train/gomi_small_output_0000000003_font.html_0000000580_2.png
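
The find/awk/split commands in the comment at the top of f101_sample_kirinuki.py build these lists in the shell. A rough Python equivalent is sketched below under the assumption that every sample consists of `*_1.png`, `*_2.png`, and `*_2_color.png`; `build_file_lists` is a hypothetical helper and the paths are made up for the demonstration.

```python
import random


def build_file_lists(paths, n_train, seed=0):
    """Pair *_1.png inputs with *_2.png labels, shuffle, and split."""
    inputs = sorted(p for p in paths if p.endswith('_1.png'))
    available = set(paths)
    pairs = []
    for inp in inputs:
        lab = inp[:-len('_1.png')] + '_2.png'
        if lab in available:  # skip inputs with no matching label image
            pairs.append('{} {}'.format(inp, lab))
    random.Random(seed).shuffle(pairs)
    return pairs[:n_train], pairs[n_train:]


# Made-up file names following the naming pattern of the cropping script.
paths = []
for i in range(5):
    stem = 'gomi_small_output_{:010}_font.html_{:010}'.format(0, i)
    paths += [stem + '_1.png', stem + '_2.png', stem + '_2_color.png']
train, val = build_file_lists(paths, n_train=4)
print(len(train), len(val))
```

Writing `train` and `val` out one entry per line would give files in the format shown above.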

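Prepare the test images in the same way.
For testing, most characters use the base font, and with low probability a different font is chosen at random.
Specify the index of the base font and the proportion of characters set in it.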
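
The proportion is only approximate. In test mode the generator draws from the regular font slots plus a number of extra slots that all map to the base font, so the base font can also be picked through its own regular slot. The sketch below reproduces that arithmetic; `base_font_probability` is a hypothetical helper, not part of the script.

```python
def base_font_probability(num_fonts, normal_font_ratio):
    """Probability that one character is drawn in the base font."""
    # Extra slots that all map to the base font, as in the generator
    # (where the constant 10 equals the number of fonts).
    extra = int(round(num_fonts * normal_font_ratio / (1 - normal_font_ratio)))
    # The base font is chosen via an extra slot or via its own regular slot.
    return (extra + 1) / (num_fonts + extra)


print(base_font_probability(10, 0.9))
```

With 10 fonts and a ratio of 0.9 there are 90 extra slots, so the base font is used with probability 91/100 = 0.91. A ratio of exactly 1.0 divides by zero in this formula.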

python3 ./f001_html_generator.py test 1 0 0.9
python3 ./f001_html_generator.py test 1 1 0.9
python3 ./f001_html_generator.py test 1 2 0.9
python3 ./f001_html_generator.py test 1 3 0.9
python3 ./f001_html_generator.py test 1 4 0.9
python3 ./f001_html_generator.py test 1 5 0.9
python3 ./f001_html_generator.py test 1 6 0.9
python3 ./f001_html_generator.py test 1 7 0.9
python3 ./f001_html_generator.py test 1 8 0.9
python3 ./f001_html_generator.py test 1 9 0.9
python f101_sample_kirinuki.py test

This produces the HTML files. Process them in the same way as the training data, and you are done once you have images like the following.

Training

Train with the following command.

python main.py --log_dir=log_train --train_image_list=filelist_train.txt --val_image_list=filelist_val.txt

With a batch size of 5 and 20,000 iterations, this took about 7 hours on a GTX 1060.
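
To put those numbers in perspective, here is a small back-of-the-envelope calculation. The training set size of 9,000 is an assumption taken from the `split -l9000` command in the comment of f101_sample_kirinuki.py.

```python
batch_size = 5
iterations = 20000
hours = 7
train_samples = 9000  # assumption: the 9000-line split of filelist_all.txt

samples_seen = batch_size * iterations
epochs = samples_seen / train_samples
seconds_per_iteration = hours * 3600 / iterations

print(samples_seen)
print(round(epochs, 1))
print(round(seconds_per_iteration, 2))
```

Under that assumption the run covers roughly 11 passes over the training data at a little over one second per iteration.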

Inference

The loss appeared to keep decreasing, so I ran inference with the final parameter file.
Run it as follows.

python main.py --log_dir=log_test --ckpt_for_test=model.ckpt --test_image_list=filelist_test.txt --batch_size=5 --save_image=True

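Results

The results are below.
The input is on the left, the inference result in the middle, and the ground-truth label on the right. The middle and right images originally use colors such as RGB=0x010101 and are hard to see, so they have been recolored arbitrarily.

Characters at the edges are cut off and incomplete, so setting those aside, the rest are inferred correctly.
Depending on the font, the padding(?) seems to be larger or smaller, and some fonts leave gaps between lines.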
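
For the original question of spotting the few characters set in a different font, one possible post-processing step (not something done in this article) is to treat the most common non-background label in a prediction as the base font and flag everything else. This is a minimal sketch on a synthetic label map; `flag_non_base` is a hypothetical helper.

```python
import numpy as np


def flag_non_base(pred, background=0):
    """Return (base_label, mask) where mask marks pixels of other fonts."""
    labels, counts = np.unique(pred[pred != background], return_counts=True)
    if labels.size == 0:
        raise ValueError('prediction contains only background')
    # The most frequent non-background label is taken to be the base font.
    base_label = int(labels[np.argmax(counts)])
    mask = (pred != background) & (pred != base_label)
    return base_label, mask


pred = np.full((4, 6), 2, dtype=np.uint8)  # base font has label 2
pred[0, :] = 0                              # a background row
pred[2:4, 4:6] = 7                          # one character in another font
base_label, mask = flag_non_base(pred)
print(base_label, int(mask.sum()))
```

This assumes one font clearly dominates the image, which holds for the test pages generated with a high base font ratio but not for the fully random training pages.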

It turned out well, so I am satisfied.
