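Overview
This continues part 1 of playing with TensorFlow on GCP.
Here I change the structure of the CNN and see how the score changes.
The task states that results are evaluated with the following IoU.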
IoU = \frac{TP}{FN+TP+FP}
To score a model, I ran TensorFlow once in prediction mode and then matched its log against the test-data portion of master.tsv,
using the evaluation script below.
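#!/usr/bin/env python3
## usage
##   grep train_ results.xxxx > results.xxxx.train
##   Judge.py test_master.tsv results.xxxx.train
import sys

if __name__ == '__main__':
    if len(sys.argv) > 2:
        # Ground truth: drop the header row, then map file name -> label.
        with open(sys.argv[1]) as f:
            filenames0 = f.readlines()
        header = filenames0.pop(0)
        filenamedic = {}
        for l in filenames0:
            ll = l.split()
            filenamedic[ll[0]] = ll[1]
        # Predictions: each line is "file name, predicted label".
        with open(sys.argv[2]) as f:
            filenames2 = f.readlines()
        X_0 = 0   # files whose true label is '0'
        X_1 = 1   # files whose true label is '1' (note: starts at 1, not 0)
        T_0 = 0   # files predicted as '0'
        T_1 = 1   # files predicted as '1', right or wrong (note: starts at 1)
        F_10 = 0  # truth '1' predicted as '0' (false negatives)
        F_01 = 0  # truth '0' predicted as '1' (false positives)
        for l in filenames2:
            ll = l.split()
            X = filenamedic[ll[0]]
            if X == '0':
                X_0 += 1
            else:
                X_1 += 1
            if ll[1] == '0':
                T_0 += 1
            else:
                T_1 += 1
            if X != ll[1]:
                # Report every misclassified file together with its true label.
                print(ll[0], ll[1], 'truth', X)
                if X == '0':
                    F_01 += 1
                else:
                    F_10 += 1
            filenamedic.pop(ll[0])
        # The last value is T_1 / (T_1 + F_01 + F_10). T_1 includes false
        # positives, so it is not the same quantity as TP / (TP + FP + FN).
        print(X_0, X_1, T_0, T_1, F_01, F_10, T_1 / (T_1+F_01+F_10))
    else:
        print('usage: Judge.py test_master.tsv results.xxxx.train')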
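One thing to keep in mind when reading the script's output: the last value it prints is `T_1 / (T_1 + F_01 + F_10)`, and `T_1` counts every file predicted as '1', false positives included. The sketch below shows how the same counters could be turned into TP / (TP + FP + FN) instead. It assumes that `F_01` corresponds to FP and `F_10` to FN, and that the printed `T_1` still carries the script's initial value of 1. The function name `strict_iou` and the sample numbers are hypothetical.

```python
def strict_iou(t_1, f_01, f_10, t_1_initial=1):
    """IoU as TP / (TP + FP + FN), derived from the script's counters.

    t_1         : printed count of files predicted as '1'
    f_01        : truth '0' predicted as '1' (false positives)
    f_10        : truth '1' predicted as '0' (false negatives)
    t_1_initial : value T_1 started from in the script (1 as written)
    """
    predicted_positive = t_1 - t_1_initial
    tp = predicted_positive - f_01      # correct predictions of '1'
    denom = tp + f_01 + f_10
    # Return 0.0 when there is nothing to score, to avoid dividing by zero.
    return tp / denom if denom else 0.0


# Hypothetical counter values, for illustration only.
print(strict_iou(t_1=101, f_01=20, f_10=30))  # 80 / 130
```

Because the printed ratio keeps the false positives in the numerator, it is never lower than the value computed this way.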
Results
Truth | Total | Correct | Wrong |
---|---|---|---|
'0' | 107125 | 105653 | 1472 |
'1' | 4672 | 2497 | 2175 |
'0' misclassified as '1': 1472/107125
'1' misclassified as '0': 2175/4672
In this case the positive class is '1', so
true positive (TP): 2497
false negative (FN): 2175
false positive (FP): 1472
IoU = TP / (TP + FN + FP) = 2497/(2497+2175+1472) = 0.406
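The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function name `iou` is chosen here for illustration and is not part of the original scripts.

```python
def iou(tp, fp, fn):
    """Intersection over Union for the positive class: TP / (TP + FP + FN)."""
    denom = tp + fp + fn
    if denom == 0:
        raise ValueError("IoU is undefined when TP + FP + FN == 0")
    return tp / denom


# Counts from the result above, with '1' as the positive class.
score = iou(tp=2497, fp=1472, fn=2175)
print(round(score, 3))  # 0.406
```

True negatives do not appear in the formula, so the large number of correctly classified '0' files has no effect on the score.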
What happens when the computation is repeated
To run the steps above in sequence, I wrote the following script.
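#!/bin/sh
# Date stamp (MMDD) used in the result file names.
XXXX=`date "+%m%d"`
# Find the first index i for which results.MMDD.b<i>.1 does not exist yet.
i=1
RES=results.${XXXX}.b${i}
while [ -f ${RES}.1 ];
do
    i=`expr ${i} + 1`
    RES=results.${XXXX}.b${i}
done
# Train, then run prediction on the test data.
python3 RdTrn41.py train_master_X1.tsv > ${RES}.1 2>&1
python3 RdTrn41.py /data/test > ${RES}.2 2>&1
# Keep only the per-file prediction lines and score them against the answers.
grep train_ ${RES}.2 > ${RES}.3
python3 Judge.py test_master.tsv ${RES}.3 | tail -n 2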
It trains, runs prediction, greps the prediction lines out of the log, and compares them with the answer file to produce the score.
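The file-naming loop in the script (find the first unused `results.MMDD.bN` prefix) can also be written in Python. This is only a sketch of the same idea; `next_result_prefix` and its parameters are names made up for this example.

```python
import os
from datetime import date


def next_result_prefix(directory=".", day=None):
    """Return the first 'results.MMDD.bN' prefix whose '.1' file is missing.

    directory : where the result files are written (assumed: current dir)
    day       : date used for the MMDD stamp (defaults to today)
    """
    stamp = (day or date.today()).strftime("%m%d")
    i = 1
    # Same test as `[ -f ${RES}.1 ]` in the shell script.
    while os.path.isfile(os.path.join(directory, f"results.{stamp}.b{i}.1")):
        i += 1
    return f"results.{stamp}.b{i}"


print(next_result_prefix())
```

Each run therefore writes to a fresh `b1`, `b2`, ... prefix for the day instead of overwriting an earlier log.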
(TF) > sh ex.sh&& sh ex.sh&& sh ex.sh&& sh ex.sh
train_292275.tif 0 truth 1
107123 4674 109011 2786 585 2473 0.47672826830937715
train_245573.tif 1 truth 0
107123 4674 106275 5522 2252 1404 0.601656134234038
train_292275.tif 0 truth 1
107123 4674 109164 2633 266 2307 0.5057625816365732
train_205150.tif 0 truth 1
107123 4674 109842 1955 59 2778 0.40797161936560933
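(For some reason the first result differs from the one above.) The second run gives the best score, and the results get worse after that. Possibly overfitting? (CNN2)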
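To compare runs without reading the numbers by eye, the summary lines can be parsed and ranked. The sketch below assumes the seven-column format printed by the evaluation script (X_0, X_1, T_0, T_1, F_01, F_10, score); `parse_summary` is a hypothetical helper name.

```python
def parse_summary(line):
    """Split one summary line from the evaluation script into named fields."""
    x_0, x_1, t_0, t_1, f_01, f_10, score = line.split()
    return {
        "X_0": int(x_0), "X_1": int(x_1),
        "T_0": int(t_0), "T_1": int(t_1),
        "F_01": int(f_01), "F_10": int(f_10),
        "score": float(score),
    }


# Summary lines of the four consecutive runs shown above.
runs = [
    "107123 4674 109011 2786 585 2473 0.47672826830937715",
    "107123 4674 106275 5522 2252 1404 0.601656134234038",
    "107123 4674 109164 2633 266 2307 0.5057625816365732",
    "107123 4674 109842 1955 59 2778 0.40797161936560933",
]
parsed = [parse_summary(r) for r in runs]

# Index of the run with the highest printed score (0-based).
best = max(range(len(parsed)), key=lambda i: parsed[i]["score"])
print("best run:", best + 1)
```

For these four lines it selects the second run, matching the observation above.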
Changing the CNN structure
CNN3
The first conv matrix is changed to 3x3x7x16.
(TF) > sh ex.sh && sh ex.sh && sh ex.sh && sh ex.sh
train_205150.tif 0 truth 1
107123 4674 107616 4181 1685 2178 0.5197662854301343
train_292275.tif 0 truth 1
107123 4674 109189 2608 229 2295 0.5081839438815277
train_292275.tif 0 truth 1
107123 4674 108494 3303 477 1848 0.5868869936034116
train_292275.tif 0 truth 1
107123 4674 109409 2388 122 2408 0.4855632370882473
CNN4
I built the following CNN.
## CNN4
## [32,32,7]pixel
## -> [3,3,7,16] kernel -> [32,32,16]pixel -> [16,16,16]pixel
## -> [3,3,16,32] kernel -> [16,16,32]pixel -> [8,8,32]pixel
## -> [3,3,32,32] kernel -> [8,8,32]pixel -> [4,4,32]pixel
## -> [128]FC
## -> 2
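The shape progression listed for CNN4 can be traced with plain Python. The sketch assumes each 3x3 convolution uses 'same' padding with stride 1 (height and width unchanged) and is followed by 2x2 pooling with stride 2 (height and width halved). Those settings are inferred from the listed shapes, not stated in the text, and `trace_shapes` is a hypothetical helper.

```python
def trace_shapes(input_shape, kernels):
    """Return the [H, W, C] shape after each conv + 2x2 pool stage.

    Assumes 'same' padding and stride 1 for the convolution, and a
    2x2 pooling step with stride 2 after it.
    """
    h, w, c = input_shape
    shapes = []
    for kh, kw, c_in, c_out in kernels:
        if c_in != c:
            raise ValueError(f"kernel expects {c_in} channels, got {c}")
        c = c_out               # convolution changes only the channel count
        h, w = h // 2, w // 2   # pooling halves height and width
        shapes.append([h, w, c])
    return shapes


cnn4_kernels = [(3, 3, 7, 16), (3, 3, 16, 32), (3, 3, 32, 32)]
shapes = trace_shapes([32, 32, 7], cnn4_kernels)
print(shapes)  # [[16, 16, 16], [8, 8, 32], [4, 4, 32]]

h, w, c = shapes[-1]
print(h * w * c)  # 512 values are flattened before the [128] FC layer
```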
Results
(TF) > sh ex.sh && sh ex.sh && sh ex.sh && sh ex.sh
train_205150.tif 0 truth 1
107123 4674 109083 2714 319 2279 0.5109186746987951
train_292275.tif 0 truth 1
107123 4674 108744 3053 418 2039 0.5540834845735028
train_292275.tif 0 truth 1
107123 4674 107781 4016 800 1458 0.6401020082881734
train_260276.tif 0 truth 1
107123 4674 108362 3435 895 2134 0.5314047029702971
(TF) > tail -n 3 results.0220.b*.1
==> results.0220.b1.1 <==
elapsed time 842.9950246810913
read 330.89023065567017 compute 464.7211787700653
read 323.0126678943634 6.838194370269775
==> results.0220.b2.1 <==
elapsed time 796.4773766994476
read 289.16141080856323 compute 459.1523313522339
read 281.32930421829224 6.8193583488464355
==> results.0220.b3.1 <==
elapsed time 798.7965080738068
read 288.9158399105072 compute 461.96511721611023
read 280.86145520210266 6.996886253356934
==> results.0220.b4.1 <==
elapsed time 809.6086728572845
read 288.3313133716583 compute 473.4954888820648
read 280.08322978019714 7.173512697219849
CNN5
A CNN that is one stage deeper.
## CNN5
## [32,32,7]pixel
## -> [3,3,7,16] kernel -> [32,32,16]pixel -> [16,16,16]pixel
## -> [3,3,16,32] kernel -> [16,16,32]pixel -> [8,8,32]pixel
## -> [3,3,32,32] kernel -> [8,8,32]pixel -> [4,4,32]pixel
## -> [3,3,32,32] kernel -> [4,4,32]pixel -> [2,2,32]pixel
## -> [48]FC
## -> 2
Results
(TF) > sh ex.sh && sh ex.sh && sh ex.sh && sh ex.sh
train_205150.tif 0 truth 1
107123 4674 110941 856 15 3833 0.18197278911564627
train_205150.tif 0 truth 1
107123 4674 109902 1895 330 3109 0.3552680914885639
train_292275.tif 0 truth 1
107123 4674 108979 2818 210 2066 0.5531998429524931
train_292275.tif 0 truth 1
107123 4674 107730 4067 1249 1856 0.5670663692136084
(TF) > sh ex.sh
train_292275.tif 0 truth 1
107123 4674 108629 3168 396 1902 0.579582875960483
(TF) > sh ex.sh&&sh ex.sh
train_292275.tif 0 truth 1
107123 4674 108421 3376 498 1796 0.5954144620811287
train_292275.tif 0 truth 1
107123 4674 108754 3043 330 1961 0.5704911886014248
(TF) > tail -n 3 results.0220.b*.1
==> results.0220.b1.1 <==
elapsed time 844.8499901294708
read 288.78743410110474 compute 502.61161279678345
read 280.8021218776703 6.919800519943237
==> results.0220.b2.1 <==
elapsed time 841.389762878418
read 291.50068950653076 compute 495.2584443092346
read 283.44011330604553 7.000304698944092
==> results.0220.b3.1 <==
elapsed time 835.3306441307068
read 291.6633448600769 compute 489.4258255958557
read 283.714097738266 6.93343448638916
==> results.0220.b4.1 <==
elapsed time 837.7277889251709
read 292.404869556427 compute 491.8565492630005
read 284.63338708877563 6.76350736618042
==> results.0220.b5.1 <==
elapsed time 877.9375958442688
read 326.73156785964966 compute 496.6949234008789
read 318.76070833206177 6.893429517745972
==> results.0221.b3.1 <==
elapsed time 855.1045415401459
read 331.36289262771606 compute 471.52830243110657
read 323.2426028251648 7.098251581192017
==> results.0221.b4.1 <==
elapsed time 846.691953420639
read 316.5083978176117 compute 478.75846767425537
read 308.60532331466675 6.889764308929443
CNN6
An experiment in which each convolution stage quadruples the number of channels in exchange for halving the image height and width.
## CNN6
## [32,32,7]pixel
## -> [3,3,7,32] kernel -> [32,32,32]pixel -> [16,16,32]pixel
## -> [3,3,32,128] kernel -> [16,16,128]pixel -> [8,8,128]pixel
## -> [3,3,128,512] kernel -> [8,8,512]pixel -> [4,4,512]pixel
## -> [512]FC
## -> 2
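With this design the number of values per stage stays constant: halving height and width divides the number of positions by 4, and quadrupling the channels multiplies it back. A small check using the post-pooling shapes listed for CNN6, with the CNN4 shapes for contrast:

```python
from math import prod

# Post-pooling shapes [H, W, C] as listed in the architecture comments.
cnn6_stages = [[16, 16, 32], [8, 8, 128], [4, 4, 512]]
cnn4_stages = [[16, 16, 16], [8, 8, 32], [4, 4, 32]]

cnn6_sizes = [prod(shape) for shape in cnn6_stages]
cnn4_sizes = [prod(shape) for shape in cnn4_stages]

print(cnn6_sizes)  # [8192, 8192, 8192]
print(cnn4_sizes)  # [4096, 2048, 512]
```

In CNN4 the amount of data shrinks at every stage, whereas CNN6 carries the same number of values all the way to the fully connected layer.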
(TF) > sh ex.sh && sh ex.sh && sh ex.sh && sh ex.sh && sh ex.sh && sh ex.sh
train_225348.tif 1 truth 0
107123 4674 108332 3465 823 2032 0.5482594936708861
train_225348.tif 1 truth 0
107123 4674 104440 7357 3651 968 0.6143119572478289
train_240013.tif 1 truth 0
107123 4674 107350 4447 1574 1801 0.5685246739964204
train_292275.tif 0 truth 1
107123 4674 109428 2369 63 2368 0.49354166666666666
CNN7
## CNN7
## [32,32,7]pixel
## -> [3,3,7,32] kernel -> [32,32,32]pixel -> [16,16,32]pixel
## -> [3,3,32,128] kernel -> [16,16,128]pixel -> [8,8,128]pixel
## -> [3,3,128,512] kernel -> [8,8,512]pixel -> [4,4,512]pixel
## -> [3,3,512,2048] kernel -> [4,4,2048]pixel -> [2,2,2048]pixel
## -> [512]FC
## -> 2
(TF) > sh ex.sh && sh ex.sh && sh ex.sh && sh ex.sh && sh ex.sh && sh ex.sh
train_205150.tif 0 truth 1
107123 4674 110203 1594 44 3124 0.33473330533389334
train_292275.tif 0 truth 1
107123 4674 108538 3259 391 1806 0.5973240469208211
train_260276.tif 0 truth 1
107123 4674 109672 2125 123 2672 0.43191056910569103
train_278555.tif 0 truth 1
107123 4674 106814 4983 1514 1205 0.6469748117372112
train_292275.tif 0 truth 1
107123 4674 108450 3347 458 1785 0.5987477638640429
train_234551.tif 1 truth 0
107123 4674 106721 5076 1754 1352 0.6203862136396969
The results are summarized below.