bash > Extract English words with look, create upright and upside-down images, and output their Blue component as csv > v0.1, v0.2

Last updated at 2017-06-17, posted at 2017-03-03

Environment

GeForce GTX 1070 (8GB)
ASRock Z170M Pro4S [Intel Z170chipset]
Ubuntu 14.04 LTS desktop amd64
TensorFlow v0.11
cuDNN v5.1 for Linux
CUDA v8.0
Python 2.7.6
IPython 5.1.0 -- An enhanced Interactive Python.
gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
GNU bash, version 4.3.8(1)-release (x86_64-pc-linux-gnu)

v0.1

code

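(For debugging, the script exits early at the "exit for debug" comment.)
Bug: the script does not account for the fact that the number of items per line in the ppm file differs from image to image.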
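
A quick way to confirm this is to count the samples on each pixel line of the ppm file. The sketch below uses a hand-written plain PPM, `sample.ppm`, as a hypothetical stand-in for `tmp.ppm`, and assumes the same three-line header that `tail -n +4` skips in the script. The helper name `samples_per_line` is also only illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical 2x2 plain PPM (P3) whose samples are split unevenly across lines.
cat > sample.ppm <<'EOF'
P3
2 2
255
1 2 3 4 5
6 7 8 9
10 11 12
EOF

# Print how many samples each pixel line holds, comma-separated.
samples_per_line() {
	tail -n +4 "$1" | awk '{ print NF }' | paste -sd, -
}

samples_per_line sample.ppm   # → 5,4,3
```

Any count that is not a multiple of 3 means a pixel straddles a line break, so picking fixed columns per line cannot produce one value per pixel.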

prep_data_exec_170304

#!/usr/bin/env bash

# v0.1 Mar. 4, 2017
#   - add [OUTPUT_INFILE], [OUTPUT_OUTFILE]
#   - add toCsvText()
#   - flip vertically the image
#   - make upright image for the word
#   - loop for all the words in the [look] dictionary (99,154 words)
#

cnt=1
# IMG_SIZE=20x20
IMG_SIZE=30x30
# FONT_SIZE=12
FONT_SIZE=20
OUTPUT_INFILE="test_in.csv"
OUTPUT_OUTFILE="test_out.csv"

toCsvText() {
	convert -depth 8 -compress none $1 tmp.ppm
	tail -n +4 tmp.ppm | awk -F" " -v OFS=',' '{print $1,$4,$7,$10,$13,$16,""}' | tr -d '\n'
	echo # end of line
}

rm -f $OUTPUT_INFILE
rm -f $OUTPUT_OUTFILE

for x in {a..z}; do
	list=$(look $x)
	for aword in $list; do
		echo $aword # for progress display

		# image for the word
		# - upright
		convert -background lightblue -fill blue -size $IMG_SIZE -gravity center -pointsize $FONT_SIZE label:$aword label_uprt.gif
		# - upside down
		convert label_uprt.gif -flip label_updn.gif		

		# to csv file
		toCsvText label_uprt.gif >> $OUTPUT_INFILE
		toCsvText label_updn.gif >> $OUTPUT_OUTFILE

		# --- exit for debug ---
		((cnt++))
		if [ $cnt -ge 5 ]; then
			exit 0
		fi
	done
done

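Results

The following files are generated.

label_uprt.gif : upright image
label_updn.gif : upside-down image
test_in.csv : Blue component of the upright image
test_out.csv : Blue component of the upside-down image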

$ eog label_uprt.gif

$ eog label_updn.gif

TODO

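The number of commas in the csv output is wrong.
The number of pixels to train on in TensorFlow needs consideration.
Whether the values should be normalized by dividing by 255 needs consideration.
Using the data in random order will be handled on the TensorFlow side.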
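
If the division by 255 were done on the bash side rather than in TensorFlow, it could be a small post-processing filter over the csv rows. The following is only a sketch: `normalize_csv` is a hypothetical helper, the input is assumed to hold 8-bit samples (0 to 255), and the four decimal places are an arbitrary choice.

```shell
#!/usr/bin/env bash
# Divide every comma-separated value read from stdin by 255.
# Assumes 8-bit samples; output precision (4 decimals) is arbitrary.
normalize_csv() {
	awk -F',' -v OFS=',' '{
		for (i = 1; i <= NF; i++) $i = sprintf("%.4f", $i / 255)
		print
	}'
}

echo "0,51,255" | normalize_csv   # → 0.0000,0.2000,1.0000
```

For example, `normalize_csv < test_in.csv > test_in_norm.csv` would leave the original file untouched, so both variants stay available while the question is still open.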

v0.2 > Fix for the ppm read failure

(Added 2017/03/05)

Fixed the TODO item "The number of commas in the csv output is wrong."

prep_data_exec_170304

#!/usr/bin/env bash

#
# v0.2 Mar. 5, 2017
#   - add extract_blue()
# v0.1 Mar. 4, 2017
#   - add [OUTPUT_INFILE], [OUTPUT_OUTFILE]
#   - add toCsvText()
#   - flip vertically the image
#   - make upright image for the word
#   - loop for all the words in the [look] dictionary (99,154 words)
#

cnt=1

# IMG_SIZE=20x20
IMG_SIZE=30x30
# FONT_SIZE=12
FONT_SIZE=20
OUTPUT_INFILE="test_in.csv"
OUTPUT_OUTFILE="test_out.csv"

extract_blue() {
	# input: ( (R,G,B), (R,G,B), ... )
	# local variables, so the global [cnt] used by the main loop is never touched
	local idx=1 val
	for val in "$@"; do
		if (( idx % 3 == 0 )); then  # (1:R, 2:G, 0:B)
			echo "$val"
		fi
		((idx++))
	done
}

toCsvText() {
	convert -depth 8 -compress none $1 tmp.ppm
	res=$(extract_blue $(tail -n +4 tmp.ppm) | tr '\n' ' ' | sed 's/ /,/g')
	# remove comma at the end of the line
	echo $res | sed 's/,$//g'
}

rm -f $OUTPUT_INFILE
rm -f $OUTPUT_OUTFILE

for x in {a..z}; do
	list=$(look $x)
	for aword in $list; do
		echo $aword # for progress display

		# image for the word
		# - upright
		convert -background lightblue -fill blue -size $IMG_SIZE -gravity center -pointsize $FONT_SIZE label:$aword label_uprt.gif
		# - upside down
		convert label_uprt.gif -flip label_updn.gif		

		# to csv file
		toCsvText label_uprt.gif >> $OUTPUT_INFILE
		toCsvText label_updn.gif >> $OUTPUT_OUTFILE

		# --- exit for debug ---
		((cnt++))
		if [ $cnt -ge 5 ]; then
			exit 0
		fi
	done
done

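The processing now looks right, but it is slow.
Processing 1,000 words takes 1 minute 30 seconds.
At that rate, processing all 99,154 words would take about 149 minutes (roughly 2.5 hours).

About half of the processing time appears to be spent in convert.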
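
If convert accounts for about half of the time, the rest is presumably spent in the per-sample bash loop of extract_blue (an assumption, not measured here). A minimal sketch of moving that loop into a single awk pass follows. `extract_blue_fast` and the hand-written `sample.ppm` are hypothetical names, and the three-line ppm header is assumed as in the script above.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for tmp.ppm: plain PPM (P3), 2x1 pixels,
# with a pixel deliberately split across two lines.
cat > sample.ppm <<'EOF'
P3
2 1
255
173 216 230 0
0 255
EOF

# Walk every whitespace-separated sample in one awk process and
# join each third one (the B of R,G,B) with commas, no trailing comma.
extract_blue_fast() {
	tail -n +4 "$1" | awk '
		{
			for (i = 1; i <= NF; i++) {
				if (++n % 3 == 0) { out = out sep $i; sep = "," }
			}
		}
		END { print out }'
}

extract_blue_fast sample.ppm   # → 230,255
```

In the script, toCsvText could then call `extract_blue_fast tmp.ppm` directly instead of the tr and sed pipeline. Whether this actually shortens the run has not been measured.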
