
Creating Training Data for Sorting a Face Image Dataset (#1)

Last updated at 2020-12-28. Posted at 2020-12-27.

Overview

This post builds the training data for a model that sorts the UTKFace dataset by image characteristics.
Comments and corrections are very welcome.

UTKFace

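UTKFace page
・The download consists of three compressed folders:
　① 900 MB
　② 500 MB
　③ 70 MB
　Each folder contains images; together they hold more than 20,000.
・Each image file name carries its labels in the form "age-gender-ethnicity-○○○.jpg".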
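
As a small illustration of the file-name labels described above, the sketch below pulls age, gender, and ethnicity out of a name. The article writes the fields with hyphens, while UTKFace files are usually documented with underscores, so this sketch accepts either separator; that choice, and the names `FaceLabel` and `parse_label`, are assumptions made for the example rather than anything from the article.

```python
import re
from typing import NamedTuple, Optional


class FaceLabel(NamedTuple):
    age: int
    gender: int
    ethnicity: int


# Three numeric fields separated by "-" or "_", then any suffix ending in .jpg.
_LABEL_PATTERN = re.compile(r"^(\d+)[-_](\d+)[-_](\d+)[-_].*\.jpg$", re.IGNORECASE)


def parse_label(filename: str) -> Optional[FaceLabel]:
    """Return the labels encoded in a file name, or None if it does not match."""
    name = filename.rsplit("/", 1)[-1]  # drop any folder prefix inside a ZIP
    match = _LABEL_PATTERN.match(name)
    if match is None:
        return None
    return FaceLabel(*(int(group) for group in match.groups()))
```

Names that lack one of the three fields return `None`, which makes it easy to list and inspect them separately instead of silently mislabeling them.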

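Sorting

Looking through the dataset, some images have been processed and some show more than one person.
Using the data in folder ② above, this post creates training data for telling apart the following four kinds of image:
・No problem
・Grayscale
・Multiple people in the frame
・Processed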
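
The article sorts the images by hand. As a possible aid for the grayscale category only, the sketch below flags images whose three color channels are (nearly) identical. This heuristic, the function name `looks_grayscale`, and the `tolerance` parameter are assumptions for illustration and are not part of the article's procedure; a result would still need checking by eye.

```python
import numpy as np
from PIL import Image


def looks_grayscale(image: Image.Image, tolerance: int = 0) -> bool:
    """Return True if the image has no color information beyond `tolerance`."""
    # Single-channel modes are grayscale by definition.
    if image.mode in ("1", "L", "LA"):
        return True
    # Use a signed type so channel differences cannot wrap around.
    rgb = np.asarray(image.convert("RGB"), dtype=np.int16)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    max_diff = max(np.abs(r - g).max(), np.abs(g - b).max())
    return bool(max_diff <= tolerance)
```

A small positive `tolerance` may be needed for grayscale photos saved as color JPEGs, since compression can introduce slight channel differences.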

The article linked below was a great help this time.
Predicting age from face images

Environment

Google Colaboratory (GPU)


import os, zipfile, io, re
from PIL import Image  # needed for Image.open
import numpy as np

X=[]
Y=[]
im_size = 299

# Load the ZIP of "no problem" images
z = zipfile.ZipFile('/content/drive/My Drive/image.zip')
imgfiles = [ x for x in z.namelist() if re.search(r"^image.*jpg$", x)]

for imgfile in imgfiles:
    image = Image.open(io.BytesIO(z.read(imgfile)))
    image = image.convert('RGB')
    image = image.resize((im_size, im_size))
    data = np.asarray(image)
    X.append(data)
    Y.append(0)

z.close()
del z, imgfiles

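Beforehand, the images in folder ② are split by hand into four folders.
Each folder is then loaded, with labels assigned as follows:

Normal images: Y.append(0)
Grayscale: Y.append(1)
Multiple people: Y.append(2)
Processed: Y.append(3)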


X = np.asarray(X)
Y = np.asarray(Y)

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

from keras.utils import to_categorical

# Convert the ground-truth labels to one-hot form (4 classes: labels 0-3)
y_train = to_categorical(y_train, num_classes=4)
y_test = to_categorical(y_test, num_classes=4)

del X,Y
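
To make the one-hot step concrete, here is a NumPy-only sketch of what the conversion produces for the four labels used in this post. The helper name `one_hot` is hypothetical. Without an explicit `num_classes`, `to_categorical` infers the width as the largest label plus one, so fixing it at 4 keeps `y_train` and `y_test` the same shape even if one split happens to lack class 3.

```python
import numpy as np


def one_hot(labels, num_classes=4):
    """Map integer labels (0..num_classes-1) to one-hot rows."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype=np.float32)[labels]


# Labels: 0 = no problem, 1 = grayscale, 2 = multiple people, 3 = processed
encoded = one_hot([0, 3, 1])
```

Here `encoded` has shape `(3, 4)`, one row per sample and one column per class.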

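I wanted to run the loading of the four hand-sorted ZIP files and the creation of the training data in a single for loop, but I could not work out how, so the actual code ended up quite long.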
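
One way the loop over the four ZIP files might look is sketched below: a dictionary maps each archive to its label, and a single function repeats the loading steps shown earlier. Only `image.zip` appears in the article; the other three archive names are placeholders, and the function name `load_labeled_zips` is hypothetical. The sketch also matches any entry ending in `.jpg` instead of the original `^image.*jpg$` pattern, which assumes the archives contain nothing but the wanted images.

```python
import io
import re
import zipfile

import numpy as np
from PIL import Image

IM_SIZE = 299  # same input size as in the article

# Archive path -> class label. Names other than image.zip are assumed.
ZIP_LABELS = {
    "/content/drive/My Drive/image.zip": 0,      # no problem
    "/content/drive/My Drive/gray.zip": 1,       # grayscale (assumed name)
    "/content/drive/My Drive/multi.zip": 2,      # multiple people (assumed name)
    "/content/drive/My Drive/processed.zip": 3,  # processed (assumed name)
}


def load_labeled_zips(zip_labels, im_size=IM_SIZE):
    """Read every .jpg in each archive and return (images, labels) arrays."""
    X, Y = [], []
    for path, label in zip_labels.items():
        with zipfile.ZipFile(path) as z:  # closed automatically on exit
            names = [n for n in z.namelist()
                     if re.search(r"\.jpg$", n, re.IGNORECASE)]
            for name in names:
                image = Image.open(io.BytesIO(z.read(name)))
                image = image.convert("RGB").resize((im_size, im_size))
                X.append(np.asarray(image))
                Y.append(label)
    return np.asarray(X), np.asarray(Y)
```

Calling `X, Y = load_labeled_zips(ZIP_LABELS)` would then replace the four near-identical loading blocks, and adding a fifth category only requires one more dictionary entry.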

Next time I will use this data to try deep learning.
Building a Model for Sorting a Face Image Dataset: Transfer Learning with VGG19
