
How to convert an ndarray of shape (32,32,3) into a 4D tensor of shape (1,32,32,1)

Python

Last updated at 2020-01-31, posted at 2020-01-06

How this started

I needed to convert a Python ndarray of shape (32,32,3) into a 4D tensor of shape (1,32,32,1). The goal was to prepare data for machine learning.

4D tensor

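An ndarray with a shape like (1,32,32,1), that is, an array with four axes, is called a "4D tensor" here. The four values of the shape read as (number of images, image height, image width, number of image channels).
The number of channels is 1 for grayscale and 3 for color (RGB).
A single image is represented by an ndarray of shape (32,32,3). Because it has no leading axis for the number of images, you can tell that it is not an image dataset.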
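As a small illustration of how the four axes are read, the sketch below builds a placeholder dataset and unpacks its shape. The array `dataset` and its size of 10 images are assumptions made up for this example; they are not from the original article.

```python
import numpy as np

# Hypothetical dataset: 10 RGB images of 32x32 pixels, filled with zeros.
dataset = np.zeros((10, 32, 32, 3), dtype=np.uint8)

# The four axes are (number of images, height, width, channels).
n_images, height, width, channels = dataset.shape
print(n_images, height, width, channels)

# One image taken out of the dataset has only three axes.
single = dataset[0]
print(single.ndim, single.shape)
```

Checking `ndim` like this is a quick way to tell a single image (3 axes) from a dataset (4 axes).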

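Addendum: if you say "4D tensor" to someone with a mathematics background, it apparently conjures up a different picture. Still, I like phrasing such as "you have to turn it into a 4D tensor dataset or Keras can't use it", so I use it a lot (lol).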

Converting an ndarray into a 4D tensor

Reshaping an ndarray into exactly the shape you want can be fairly tricky.
As a first step, I confirmed that an ndarray can be reshaped as follows.

import numpy as np

a = np.arange(6)
a = a.reshape(2, 3)
print(a)
# Output:
# [[0 1 2]
#  [3 4 5]]
print("===============\n")

a = a.reshape(2,3,1)
print(a)
# Output:
# [[[0]
#  [1]
#  [2]]
#
# [[3]
#  [4]
#  [5]]]
print("---------------\n")
a = a.reshape(1,2,3,1)
print(a)
# Output:
# [[[[0]
#   [1]
#   [2]]
#
#  [[3]
#   [4]
#   [5]]]]

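With this, the data looks ready to be passed to the predict function below.
y_pred = model.predict(x)
x must be an ndarray of shape (1, 32, 16, 1); otherwise predict raises an error.
A shape such as (32, 16, 1), without the leading axis, also raises an error.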
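Besides `reshape`, NumPy can add the missing axes without spelling out the height and width. The sketch below uses `np.expand_dims` and `np.newaxis`, which are alternatives I am suggesting, not what the article itself uses. The zero-filled array `gray` is a stand-in for a real image.

```python
import numpy as np

# Stand-in for one grayscale image of height 32 and width 16.
gray = np.zeros((32, 16), dtype=np.int64)

# Append a channel axis at the end, then prepend a batch axis at the front.
with_channel = np.expand_dims(gray, axis=-1)   # (32, 16, 1)
batch = np.expand_dims(with_channel, axis=0)   # (1, 32, 16, 1)

# The same result using indexing with np.newaxis.
batch_alt = gray[np.newaxis, :, :, np.newaxis]

print(with_channel.shape, batch.shape, batch_alt.shape)
```

Because no sizes are hard-coded, the same lines work for any image size.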

Code

from PIL import Image
import numpy as np

# Replace 3 * 2 with your real size, for example 32 * 32.
c = np.arange(3 * 2)
c = c.reshape(3, 2)

pilImg = Image.fromarray(np.uint8(c))
# pilImg_1 = pilImg.convert("RGB")
pilImg_1 = pilImg.convert("L")
data = np.array(pilImg_1, dtype='int64')
print(type(data))
print(data)
print(data.shape)

a = data
print("===============\n")

a = a.reshape(3,2,1)
print(a)

print("===============\n")

a = data.reshape(1,3,2,1)
print(a)

How to convert an ndarray of shape (32,32,3) into shape (32,32)

This is a bonus. It is for cases where you convert an RGB image to grayscale before using it. I'm not sure how much demand there is for this.

from PIL import Image
import numpy as np


file = "neko.png"
image = Image.open(file)
image = image.convert("RGB")
data_rgb = np.array(image, dtype='int64')          

# RGB, so the array has shape (height, width, 3)
print(type(data_rgb))
print("data_rgb ... " + str(data_rgb.shape))

pilImg_rgb = Image.fromarray(np.uint8(data_rgb))
pilImg_gray = pilImg_rgb.convert("L")
data_gray = np.array(pilImg_gray, dtype='int64') 

# Grayscale, so the array has shape (height, width)
print(type(data_gray))
print("data_gray ... " + str(data_gray.shape))

# Convert the grayscale array back to RGB
pilImg_rgb_2 = Image.fromarray(np.uint8(data_gray))
pilImg_rgb_2 = pilImg_rgb_2.convert("RGB")
data_rgb_2 = np.array(pilImg_rgb_2, dtype='int64')

# Converted back to RGB, so the array has shape (height, width, 3) again
print(type(data_rgb_2))
print("data_rgb_2 ... " + str(data_rgb_2.shape))

So that is an example of how to go back and forth between (height, width) and (height, width, 3).
Note that the grayscale array has shape (height, width), not (height, width, 1).
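If you do need (height, width, 1), the channel axis can be added afterwards with NumPy. The following is a minimal NumPy-only sketch, not the Pillow conversion used above. The small array `data_gray` is a stand-in, and copying the gray value into three channels with `np.repeat` is my assumption about what a gray-to-RGB result should look like.

```python
import numpy as np

# Stand-in for a grayscale array of shape (height, width).
data_gray = np.arange(6, dtype=np.int64).reshape(3, 2)

# Add an explicit channel axis: (3, 2) -> (3, 2, 1).
gray_hw1 = data_gray[..., np.newaxis]

# Copy the single channel three times: (3, 2, 1) -> (3, 2, 3).
gray_as_rgb = np.repeat(gray_hw1, 3, axis=-1)

# Drop the channel axis again: (3, 2, 1) -> (3, 2).
back = np.squeeze(gray_hw1, axis=-1)

print(gray_hw1.shape, gray_as_rgb.shape, back.shape)
```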

Reading an image and converting it to (1, height, width, number of image channels)

Addendum: my earlier explanation was poorly written. In the end, I think the code below is all you need.

from PIL import Image
import numpy as np


file = "neko.png"
image = Image.open(file)
image = image.convert("RGB")
data_rgb = np.array(image, dtype='int64')          


# RGB, so the array has shape (height, width, 3)
print(type(data_rgb))
print("data_rgb ... " + str(data_rgb.shape))

pilImg_rgb = Image.fromarray(np.uint8(data_rgb))
pilImg_gray = pilImg_rgb.convert("L")
data_gray = np.array(pilImg_gray, dtype='int64') 

# Grayscale, so the array has shape (height, width)
print(type(data_gray))
print("data_gray ... " + str(data_gray.shape))

a = data_gray.reshape(1, image.height, image.width, 1)
print(a.shape)

# Output
# <class 'numpy.ndarray'>
# data_rgb ... (210, 160, 3)
# <class 'numpy.ndarray'>
# data_gray ... (210, 160)
# (1, 210, 160, 1)

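The array has shape (1, 210, 160, 1), which has the same layout as (1,32,32,1), so it can be used for prediction in machine learning. Normally you would probably use color images, though, in which case the last value is 3 rather than 1. Grayscale is fine when you want to train on things like characters, so the sample in this article should be usable there.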

Addendum: this works correctly for me.
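To cover both the grayscale and the color case in one place, the reshaping step could be wrapped in a small helper. This is only a sketch: the function name `to_batch_of_one` is hypothetical, and the zero-filled arrays stand in for the arrays obtained from a real image.

```python
import numpy as np


def to_batch_of_one(image_array):
    """Return image_array reshaped to (1, height, width, channels)."""
    arr = np.asarray(image_array)
    if arr.ndim == 2:
        # Grayscale (height, width): add the channel axis first.
        arr = arr[..., np.newaxis]
    if arr.ndim != 3:
        raise ValueError(
            "expected (height, width) or (height, width, channels), "
            "got shape %r" % (arr.shape,)
        )
    # Prepend the batch axis.
    return arr[np.newaxis, ...]


# Stand-ins with the same shapes as in the output above.
gray_batch = to_batch_of_one(np.zeros((210, 160), dtype=np.int64))
rgb_batch = to_batch_of_one(np.zeros((210, 160, 3), dtype=np.int64))
print(gray_batch.shape)
print(rgb_batch.shape)
```

Rejecting anything that is not 2D or 3D keeps an already batched array from silently gaining a fifth axis.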

Converting an array of shape (1, 32, 32, 3) to (32, 32, 3)

print("img ... " + str(img.shape))
# img ... (1, 32, 32, 3)
print("img ... " + str(img[0].shape))
# img ... (32, 32, 3)

imwrite(img_path, img)
# ↑ This raises an error
imwrite(img_path, img[0])
# ↑ This succeeds
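Indexing with `img[0]` drops the leading axis. A possible alternative is `np.squeeze` with an explicit axis, which additionally checks that the axis really has size 1. The sketch below compares the two on stand-in arrays. It does not call `imwrite`, since the article does not say which library that function comes from.

```python
import numpy as np

# Stand-in for a batch holding a single 32x32 RGB image.
img = np.zeros((1, 32, 32, 3), dtype=np.uint8)

first = img[0]                       # indexing drops the batch axis
squeezed = np.squeeze(img, axis=0)   # same result, but requires size 1
print(first.shape, squeezed.shape)

# With more than one image, squeeze refuses instead of silently picking one.
two = np.zeros((2, 32, 32, 3), dtype=np.uint8)
try:
    np.squeeze(two, axis=0)
    squeeze_failed = False
except ValueError:
    squeeze_failed = True
print(squeeze_failed)
```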
