
How to determine the shape when reshaping a 1D ndarray into 2D

Last updated at 2017-11-27, posted at 2017-11-26

What I want to do

Suppose you want to read a file like the one below and draw a two-dimensional intensity map¹.

data.txt

To do that, you need to read the x, y, and z columns and turn each into a 2D ndarray, as follows.

import numpy as np

x, y, z = np.loadtxt("data.txt", unpack=True)
X = x.reshape(2, 3)
Y = y.reshape(2, 3)
Z = z.reshape(2, 3)

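With this little data, the shape to pass to reshape (the numbers of rows and columns of the 2D ndarray) is obvious at a glance, but that no longer holds once the number of data points grows. Is there a good way to determine it?

Method

With np.unique, the shape is easy to determine. np.unique returns an ndarray of the sorted unique elements, with duplicates removed.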

In[1]

np.unique(x)

Out[1]

array([1., 2.])

In other words, the numbers of elements in the ndarrays returned by np.unique for x and y are the numbers of rows and columns of the resulting 2D ndarray.
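As a minimal sketch of this idea, the snippet below derives the shape from np.unique and checks that the point count matches a complete grid before reshaping. The in-memory arrays are an assumption standing in for the columns of data.txt, with values chosen to match the output shown later in this article; the completeness check is an addition, not part of the original procedure.

```python
import numpy as np

# Hypothetical in-memory columns standing in for the contents of data.txt
x = np.array([1, 1, 1, 2, 2, 2], dtype=float)
y = np.array([1, 2, 3, 1, 2, 3], dtype=float)
z = np.array([10, 60, 50, 30, 20, 100], dtype=float)

xnum = np.unique(x).size  # number of distinct x values
ynum = np.unique(y).size  # number of distinct y values

# A complete grid must contain exactly xnum * ynum points
if xnum * ynum != z.size:
    raise ValueError("data does not form a complete grid")

Z = z.reshape(xnum, ynum)
print(Z.shape)
```

The check guards against files with missing or duplicated grid points, in which case reshape would either fail or silently misalign the values.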

Wrapped up as a function, it looks like this.

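def load_2dmap_data(filename, skip_header=0, delimiter=None, x_row=0, y_row=1, z_row=2):
    # Note: x_row, y_row, z_row are the column indices of x, y, z in the file
    data = np.genfromtxt(filename, delimiter=delimiter, skip_header=skip_header)

    # The number of distinct x / y values gives the grid size along each axis
    xnum = len(np.unique(data[:, x_row]))
    ynum = len(np.unique(data[:, y_row]))

    # Assumes rows are ordered with x varying slowest and y varying fastest
    x = data[:, x_row].reshape(xnum, ynum)
    y = data[:, y_row].reshape(xnum, ynum)
    z = data[:, z_row].reshape(xnum, ynum)

    return x, y, z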

Let's try it.

In[2]

X, Y, Z = load_2dmap_data("data.txt")
print(X)
print(Y)
print(Z)

Out[2]

[[ 1.  1.  1.]
 [ 2.  2.  2.]]
[[ 1.  2.  3.]
 [ 1.  2.  3.]]
[[  10.   60.   50.]
 [  30.   20.  100.]]

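The arrays were reshaped as expected.

Note, however, that this approach assumes the x and y values in the source data are neatly ordered. It cannot be applied to data in which each row's x, y, z correspondence is correct but the rows appear in random order. (I would rather not handle such data, but it might be possible with the right sort.)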
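One way the sorting idea could look, as a sketch rather than a tested part of the original procedure: np.lexsort can order the rows by x first and then by y, after which the np.unique-based reshape applies again. The helper name sort_grid_rows and its parameters are hypothetical, and the sample rows are a shuffled version of the values shown in Out[2].

```python
import numpy as np

def sort_grid_rows(data, x_col=0, y_col=1):
    """Return the rows of `data` sorted by x, then by y (hypothetical helper)."""
    # np.lexsort uses the LAST key as the primary sort key
    order = np.lexsort((data[:, y_col], data[:, x_col]))
    return data[order]

# Shuffled rows of a 2 x 3 grid; columns are x, y, z
data = np.array([
    [2, 3, 100],
    [1, 2, 60],
    [2, 1, 30],
    [1, 1, 10],
    [2, 2, 20],
    [1, 3, 50],
], dtype=float)

sorted_data = sort_grid_rows(data)
xnum = np.unique(sorted_data[:, 0]).size
ynum = np.unique(sorted_data[:, 1]).size
Z = sorted_data[:, 2].reshape(xnum, ynum)
print(Z)
```

This still assumes the data forms a complete grid with no missing or duplicated (x, y) pairs.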

(Added 2017/11/27)
As I was told in a comment, pandas apparently makes it easy to load data whose rows are out of order as well.

Prepare data like this.

data2.txt

# X Y Z
-2 -2 -0.832
-2 -1 0.124
-1 0 1.540
-1 1 1.081
-1 2 0.124
0 -2 0.584
0 -1 1.540
2 1 0.124
2 2 -0.832
-2 0 0.584
-2 1 0.124
-2 2 -0.832
-1 -2 0.124
-1 -1 1.081
1 -2 0.124
1 -1 1.081
1 0 1.540
1 1 1.081
1 2 0.124
2 -2 -0.832
2 -1 0.124
2 0 0.584
0 0 2.000
0 1 1.540
0 2 0.584

I took regularly ordered data and deliberately swapped rows so that the order of X and Y is irregular.

Read it with pandas.read_table and reshape it with pivot_table.

In[3]

import pandas as pd

df = pd.read_table('./data2.txt', sep=' ', escapechar='#')
df

Out[3]

	X	Y	Z
0	-2	-2	-0.832
1	-2	-1	0.124
2	-1	0	1.540
3	-1	1	1.081
4	-1	2	0.124
5	0	-2	0.584
6	0	-1	1.540
7	2	1	0.124
8	2	2	-0.832
9	-2	0	0.584
10	-2	1	0.124
11	-2	2	-0.832
12	-1	-2	0.124
13	-1	-1	1.081
14	1	-2	0.124
15	1	-1	1.081
16	1	0	1.540
17	1	1	1.081
18	1	2	0.124
19	2	-2	-0.832
20	2	-1	0.124
21	2	0	0.584
22	0	0	2.000
23	0	1	1.540
24	0	2	0.584

In[4]

df_pivot = pd.pivot_table(data=df, values='Z', columns='X', index='Y', aggfunc=np.mean)
df_pivot

Out[4]

X	-2	-1	0	1	2
Y					
-2	-0.832	0.124	0.584	0.124	-0.832
-1	0.124	1.081	1.540	1.081	0.124
0	0.584	1.540	2.000	1.540	0.584
1	0.124	1.081	1.540	1.081	0.124
2	-0.832	0.124	0.584	0.124	-0.832

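Impressive: everything is neatly ordered. matplotlib's pcolor, pcolormesh, and similar functions can apparently be used to visualize this data. They normally(?) take a 2D ndarray as input, but when I passed the pandas pivot table as is, the intensity map was drawn.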

In[5]

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(3, 3))
ax.pcolormesh(df_pivot)
plt.show()
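If 2D ndarrays like the X, Y, Z from the first half of this article are needed, they can probably be rebuilt from the pivot table. The sketch below is an assumption-laden variant, not the commenter's code: it reads a four-row subset of data2.txt from an in-memory string, skips the header with comment="#" and supplies the column names explicitly, then uses np.meshgrid on the pivot labels.

```python
import io

import numpy as np
import pandas as pd

# A shuffled four-row subset of data2.txt, held in memory for illustration
text = """# X Y Z
0 1 1.540
-1 0 1.540
0 0 2.000
-1 1 1.081
"""

# comment="#" skips the header line, so the column names are given explicitly
df = pd.read_csv(io.StringIO(text), sep=" ", comment="#", names=["X", "Y", "Z"])

pivot = df.pivot_table(values="Z", index="Y", columns="X", aggfunc="mean")

# Rebuild coordinate grids from the sorted pivot labels
X, Y = np.meshgrid(pivot.columns.to_numpy(), pivot.index.to_numpy())
Z = pivot.to_numpy()

print(X.shape, Y.shape, Z.shape)
```

These arrays could then be passed as ax.pcolormesh(X, Y, Z, shading="auto") so that the axes show the actual X and Y values rather than array indices; shading="auto" assumes Matplotlib 3.3 or later.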

¹ Something like https://qiita.com/inashiro/items/c59e31b0f0557a7a8bca.
