
How to keep each row as a list when converting a NumPy array to a pandas DataFrame

Last updated at 2019-10-04. Posted at 2019-09-27.

What does this mean?

array = np.array([[1, 2, 3], [4, 5, 6]])

If you convert this NumPy array into a pandas DataFrame...

Ideal

index	0
0	[1, 2, 3]
1	[4, 5, 6]

Reality

index	0	1	2
0	1	2	3
1	4	5	6

This ideal comes up when you want to go through a pandas DataFrame, convert it to a Spark DataFrame, and then process each row with a udf.
To make this ideal come true...

Sample code

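import numpy as np
import pandas as pd

array = np.array([[1, 2, 3], [4, 5, 6]])

# Each array row is spread across columns 0, 1, 2 at this point.
df = pd.DataFrame(array)
# Collapse every row into a single Python list stored in a new column.
df['list'] = df.apply(lambda x: x.tolist(), axis=1)
# Keep only the list column.
df = df.loc[:, ['list']]

print(df)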

Output

index	list
0	[1, 2, 3]
1	[4, 5, 6]
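
As a possible alternative to the sample code above, the same one-column result can be sketched without going through `apply`. This is only a minimal sketch, not the method from the original post: it assumes the same `array` as above and relies on `ndarray.tolist()` turning the 2-D array into a list of plain Python lists before the DataFrame is built.

```python
import numpy as np
import pandas as pd

array = np.array([[1, 2, 3], [4, 5, 6]])

# array.tolist() yields [[1, 2, 3], [4, 5, 6]], a list of plain Python lists.
# Passing it as a single dict value makes each inner list one cell
# of the "list" column instead of spreading it across columns.
df = pd.DataFrame({"list": array.tolist()})

print(df)
```

Because the rows are converted before the DataFrame is created, the intermediate columns 0, 1, 2 never exist and there is nothing to drop afterwards.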

Addendum (10/4)

pd.Series([*array]).to_frame()

A commenter pointed out that the line above also works.
It is more concise, which is nice... (this was the first time I had heard of the to_frame() method)
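
One detail that may matter for the commenter's one-liner: `[*array]` unpacks the 2-D array into its rows, and each row is still a 1-D NumPy array, so the cells end up holding `np.ndarray` objects rather than Python lists. The sketch below compares that with a variant built from `array.tolist()`. The names `df_arrays` and `df_lists` and the column name `"list"` are chosen here purely for illustration.

```python
import numpy as np
import pandas as pd

array = np.array([[1, 2, 3], [4, 5, 6]])

# The commenter's version: every cell is a 1-D NumPy array.
df_arrays = pd.Series([*array]).to_frame()

# Variant: convert to plain Python lists first and name the column.
df_lists = pd.Series(array.tolist()).to_frame(name="list")

# Inspect the cell types to see the difference.
print(type(df_arrays.iloc[0, 0]))
print(type(df_lists.iloc[0, 0]))
```

If the downstream code needs real Python lists in each cell, the second form is probably the safer choice. If NumPy arrays are acceptable, the commenter's one-liner is enough.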
