
[Python] Notes on working with pandas DataFrames

Last updated at 2017-09-03 / Posted at 2017-09-03

Introduction

Although I am a beginner, I got the chance to do some data analysis,
so here I summarize the pandas DataFrame syntax I learned along the way.

Prerequisites

product.csv

id	name	price	category	isPopular
1	eraser	100	stationery	1
2	pencil	200	stationery	0
3	socks	400	clothes	1
4	pants	1000	clothes	0
5	apple	100	food	0

analyze.py

import pandas as pd
df = pd.read_csv('product.csv')  # load the sample data shown above
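
To try the snippets without a product.csv file on disk, the sample table above can also be built in memory. This is only a minimal sketch: the category is spelled 'stationery' here so that it matches the filters used in the later snippets.

```python
import pandas as pd

# Same rows as the product.csv table, built without reading a file
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["eraser", "pencil", "socks", "pants", "apple"],
    "price": [100, 200, 400, 1000, 100],
    "category": ["stationery", "stationery", "clothes", "clothes", "food"],
    "isPopular": [1, 0, 1, 0, 0],
})
print(df.shape)
```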

Extract the distinct values of a column

df['category'].value_counts().index

Output

Index(['stationery', 'clothes', 'food'], dtype='object')
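
As a side note, value_counts() sorts the values by how often they occur. If only the distinct values are needed, unique() and nunique() are probably the more direct route. The sketch below uses a small stand-in frame holding just the category column, rather than the full data.

```python
import pandas as pd

# Stand-in frame with only the category column of the sample data
df = pd.DataFrame({"category": ["stationery", "stationery", "clothes", "clothes", "food"]})

# unique() returns the distinct values in order of first appearance
kinds = df["category"].unique()
# nunique() returns just how many distinct values there are
n_kinds = df["category"].nunique()
# value_counts() additionally reports how often each value occurs
counts = df["category"].value_counts()

print(list(kinds))
print(n_kinds)
print(counts["food"])
```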

Change or add DataFrame values by condition

df.loc[df.name == 'socks', 'price'] = 500
df.loc[df.category == 'stationery', 'category_id'] = 0
df.loc[df.category == 'clothes', 'category_id'] = 1
df.loc[df.category == 'food', 'category_id'] = 2
df

Output

id	name	price	category	isPopular	category_id
1	eraser	100	stationery	1	0.0
2	pencil	200	stationery	0	0.0
3	socks	500	clothes	1	1.0
4	pants	1000	clothes	0	1.0
5	apple	100	food	0	2.0
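
The category_id values show up as floats (0.0, 1.0, ...) because the new column starts out as NaN and is then filled in one condition at a time. One way to get integers directly is to map the whole column through a lookup table in a single step. This is a sketch, and the dictionary name category_to_id is only an illustrative choice.

```python
import pandas as pd

# Stand-in frame with only the category column of the sample data
df = pd.DataFrame({"category": ["stationery", "stationery", "clothes", "clothes", "food"]})

# Hypothetical lookup table mirroring the three .loc assignments
category_to_id = {"stationery": 0, "clothes": 1, "food": 2}

# map() replaces each category by its id; since every value is covered,
# the result stays an integer column instead of becoming float
df["category_id"] = df["category"].map(category_to_id)
print(df["category_id"].tolist())
```

Note that a category missing from the dictionary would become NaN, which would turn the column back into floats.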

Convert to a one-hot representation

# Keep the id column aside, then extract only the isPopular and category_id columns
# (the encoder does not work well unless the values are integers)
df_id = df[['id']]
df_X = df.drop(['id','name','price','category'], axis=1)

from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
enc.fit(df_X)
onehot_array = enc.transform(df_X).toarray()
onehot_df = pd.DataFrame(onehot_array)
df = pd.concat([df_id, onehot_df], axis=1)
df

Output

id	0	1	2	3	4
1	0.0	1.0	1.0	0.0	0.0
2	1.0	0.0	1.0	0.0	0.0
3	0.0	1.0	0.0	1.0	0.0
4	1.0	0.0	0.0	1.0	0.0
5	1.0	0.0	0.0	0.0	1.0
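
In the output above, columns 0 and 1 correspond to isPopular being 0 or 1, and columns 2 to 4 correspond to category_id being 0, 1 or 2. For comparison, pandas itself offers get_dummies, which produces the same encoding with labeled columns. The sketch below swaps it in for OneHotEncoder and uses a stand-in frame that holds only the columns involved.

```python
import pandas as pd

# Stand-in frame: id plus the two integer columns to encode
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "isPopular": [1, 0, 1, 0, 0],
    "category_id": [0, 0, 1, 1, 2],
})

# columns= makes pandas treat the integer columns as categorical;
# dtype=float yields 0.0/1.0 values like the OneHotEncoder output
onehot = pd.get_dummies(df, columns=["isPopular", "category_id"], dtype=float)
print(onehot.columns.tolist())
```

The id column is passed through unchanged, so no separate concat step is needed in this variant.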
