

[Python] Notes on working with pandas DataFrames

Posted at 2017-09-03

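Introduction

Although I am still a beginner, I had the chance to do some data analysis.
This memo summarizes the pandas DataFrame syntax I learned along the way.

Prerequisites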

product.csv

id,name,price,category,isPopular
1,eraser,100,stationery,1
2,pencil,200,stationery,0
3,socks,400,clothes,1
4,pants,1000,clothes,0
5,apple,100,food,0

analyze.py

import pandas as pd
df = pd.read_csv('product.csv')  # load the sample data into a DataFrame
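
To follow along without creating a file, the same table can be built from an in-memory string. This is only a minimal sketch: it assumes product.csv is comma-separated and that the category is spelled stationery, as the code later in this memo expects. The variable name csv_text is purely illustrative.

```python
import io

import pandas as pd

# Assumed contents of product.csv (comma-separated).
csv_text = """id,name,price,category,isPopular
1,eraser,100,stationery,1
2,pencil,200,stationery,0
3,socks,400,clothes,1
4,pants,1000,clothes,0
5,apple,100,food,0
"""

# read_csv accepts any file-like object, so StringIO stands in for the file.
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)
print(df.dtypes)
```

The numeric columns should be parsed as integers and name/category as strings, which is what the later steps rely on.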

Extracting the distinct values of a column

df['category'].value_counts().index

Output

Index(['stationery', 'clothes', 'food'], dtype='object')
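
If only the distinct values are needed and not their counts, unique() is a more direct alternative to value_counts().index. The sketch below rebuilds just the category column so that it runs on its own; it is not part of the original analyze.py.

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["stationery", "stationery", "clothes", "clothes", "food"],
})

# value_counts() counts each value and sorts by frequency (descending).
counts = df["category"].value_counts()
print(counts)

# unique() returns the distinct values in order of first appearance.
print(df["category"].unique())

# nunique() gives only the number of distinct values.
print(df["category"].nunique())
```

Note that value_counts() orders its index by frequency, while unique() keeps the order in which values first appear, so the two can differ on other data.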

Updating or adding DataFrame values conditionally

df.loc[df.name == 'socks', 'price'] = 500
df.loc[df.category == 'stationery', 'category_id'] = 0
df.loc[df.category == 'clothes', 'category_id'] = 1
df.loc[df.category == 'food', 'category_id'] = 2
df

Output

id name price category isPopular category_id
1 eraser 100 stationery 1 0.0
2 pencil 200 stationery 0 0.0
3 socks 500 clothes 1 1.0
4 pants 1000 clothes 0 1.0
5 apple 100 food 0 2.0
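
The category_id values appear as floats (0.0, 1.0, 2.0) most likely because assigning through loc to a column that does not exist yet first creates it filled with NaN, which forces a float dtype. One way to get integer ids in a single step, sketched here under the assumption that every category has an entry in the lookup table, is Series.map with a dict. The name category_to_id is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["eraser", "pencil", "socks", "pants", "apple"],
    "category": ["stationery", "stationery", "clothes", "clothes", "food"],
})

# Hypothetical lookup table from category name to integer id.
category_to_id = {"stationery": 0, "clothes": 1, "food": 2}

# map() replaces each value by its dict entry; unmapped values become NaN.
df["category_id"] = df["category"].map(category_to_id)
print(df)
print(df["category_id"].dtype)
```

If a category is missing from the dict, the result contains NaN and the column falls back to float, so checking df["category_id"].isna().any() afterwards is a cheap safeguard.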

Converting to a one-hot representation

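# Keep only the isPopular and category_id columns (this does not work well unless the values are integers)
df_X = df.drop(['id','name','price','category'], axis=1)
# Keep the id column separately so it can be re-attached after encoding
df_id = df[['id']]

from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
enc.fit(df_X)
# transform() returns a sparse matrix; toarray() converts it to a dense array
onehot_array = enc.transform(df_X).toarray()
onehot_df = pd.DataFrame(onehot_array)
df = pd.concat([df_id, onehot_df], axis=1)
df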

Output

id 0 1 2 3 4
1 0.0 1.0 1.0 0.0 0.0
2 1.0 0.0 1.0 0.0 0.0
3 0.0 1.0 0.0 1.0 0.0
4 1.0 0.0 0.0 1.0 0.0
5 1.0 0.0 0.0 0.0 1.0
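
In the output above, columns 0 and 1 correspond to isPopular being 0 or 1, and columns 2 to 4 correspond to category_id 0, 1 and 2. For a quick encoding that keeps readable column names, pandas' own get_dummies is an alternative to OneHotEncoder. The following is a sketch rather than a drop-in replacement: it assumes the category column is encoded directly from its string values, and the prefix cat is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "category": ["stationery", "stationery", "clothes", "clothes", "food"],
    "isPopular": [1, 0, 1, 0, 0],
})

# Encode only the category column; other columns pass through unchanged.
# dtype=int yields 0/1 integers instead of booleans.
onehot_df = pd.get_dummies(df, columns=["category"], prefix="cat", dtype=int)
print(onehot_df)
```

get_dummies is convenient for exploration; OneHotEncoder is usually the better fit when the same encoding has to be re-applied to new data, since the fitted encoder remembers the categories it saw.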