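Introduction
As a beginner, I recently had the chance to do some data analysis,
so this post summarizes the Python (pandas) DataFrame syntax I picked up along the way.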
Prerequisites
product.csv
id | name | price | category | isPopular |
---|---|---|---|---|
1 | eraser | 100 | stationery | 1 |
2 | pencil | 200 | stationery | 0 |
3 | socks | 400 | clothes | 1 |
4 | pants | 1000 | clothes | 0 |
5 | apple | 100 | food | 0 |
analyze.py
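import pandas as pd
# Load product.csv into the DataFrame used in the rest of this post
df = pd.read_csv('product.csv')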
Extracting the distinct values of a column
df['category'].value_counts().index
Output
Index(['stationery', 'clothes', 'food'], dtype='object')
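When only the distinct values are needed and not their counts, `unique()` may be a more direct option than `value_counts().index`. The following is a minimal sketch that rebuilds the category column inline (an assumption, so that it runs without product.csv) and shows both approaches side by side.

```python
import pandas as pd

# Inline copy of the category column from product.csv (for illustration only)
df = pd.DataFrame({
    "category": ["stationery", "stationery", "clothes", "clothes", "food"],
})

# value_counts() counts each distinct value; its index holds the values
counts = df["category"].value_counts()

# unique() returns the distinct values in order of first appearance
kinds = df["category"].unique().tolist()

print(counts.to_dict())
print(kinds)
```

`value_counts()` is still the better fit when the frequency of each value matters as well.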
Updating or adding DataFrame values that match a condition
df.loc[df.name == 'socks', 'price'] = 500
df.loc[df.category == 'stationery', 'category_id'] = 0
df.loc[df.category == 'clothes', 'category_id'] = 1
df.loc[df.category == 'food', 'category_id'] = 2
df
Output
id | name | price | category | isPopular | category_id |
---|---|---|---|---|---|
1 | eraser | 100 | stationery | 1 | 0.0 |
2 | pencil | 200 | stationery | 0 | 0.0 |
3 | socks | 500 | clothes | 1 | 1.0 |
4 | pants | 1000 | clothes | 0 | 1.0 |
5 | apple | 100 | food | 0 | 2.0 |
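The category_id values appear as 0.0, 1.0, 2.0 because the first `.loc` assignment creates the column with NaN in every row that does not match, which makes the column a float column. One possible alternative, sketched here with a hypothetical lookup dictionary named `category_to_id`, is to map all categories in a single step so that the ids stay integers.

```python
import pandas as pd

# Inline copy of the relevant columns from product.csv (for illustration only)
df = pd.DataFrame({
    "name": ["eraser", "pencil", "socks", "pants", "apple"],
    "category": ["stationery", "stationery", "clothes", "clothes", "food"],
})

# Hypothetical lookup table mirroring the three .loc assignments above
category_to_id = {"stationery": 0, "clothes": 1, "food": 2}

# map() replaces each category with its id in one step; every category
# has an entry here, so no NaN appears and the column stays integer
df["category_id"] = df["category"].map(category_to_id)

print(df["category_id"].tolist())
```

A category missing from the dictionary would become NaN, so this only works cleanly when the dictionary covers every value in the column.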
Converting to a one-hot representation
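# Keep only the isPopular and category_id columns (the encoder did not work well unless the values were integers)
df_X = df.drop(['id','name','price','category'], axis=1)
# df_id: the id column alone (assumed from the result table), joined back after encoding
df_id = df[['id']]
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
enc.fit(df_X)
# transform() returns a sparse matrix; toarray() converts it to a dense array
onehot_array = enc.transform(df_X).toarray()
onehot_df = pd.DataFrame(onehot_array)
df = pd.concat([df_id, onehot_df], axis=1)
df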
Output
id | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
1 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 |
2 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 |
3 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
4 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |
5 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
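In the result above, columns 0 and 1 encode isPopular = 0 and 1, and columns 2 to 4 encode category_id = 0, 1, and 2, but the numeric headers do not show this. As a different technique from the OneHotEncoder approach used in this post, pandas' own `get_dummies()` produces labeled columns. The sketch below assumes integer category ids and rebuilds the data inline so that it runs on its own.

```python
import pandas as pd

# Inline copy of the columns that feed the encoder (for illustration only)
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "isPopular": [1, 0, 1, 0, 0],
    "category_id": [0, 0, 1, 1, 2],
})

# get_dummies() one-hot encodes the listed columns and names each new
# column "<original column>_<value>"; other columns (id) pass through
onehot_df = pd.get_dummies(df, columns=["isPopular", "category_id"], dtype=float)

print(onehot_df.columns.tolist())
```

The values are the same as in the table above; only the column labels differ.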