Data Preprocessing

Code notes on data preprocessing.

# Count the missing values in each DataFrame column
df_train.isnull().sum()
ages                  18
list_price             0
num_reviews            0
piece_count            0
review_difficulty      0
star_rating          405
country              255
dtype: int64
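A raw count is hard to judge without knowing the number of rows, so it can help to also look at the share of missing values per column. A minimal sketch on a made-up frame (the values are hypothetical, not the original dataset): `isnull().mean()` works because the mean of a boolean mask is the fraction of `True` entries.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for df_train (values are illustrative only)
df_train = pd.DataFrame({
    "ages": ["6-12", None, "8-14", "10+"],
    "list_price": [29.99, 19.99, 49.99, 9.99],
    "star_rating": [4.5, np.nan, np.nan, 3.9],
})

# Number of missing values per column (same call as in the memo)
missing_count = df_train.isnull().sum()
# Fraction of missing values per column: mean of the boolean mask
missing_ratio = df_train.isnull().mean()

print(missing_count)
print(missing_ratio)
```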

# Drop the rows whose 'charges' value is null
df = df.dropna(subset=['charges'])
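The `subset` argument restricts which columns are checked for missing values. A minimal sketch on hypothetical data, assuming `charges` is a column that must be present (for instance the prediction target), compared with a plain `dropna()`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row lacks 'bmi', another lacks 'charges'
df = pd.DataFrame({
    "age": [19, 33, 45, 60],
    "bmi": [27.9, np.nan, 30.1, 25.0],
    "charges": [16884.92, 21984.47, np.nan, 28923.14],
})

# Only rows with NaN in 'charges' are removed; the NaN in 'bmi' survives
df_dropped = df.dropna(subset=["charges"])
# Without subset, a NaN in any column drops the whole row
df_dropped_any = df.dropna()

print(len(df_dropped))      # 3
print(len(df_dropped_any))  # 2
```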

# Impute missing values (with the mean or the mode); here 'bmi' is filled with its mean
df = df.fillna({'bmi': df['bmi'].mean()})
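The comment mentions the mode as well, but only the mean appears in the code. A minimal sketch on hypothetical data that fills a numeric column with its mean and a string column with its mode in one `fillna` call. `Series.mode()` returns a Series (there can be ties), so the first value is taken with `[0]`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one gap in a numeric and one in a string column
df = pd.DataFrame({
    "bmi": [27.9, np.nan, 30.0, 22.1],
    "region": ["southwest", "northwest", None, "northwest"],
})

# Numeric column: mean of the non-missing values
# Categorical column: most frequent value (mode ignores NaN by default)
df = df.fillna({
    "bmi": df["bmi"].mean(),
    "region": df["region"].mode()[0],
})

print(df)
```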

# Inspect the categorical (object-dtype) columns
df_obj = df.select_dtypes(include='object')
   gender  smoker     region
0  female     yes  southwest
3    male      no  northwest
9  female      no  northwest
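A minimal sketch on hypothetical data with the same column names, assuming pandas 2.x, where string columns get the generic `object` dtype by default (newer pandas versions may use a dedicated string dtype instead). The complementary `include="number"` call picks out the numeric columns.

```python
import pandas as pd

# Hypothetical toy data mimicking the insurance columns (assumes pandas 2.x dtypes)
df = pd.DataFrame({
    "age": [19, 33, 60],
    "gender": ["female", "male", "female"],
    "bmi": [27.9, 22.705, 30.716],
    "smoker": ["yes", "no", "no"],
    "region": ["southwest", "northwest", "northwest"],
})

# String columns are stored with the 'object' dtype in pandas 2.x
df_obj = df.select_dtypes(include="object")
# Everything numeric (ints and floats)
df_num = df.select_dtypes(include="number")

print(list(df_obj.columns))  # ['gender', 'smoker', 'region']
print(list(df_num.columns))  # ['age', 'bmi']
```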

# Number of distinct values in each categorical column
df_uni = df_obj.nunique()
print(df_uni)
gender    2
smoker    2
region    4
dtype: int64
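One detail worth knowing: `nunique()` skips missing values by default, so a column that still contains NaN reports one category fewer than it effectively has. A small sketch on made-up data showing the `dropna` parameter:

```python
import pandas as pd

# Hypothetical toy data; 'gender' still has one missing entry
df_obj = pd.DataFrame({
    "gender": ["female", "male", "female", None],
    "region": ["southwest", "northwest", "northwest", "southeast"],
})

# Default: missing values are not counted as a category
uni = df_obj.nunique()
# dropna=False counts NaN/None as one extra distinct value
uni_with_na = df_obj.nunique(dropna=False)

print(uni)          # gender: 2, region: 3
print(uni_with_na)  # gender: 3, region: 3
```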

for col in df_obj.columns:
    print(col)
    print(df_obj[col].unique())

gender
['female' 'male']
smoker
['yes' 'no']
region
['southwest' 'northwest' 'southeast' 'northeast']
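If the frequency of each category matters too (for example to spot rare levels), `value_counts()` lists the distinct values together with their counts. A sketch on hypothetical data with the same column names:

```python
import pandas as pd

# Hypothetical toy data with the same categorical columns
df_obj = pd.DataFrame({
    "gender": ["female", "male", "female"],
    "smoker": ["yes", "no", "no"],
    "region": ["southwest", "northwest", "northwest"],
})

# Like the unique() loop, but also reports how often each category occurs
counts = {col: df_obj[col].value_counts() for col in df_obj.columns}
for col, vc in counts.items():
    print(col)
    print(vc.to_dict())
```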

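# Encode a categorical variable as integer labels (label encoding)
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
le.fit(df['gender'])
# transform() returns a new array, so assign it back to the column
df['gender'] = le.transform(df['gender'])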

   age  gender        bmi  children  smoker     region      charges
0   19       0  27.900000         0     yes  southwest  16884.92400
3   33       1  22.705000         0      no  northwest  21984.47061
9   60       0  30.716434         0      no  northwest  28923.13692
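A minimal round trip with `LabelEncoder` on made-up data: `fit_transform` combines `fit` and `transform`, the integer codes follow the sorted order stored in `classes_`, and `inverse_transform` maps them back. Note that the scikit-learn documentation describes `LabelEncoder` as intended for target values (y); for input features, `OrdinalEncoder` is the transformer meant for that job.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data
df = pd.DataFrame({
    "age": [19, 33, 60],
    "gender": ["female", "male", "female"],
})

le = LabelEncoder()
# Fit and transform in one step, then write the codes back into the frame
df["gender"] = le.fit_transform(df["gender"])

# Classes are sorted; each code is the index of its class
print(le.classes_.tolist())   # ['female', 'male']
print(df["gender"].tolist())  # [0, 1, 0]

# Map the integer codes back to the original labels
restored = le.inverse_transform(df["gender"])
print(restored.tolist())      # ['female', 'male', 'female']
```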

# One-hot encoding (turn every categorical column into dummy variables)
import pandas as pd
df = pd.get_dummies(df, drop_first=True)

One-hot encoding is insanely handy!
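A small sketch on hypothetical data showing what `drop_first=True` does: for each categorical column, the first level in sorted order is dropped, since it is implied whenever all remaining dummies of that column are 0. This avoids perfectly collinear columns (the "dummy variable trap") in linear models. In pandas 2.x the dummy columns are booleans; passing `dtype=int` gives 0/1 integers instead.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "age": [19, 33, 60],
    "smoker": ["yes", "no", "no"],
    "region": ["southwest", "northwest", "southeast"],
})

# String columns are expanded; numeric columns pass through unchanged.
# drop_first=True removes 'smoker_no' and 'region_northwest' (first sorted levels).
dummies = pd.get_dummies(df, drop_first=True)
print(dummies.columns.tolist())
# ['age', 'smoker_yes', 'region_southeast', 'region_southwest']
```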
