
About Pandas

Posted at 2018-08-28, last updated at 2018-08-28

Let's try out Pandas.

import pandas as pd

Open the files

df_white = pd.read_csv("./winequality-white.csv", sep=";")
df_red = pd.read_csv("./winequality-red.csv", sep=";")
df_white
df_red
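If the wine CSV files are not at hand, the effect of `sep=";"` can be checked on an in-memory sample. This is only a sketch: the two rows and the three column names below are made up for illustration and are not the real file contents.

```python
import io

import pandas as pd

# Hypothetical sample in a semicolon-separated layout (not the real data)
sample_csv = io.StringIO(
    '"fixed acidity";"pH";"quality"\n'
    "7.0;3.00;6\n"
    "6.3;3.30;6\n"
)

# sep=";" tells read_csv to split fields on semicolons instead of commas
df_sample = pd.read_csv(sample_csv, sep=";")
print(df_sample.shape)
print(df_sample.columns.tolist())
```

Without `sep=";"`, each line would be read as a single column, so checking `df.shape` right after loading is a cheap sanity test.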

Add a column named type to each DataFrame.
Fill the type column with 0 for white wine and 1 for red wine.

df_white["type"] = 0
df_red["type"] = 1

df_white
df_red.head()
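Assigning a scalar to a new column fills every row with that value. A small sketch on made-up frames (`white` and `red` here are toy stand-ins, not the real data) shows this, along with `assign`, which returns a new DataFrame instead of modifying the original:

```python
import pandas as pd

# Toy stand-ins for the two wine tables (values are made up)
white = pd.DataFrame({"pH": [3.0, 3.3, 3.26]})
red = pd.DataFrame({"pH": [3.51, 3.2]})

# In-place style: the scalar is repeated for every row
white["type"] = 0

# Non-mutating style: assign returns a copy with the extra column
red_typed = red.assign(type=1)

print(white["type"].tolist())
print(red_typed["type"].tolist())
print(list(red.columns))  # the original red frame is unchanged
```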

Combine df_white and df_red into a single DataFrame named df by stacking their rows. Also reset the index when combining them.

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows instead
df = pd.concat([df_white, df_red]).reset_index(drop=True)
df
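To see why the index needs resetting, here is a minimal sketch with `pd.concat` on made-up frames. Note that `DataFrame.append` is no longer available from pandas 2.0 onward, so `pd.concat` is the call to rely on there. The `ignore_index=True` option should give the same result as calling `reset_index(drop=True)` afterwards.

```python
import pandas as pd

# Toy frames (made-up values), each with its own 0-based index
white = pd.DataFrame({"pH": [3.0, 3.3], "type": [0, 0]})
red = pd.DataFrame({"pH": [3.51, 3.2, 3.26], "type": [1, 1, 1]})

# Plain concat keeps the original labels, so they repeat
stacked = pd.concat([white, red])
print(stacked.index.tolist())

# ignore_index=True renumbers the rows from 0
combined = pd.concat([white, red], ignore_index=True)
print(combined.index.tolist())
```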

Check whether there are any missing values

df.isnull()

Check for missing values per feature (column)

df.isnull().any()
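`isnull().any()` only says whether a column contains a missing value. When the number of missing cells matters, `isnull().sum()` is a common companion. A sketch on a made-up frame with one deliberately missing pH value:

```python
import numpy as np
import pandas as pd

# Toy frame with one missing pH value (made-up data)
toy = pd.DataFrame({"pH": [3.0, np.nan, 3.3], "quality": [6, 5, 6]})

# True for each column that has at least one missing value
per_column_any = toy.isnull().any()

# Number of missing values in each column
per_column_count = toy.isnull().sum()

# Total over the whole frame
total_missing = int(per_column_count.sum())

print(per_column_any)
print(per_column_count)
print(total_missing)
```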

Check the number of records for each quality value

df.groupby("quality").count()

# The same counts, built by hand with a dict
tmp_set = set(df["quality"])
tmp_dict = {}
for s in tmp_set:
    cnt = int((df["quality"] == s).sum())  # each True counts as 1
    tmp_dict[s] = cnt
print(tmp_dict)

df["quality"] == 3
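`groupby(...).count()` reports a non-null count for every column, which is more output than needed when only the number of rows per group is wanted. Two shorter options, sketched on made-up quality scores, are `value_counts()` and `groupby(...).size()`:

```python
import pandas as pd

# Toy quality scores (made-up data)
toy = pd.DataFrame({"quality": [6, 5, 6, 7, 5, 6]})

# One count per distinct value, sorted by the quality score
counts = toy["quality"].value_counts().sort_index()

# size() counts rows per group and returns a single Series
sizes = toy.groupby("quality").size()

print(counts)
print(sizes)
```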

Check the number of records for each combination of quality and type

df.groupby(["quality", "type"]).count()
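For two keys, the counts may be easier to read as a table with quality on the rows and type on the columns. A sketch on made-up data using `groupby(...).size()` and `pd.crosstab`:

```python
import pandas as pd

# Toy data: quality score and wine type (0 = white, 1 = red), made up
toy = pd.DataFrame({
    "quality": [5, 5, 6, 6, 6],
    "type": [0, 1, 0, 0, 1],
})

# Row count for each (quality, type) pair, as a Series with a MultiIndex
pair_counts = toy.groupby(["quality", "type"]).size()

# The same counts laid out as a quality-by-type table
table = pd.crosstab(toy["quality"], toy["type"])

print(pair_counts)
print(table)
```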

Using matplotlib's hist, draw a histogram of the pH distribution for each type

%matplotlib inline
import matplotlib.pyplot as plt
plt.hist([df_white["pH"], df_red["pH"]], label=["white", "red"])
# plt.hist(df_white["pH"], label="white", rwidth=0.4)
# plt.hist(df_red["pH"], label="red", rwidth=0.4)
plt.legend()
plt.show()
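Passing explicit bin edges makes both groups use the same bins, which keeps the bars comparable. The sketch below uses randomly generated stand-in values rather than the wine data, and selects the off-screen `Agg` backend as an assumption so that it runs without a display. In a notebook that backend line can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Made-up stand-ins for the pH columns of the two wine types
rng = np.random.default_rng(0)
white_ph = rng.normal(3.19, 0.15, size=200)
red_ph = rng.normal(3.31, 0.15, size=80)

# Shared bin edges: 25 bins of width 0.1 from 2.0 to 4.5
bins = np.linspace(2.0, 4.5, 26)

# With a list of arrays, hist draws the groups side by side in each bin
counts, edges, _ = plt.hist([white_ph, red_ph], bins=bins, label=["white", "red"])
plt.xlabel("pH")
plt.ylabel("records")
plt.legend()
plt.close()

print(len(counts))
```

Because the two groups differ in size, the raw counts are on different scales. Passing `density=True` to `hist` is one way to compare the shapes instead.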

For each type, check how many records fall into each 0.1-wide pH step

import numpy as np
df["round_pH"] =  np.round(df["pH"], 1)
# df["round_pH"] =  df["pH"].apply(lambda x: round(x, 1))
df.groupby(["type", "round_pH"]).count()
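The rounding-then-grouping idea can be checked by hand on a few made-up values. This sketch uses `Series.round` and `size()`, which returns one count per group rather than one per column:

```python
import pandas as pd

# Toy data: wine type (0 = white, 1 = red) and pH, made up
toy = pd.DataFrame({
    "type": [0, 0, 0, 1, 1],
    "pH": [3.01, 3.04, 3.26, 3.31, 3.34],
})

# Round pH to one decimal so nearby values share a key
toy["round_pH"] = toy["pH"].round(1)

# Rows per (type, rounded pH) pair, sorted by the group keys
bin_counts = toy.groupby(["type", "round_pH"]).size()

print(bin_counts)
```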

That's all for today!
