Pandas Cheat Sheet

Posted at 2024-01-15, last updated at 2024-01-15

I was assigned to a project at work that uses pandas,
so I am summarizing the basics here as a way to study.

References

Sample data

Basic operations

Reading a CSV: read_csv()

import pandas as pd
master = pd.read_csv(csv_file)

The return value is a DataFrame.
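As a minimal sketch of what read_csv() returns, the snippet below reads a small in-memory CSV instead of a file on disk. The column names and values are made up for illustration and are not the article's sample data.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for csv_file (not the article's sample data).
csv_text = "item_id,item_name,item_price\nS001,PC-A,50000\nS002,PC-B,85000\n"

# read_csv() accepts a path or a file-like object and returns a DataFrame.
master = pd.read_csv(io.StringIO(csv_text))

print(type(master).__name__)  # DataFrame
print(master.shape)           # (2, 3)
```

With a real file, passing the path string in place of the StringIO object works the same way.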

Show the first 5 rows: head()

master.head()

Get the number of rows: len()

len(master)

Union (vertical concatenation): concat()

tra_1 = pd.read_csv(csv_file1)
tra_2 = pd.read_csv(csv_file2)

transaction = pd.concat(
    [tra_1, tra_2],
    ignore_index=True
)

Adds rows, that is, stacks the DataFrames vertically.

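ignore_index = True: the result is renumbered with a fresh index 0, 1, 2, ...

ignore_index = False (the default): each DataFrame keeps its original index labels, so the same label can appear more than once.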
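The difference between the two settings can be sketched with two tiny DataFrames. The tables and values below are hypothetical stand-ins for tra_1 and tra_2.

```python
import pandas as pd

# Hypothetical transaction tables; each gets the default index 0, 1.
tra_1 = pd.DataFrame({"transaction_id": ["T001", "T002"], "price": [100, 200]})
tra_2 = pd.DataFrame({"transaction_id": ["T003", "T004"], "price": [300, 400]})

# ignore_index=True discards the old labels and renumbers the rows.
renumbered = pd.concat([tra_1, tra_2], ignore_index=True)

# ignore_index=False keeps each frame's own labels, so 0 and 1 repeat.
kept = pd.concat([tra_1, tra_2], ignore_index=False)

print(renumbered.index.tolist())  # [0, 1, 2, 3]
print(kept.index.tolist())        # [0, 1, 0, 1]
```

Duplicate labels are easy to overlook and make label-based lookups return several rows, which is one reason to prefer ignore_index=True when the old index carries no meaning.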

Join: merge()

join_data = pd.merge(
    base_pd,
    add_pd[["transaction_id", "payment_date", "customer_id"]],
    on="transaction_id",
    how="left"
)

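First argument: the base DataFrame
Second argument: which table to take columns from, and which columns to add
Third argument (on): the key column
Fourth argument (how): which side the data is added to (left keeps every row of the base DataFrame)
Note: "nth argument" here means the order in this particular call; on and how are passed as keyword arguments.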

Adds columns, that is, extends the data horizontally.

How to think about a join
① Which columns are missing (the ones you want to add)?
② Which column do the two tables share?
Column total: sum()

join_data["price"].sum()

Null check: isnull()

join_data.isnull().sum()

Counts the number of missing values in each column.
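A minimal sketch of why this works: isnull() yields a DataFrame of booleans, and sum() adds them up per column with True counted as 1. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical joined data with some missing values.
join_data = pd.DataFrame({
    "transaction_id": ["T001", "T002", "T003"],
    "price": [100.0, None, 300.0],
    "customer_id": ["C01", None, None],
})

# True where a value is missing; summing the booleans gives a count per column.
missing = join_data.isnull().sum()

print(missing["transaction_id"])  # 0
print(missing["price"])           # 1
print(missing["customer_id"])     # 2
```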

Numeric summary: describe()

join_data.describe()

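Computes summary statistics for the numeric columns.

Statistic	Meaning
count	Number of non-null values
mean	Mean
std	Standard deviation
min	Minimum
25%	First quartile (25th percentile)
50%	Median (50th percentile)
75%	Third quartile (75th percentile)
max	Maximum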
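To make the rows of that table concrete, here is a small sketch on hypothetical data. It also shows that, for a DataFrame mixing text and numbers, describe() reports only the numeric columns by default.

```python
import pandas as pd

# Hypothetical data: one text column and one numeric column.
items = pd.DataFrame({
    "item_name": ["A", "B", "C", "D", "E"],
    "price": [100, 200, 300, 400, 500],
})

summary = items.describe()

print(summary.columns.tolist())     # ['price']
print(summary.loc["count", "price"])  # 5.0
print(summary.loc["mean", "price"])   # 300.0
print(summary.loc["50%", "price"])    # 300.0
```

The std row uses the sample standard deviation (dividing by n - 1), which is the pandas default.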

Data types: dtypes

join_data.dtypes

Row access: iloc

df.iloc[10]
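iloc selects by integer position (counting from 0), not by index label. The sketch below uses a hypothetical DataFrame with string labels to make that distinction visible.

```python
import pandas as pd

# Hypothetical data with non-numeric index labels.
df = pd.DataFrame(
    {"transaction_id": ["T001", "T002", "T003"], "price": [100, 200, 300]},
    index=["a", "b", "c"],
)

# Position 1 is the second row, whatever its label is.
row = df.iloc[1]

print(row["transaction_id"])  # T002
print(row.name)               # b
```

A single row comes back as a Series whose index is the column names and whose name is the row's label.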

Single cell access: at

data.at[3, "transaction_id"]

The cell is specified by row label and column name.

Single cell access: iat

data.iat[3, 5]

The cell is specified by integer row and column positions.
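The two accessors can point at the same cell in different ways. In this sketch the DataFrame and its index labels (10, 20, 30) are hypothetical, chosen so that labels and positions differ.

```python
import pandas as pd

# Hypothetical data whose row labels are not 0, 1, 2.
data = pd.DataFrame(
    {"transaction_id": ["T001", "T002", "T003"], "price": [100, 200, 300]},
    index=[10, 20, 30],
)

# at: row label 20, column name "transaction_id".
by_label = data.at[20, "transaction_id"]

# iat: second row (position 1), first column (position 0).
by_position = data.iat[1, 0]

print(by_label)     # T002
print(by_position)  # T002
```

With the default 0, 1, 2, ... index the row label and the row position coincide, which is why data.at[3, ...] and data.iat[3, ...] address the same row in the snippets above.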
