Getting per-group describe statistics with pandas pivot_table

Posted at 2022-09-15

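Introduction

pandas' describe() is really handy, but is there a way to run it on arbitrary groups, via groupby or pivot_table, and get the summary statistics at a glance?

What is pandas `describe`?

It quickly reports the count, mean, standard deviation, minimum, quartiles, and maximum of each numeric column.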

import pandas as pd
import seaborn as sns
df = sns.load_dataset("iris")
df.describe()

#sepal_length	sepal_width	petal_length	petal_width
#count	150.000000	150.000000	150.000000	150.000000
#mean	5.843333	3.057333	3.758000	1.199333
#std	0.828066	0.435866	1.765298	0.762238
#min	4.300000	2.000000	1.000000	0.100000
#25%	5.100000	2.800000	1.600000	0.300000
#50%	5.800000	3.000000	4.350000	1.300000
#75%	6.400000	3.300000	5.100000	1.800000
#max	7.900000	4.400000	6.900000	2.500000
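To make the meaning of each row concrete, here is a minimal sketch that rebuilds the same eight statistics by hand. The toy DataFrame `demo` and its column `x` are made up for illustration and are not part of the iris example above.

```python
import pandas as pd

# Hypothetical toy data, used only for this illustration
demo = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})

summary = demo.describe()

# The same statistics computed one by one
manual = pd.Series({
    "count": demo["x"].count(),
    "mean": demo["x"].mean(),
    "std": demo["x"].std(),            # sample standard deviation (ddof=1)
    "min": demo["x"].min(),
    "25%": demo["x"].quantile(0.25),
    "50%": demo["x"].quantile(0.50),   # the median
    "75%": demo["x"].quantile(0.75),
    "max": demo["x"].max(),
})

print(summary["x"])
print(manual)
```

Both printouts should list the same values, which shows that describe() is a shortcut for these individual calls.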

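"describe" also works as an aggregation function

With groupby, you can call .describe() the same way you would call .count() or .sum().
Note: the result is wide, so it is transposed with .T for display.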

df.groupby("species").describe().T
#	species	setosa	versicolor	virginica
#sepal_length	count	50.000000	50.000000	50.000000
#mean	5.006000	5.936000	6.588000
#std	0.352490	0.516171	0.635880
#min	4.300000	4.900000	4.900000
#25%	4.800000	5.600000	6.225000
#50%	5.000000	5.900000	6.500000
#75%	5.200000	6.300000	6.900000
#max	5.800000	7.000000	7.900000
#sepal_width	count	50.000000	50.000000	50.000000
#mean	3.428000	2.770000	2.974000
#std	0.379064	0.313798	0.322497
# (rest omitted)
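The grouped result has two-level columns (original column, statistic), so it can also be sliced instead of transposed. The following is a small sketch on made-up data; the names `demo`, `length`, and `width` are assumptions for illustration, not columns of the iris dataset.

```python
import pandas as pd

# Hypothetical toy data, used only for this illustration
demo = pd.DataFrame({
    "species": ["a", "a", "a", "b", "b", "b"],
    "length": [1.0, 2.0, 3.0, 4.0, 6.0, 8.0],
    "width": [0.5, 0.5, 1.0, 2.0, 2.0, 3.5],
})

stats = demo.groupby("species").describe()

# All eight statistics for a single column, one row per group
length_stats = stats["length"]

# A single statistic for every column, one row per group
means = stats.xs("mean", axis=1, level=1)

print(length_stats)
print(means)
```

Selecting by the first level gives a compact table for one column, while `xs` on the second level pulls out one statistic across all columns.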

pivot_table accepts it in the same way, by passing aggfunc="describe".

df.pivot_table(index="species",aggfunc="describe").T

#	species	setosa	versicolor	virginica
#sepal_length	count	50.000000	50.000000	50.000000
#mean	5.006000	5.936000	6.588000
#std	0.352490	0.516171	0.635880
#min	4.300000	4.900000	4.900000
# (rest omitted)

Alternative approach

If that still is not enough for you, either define a separate function with def or just knock it out quickly with lambda functions.

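# %%
import numpy as np  # needed for np.percentile below

# "index" and "values" are placeholder column names: replace them with
# columns from your own DataFrame (for iris, e.g. "species" and "sepal_length")
pv = df.pivot_table(
    index="index",
    values="values",
    aggfunc=[
        "mean",
        "std",
        "min",
        lambda x: np.percentile(x, 25),  # first quartile (25%)
        "median",
        lambda x: np.percentile(x, 75),  # third quartile (75%)
        "max",
        "count",
    ],
)
# The lambdas get auto-generated names, so relabel the columns
# in the same order as the aggfunc list above
col = [["mean", "std", "min", "25%", "median", "75%", "max", "count"], ["values"]]
pv.columns = pd.MultiIndex.from_product(col)
pv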
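The def route mentioned above can be sketched as follows. Named functions carry their own names into the result, so the manual column relabeling should not be needed. The toy data and the names `df_demo`, `q25`, and `q75` are hypothetical, chosen only for this illustration.

```python
import pandas as pd

# Hypothetical toy data; "group" and "value" are illustrative column names
df_demo = pd.DataFrame({
    "group": ["a"] * 4 + ["b"] * 4,
    "value": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
})


def q25(x):
    """First quartile (25th percentile)."""
    return x.quantile(0.25)


def q75(x):
    """Third quartile (75th percentile)."""
    return x.quantile(0.75)


pv_named = df_demo.pivot_table(
    index="group",
    values="value",
    aggfunc=["mean", "std", "min", q25, "median", q75, "max", "count"],
)
print(pv_named)
```

The top level of the resulting columns is taken from the function names (`q25`, `q75`), which avoids having two indistinguishable lambda columns.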

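Conclusion

Until now I had simply been using the alternative approach...
On a whim I tried passing "describe" and it just worked, so I am not sure whether to be happy or sad...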
