More than 1 year has passed since last update.

Getting describe statistics per group in pandas while using pivot_table


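Introduction

pandas' describe() is very handy, but is there a way to run it per group, with groupby or pivot_table, and get a quick look at the summary statistics for each group?

What is pandas describe?

It quickly reports the count, mean, standard deviation, minimum, quartiles, and maximum of each numeric column.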

import pandas as pd
import seaborn as sns
df=sns.load_dataset("iris")
df.describe()

#sepal_length	sepal_width	petal_length	petal_width
#count	150.000000	150.000000	150.000000	150.000000
#mean	5.843333	3.057333	3.758000	1.199333
#std	0.828066	0.435866	1.765298	0.762238
#min	4.300000	2.000000	1.000000	0.100000
#25%	5.100000	2.800000	1.600000	0.300000
#50%	5.800000	3.000000	4.350000	1.300000
#75%	6.400000	3.300000	5.100000	1.800000
#max	7.900000	4.400000	6.900000	2.500000
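To make the rows of that table concrete, here is a minimal sketch that compares describe() with the individual methods computing the same quantities. The Series `s` and its values are made up for illustration and are not taken from the iris data.

```python
import pandas as pd

# Toy values, chosen only for illustration (not iris data)
s = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0])

desc = s.describe()  # a Series indexed by count, mean, std, min, 25%, 50%, 75%, max

# Each row of describe() matches the corresponding single-purpose method
comparison = pd.DataFrame({
    "describe": desc,
    "direct": pd.Series({
        "count": s.count(),
        "mean": s.mean(),
        "std": s.std(),            # sample standard deviation (ddof=1)
        "min": s.min(),
        "25%": s.quantile(0.25),
        "50%": s.median(),
        "75%": s.quantile(0.75),
        "max": s.max(),
    }),
})
print(comparison)
```

Note that the std row is the sample standard deviation, the same value that Series.std() returns by default.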

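"describe" also works as an aggregation function

With groupby, you can call it the same way you would call .count() or .sum().
Note: the result is very wide, so it is transposed with .T for display.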

df.groupby("species").describe().T
#	species	setosa	versicolor	virginica
#sepal_length	count	50.000000	50.000000	50.000000
#mean	5.006000	5.936000	6.588000
#std	0.352490	0.516171	0.635880
#min	4.300000	4.900000	4.900000
#25%	4.800000	5.600000	6.225000
#50%	5.000000	5.900000	6.500000
#75%	5.200000	6.300000	6.900000
#max	5.800000	7.000000	7.900000
#sepal_width	count	50.000000	50.000000	50.000000
#mean	3.428000	2.770000	2.974000
#std	0.379064	0.313798	0.322497
#(rest omitted)
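The result of groupby(...).describe() has two-level columns (original column, statistic), so individual pieces can be pulled out afterwards. The following is a small sketch of that; the frame and the names `group`, `x`, and `y` are hypothetical toy data, not the iris dataset.

```python
import pandas as pd

# Toy frame for illustration only; "group", "x", "y" are made-up names
toy = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "x": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    "y": [4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
})

described = toy.groupby("group").describe()

# All statistics for a single column: select on the first column level
x_stats = described["x"]

# A single statistic for every column: cross-section on the second level
means = described.xs("mean", axis=1, level=1)

print(x_stats)
print(means)
```

If only one column is of interest, toy.groupby("group")["x"].describe() gives the same table as `x_stats` without the extra column level.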

It works the same way with pivot_table

df.pivot_table(index="species",aggfunc="describe").T

#	species	setosa	versicolor	virginica
#sepal_length	count	50.000000	50.000000	50.000000
#mean	5.006000	5.936000	6.588000
#std	0.352490	0.516171	0.635880
#min	4.300000	4.900000	4.900000
#(rest omitted)

Alternative approach

If that still isn't enough for you, define your own functions with def, or just knock it out quickly with lambda functions.

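# %%
import numpy as np  # needed for np.percentile below

# "index" and "values" are placeholders: substitute the grouping column
# and the numeric column you want to summarize.
pv = df.pivot_table(
    index="index",
    values="values",
    aggfunc=[
        "mean",
        "std",
        "min",
        lambda x: np.percentile(x, 25),  # 25th percentile
        "median",
        lambda x: np.percentile(x, 75),  # 75th percentile
        "max",
        "count",
    ],
)
# Relabel the columns by hand, since the lambdas carry no meaningful names
col = [["mean", "std", "min", "25%", "median", "75%", "max", "count"], ["values"]]
pv.columns = pd.MultiIndex.from_product(col)
pv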
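The def route mentioned above could look like the sketch below. It assumes that pivot_table labels each aggregate with the function's `__name__`, so named functions make the manual column relabeling unnecessary. The helper names `q25` and `q75` and the toy frame are hypothetical, introduced only for this example.

```python
import pandas as pd

# Toy frame for illustration only; "group" and "value" are made-up names
toy = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [1.0, 2.0, 3.0, 10.0, 20.0, 60.0],
})

def q25(x):
    """25th percentile of a group (hypothetical helper)."""
    return x.quantile(0.25)

def q75(x):
    """75th percentile of a group (hypothetical helper)."""
    return x.quantile(0.75)

pv_named = toy.pivot_table(
    index="group",
    values="value",
    aggfunc=["count", "mean", "std", "min", q25, "median", q75, "max"],
)

# Columns are (statistic, "value") pairs; q25 and q75 appear under their own names
print(pv_named)
```

Compared with the lambda version, the functions are a little more verbose to write, but the result is labeled without touching pv.columns afterwards.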

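Closing

Until now I had simply been using the alternative approach all along…
On a whim I tried passing "describe" and it just worked, so I'm not sure whether to be happy or sad…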
