
Groupby aggregation works on a pd.Series too

Last updated at 2022-03-16 / Posted at 2022-03-16

TL;DR

Prepare a list of grouping keys, group_list, with one entry per element of the Series.
Then pd.Series.groupby(group_list) aggregates the values by those keys.
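As a minimal sketch of the two steps above, the following uses a tiny made-up Series. The names `values` and `totals` and the labels "a" and "b" are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Hypothetical data: four values and one group label per value.
values = pd.Series([10, 20, 30, 40])
group_list = ["a", "b", "a", "b"]  # same length as the Series

# Elements that share a label are aggregated together.
totals = values.groupby(group_list).sum()
print(totals)
```

The list is matched to the Series by position, so it needs exactly one key per element.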

Example

Generate one year of daily sales data (random numbers) and compute the total for each month.

Generating the sales data

>>> import numpy as np
>>> import pandas as pd

>>> daily_sales = pd.Series(
    np.random.randint(0, 100, 365),  # 365 random integers in [0, 100)
    index=pd.date_range("2021-01-01", periods=365)
)

>>> daily_sales
"""
2021-01-01     3
2021-01-02    71
2021-01-03    40
2021-01-04    80
2021-01-05     6
              ..
2021-12-27    78
2021-12-28    90
2021-12-29    86
2021-12-30    30
2021-12-31    46
Freq: D, Length: 365, dtype: int64
"""
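Because the data is random, the numbers shown here will differ on every run. If reproducible output is wanted, one option is a seeded generator; this is a sketch only, and the seed value 0 is an arbitrary assumption that will not reproduce the figures printed in this article.

```python
import numpy as np
import pandas as pd

# Seeded generator so that repeated runs give the same data.
rng = np.random.default_rng(0)  # the seed 0 is an arbitrary choice

daily_sales = pd.Series(
    rng.integers(0, 100, 365),  # integers in [0, 100), like randint(0, 100)
    index=pd.date_range("2021-01-01", periods=365),
)
print(daily_sales.head())
```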

Build the list of keys for groupby: one year-month string per day, so that the days are grouped by month.

>>> daily_sales_ym = ["{:%Y%m}".format(ymd) for ymd in daily_sales.index]

>>> daily_sales_ym
"""
['202101',
 '202101',
 '202101',
 '202101',
 '202101',
    :
 '202112',
 '202112',
 '202112',
 '202112']
"""
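The same keys can probably be built without a Python loop by calling `strftime` on the DatetimeIndex. The sketch below regenerates the data so that it runs on its own; `keys_loop` and `keys_vec` are names introduced here for comparison.

```python
import numpy as np
import pandas as pd

daily_sales = pd.Series(
    np.random.randint(0, 100, 365),
    index=pd.date_range("2021-01-01", periods=365),
)

# List-comprehension version, as in the article.
keys_loop = ["{:%Y%m}".format(ymd) for ymd in daily_sales.index]

# Vectorized version: DatetimeIndex.strftime returns an Index of strings.
keys_vec = daily_sales.index.strftime("%Y%m")

print(list(keys_vec) == keys_loop)
```

Either form can be passed to groupby, since both hold one key per element of the Series.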

Aggregate with groupby

>>> daily_sales.groupby(by=daily_sales_ym).sum()
"""
202101    1631
202102    1320
202103    1710
202104    1405
202105    1634
202106    1788
202107    1484
202108    1539
202109    1573
202110    1349
202111    1653
202112    1277
dtype: int64
"""
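When the index is a DatetimeIndex, as it is here, a variation is to group by monthly periods derived from the index instead of by strings. This is an alternative sketch, not the method used in the article; `by_string` and `by_period` are names introduced for the comparison.

```python
import numpy as np
import pandas as pd

daily_sales = pd.Series(
    np.random.randint(0, 100, 365),
    index=pd.date_range("2021-01-01", periods=365),
)

# String keys, as in the article.
daily_sales_ym = ["{:%Y%m}".format(ymd) for ymd in daily_sales.index]
by_string = daily_sales.groupby(by=daily_sales_ym).sum()

# Alternative: convert each date to its monthly Period and group by that.
by_period = daily_sales.groupby(daily_sales.index.to_period("M")).sum()

print(by_period)
```

The monthly totals are the same; only the result's index differs (a PeriodIndex such as 2021-01 rather than strings such as '202101').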

What a relief. I had assumed I would have to call reset_index() every time and build a DataFrame with a date column first, but it turns out a Series can do this on its own.
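For comparison, here is roughly what the reset_index() detour might look like next to the direct Series version. The column names "date", "sales", and "ym" are assumptions made for this sketch.

```python
import numpy as np
import pandas as pd

daily_sales = pd.Series(
    np.random.randint(0, 100, 365),
    index=pd.date_range("2021-01-01", periods=365),
)

# Detour: turn the index into a column, add a key column, then group.
df = daily_sales.rename("sales").rename_axis("date").reset_index()
df["ym"] = df["date"].dt.strftime("%Y%m")
via_df = df.groupby("ym")["sales"].sum()

# Direct: pass the key list straight to Series.groupby.
daily_sales_ym = ["{:%Y%m}".format(ymd) for ymd in daily_sales.index]
via_series = daily_sales.groupby(daily_sales_ym).sum()

print(via_df.tolist() == via_series.tolist())
```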
