

Getting the mode (most frequent value) with groupby on a DataFrame

Posted at 2022-12-22

Background: the mean is quick to get

pandas DataFrames are really convenient. Per-group means are easy to get, too.

sample
import pandas as pd

df = pd.DataFrame({
    'color': ['red', 'red', 'blue', 'blue', 'blue', 'blue'],
    'type': ['A', 'B', 'B', 'C', 'C', 'C'],
    'quantity': [10, 15, 12, 10, 10, 20]
})
   color  type  quantity
0  red    A           10
1  red    B           15
2  blue   B           12
3  blue   C           10
4  blue   C           10
5  blue   C           20

Easy.

mean
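# numeric_only=True skips the non-numeric 'type' column (pandas 2.0+ raises a TypeError without it)
df.groupby('color').mean(numeric_only=True)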
color  quantity
blue       13
red        12.5

Trying it the same way as mean

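I want to get the mode just as quickly.
A pandas issue (Groupby.mode() - feature request) says to do it with value_counts() instead. But sometimes you just want the mode itself, right away.
Is this how you get the mode?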

mode
df.groupby('color').mode()

That raised an AttributeError.
Maybe like this?

mode
df.groupby('color').agg('mode')

Another AttributeError...
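For comparison, the value_counts() route suggested in the issue might look roughly like the sketch below. It rebuilds the sample df so that it runs on its own. The names counts, top and modes are made up for illustration, and this is only one way to read that suggestion, not necessarily what the issue had in mind.

```python
import pandas as pd

df = pd.DataFrame({
    'color': ['red', 'red', 'blue', 'blue', 'blue', 'blue'],
    'type': ['A', 'B', 'B', 'C', 'C', 'C'],
    'quantity': [10, 15, 12, 10, 10, 20]
})

# Count how often each quantity appears within each color.
# The result is a Series indexed by (color, quantity).
counts = df.groupby('color')['quantity'].value_counts()

# Highest count within each color, repeated for every row of that color.
top = counts.groupby(level=0).transform('max')

# Keep only the values whose count equals the highest count of their group,
# so tied values all survive.
modes = counts[counts == top]
print(modes)
```

It works, but it takes three steps where mean takes one, which is pretty much the complaint here.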

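Conclusion

There is no built-in shortcut: you have to pass a function (a lambda here) to apply. On the plus side, when several values tie for the highest count, all of them are returned.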

mode
df.groupby('color').quantity.apply(lambda x: x.mode())
             quantity
('blue', 0)        10
('red', 0)         10
('red', 1)         15
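As the title of the referenced issue hints, pd.Series.mode can also be handed to apply directly, without wrapping it in a lambda. The sketch below assumes the same sample df. The variable names modes and flat are illustrative only, and dropping the inner index level is just one possible way to tidy the result.

```python
import pandas as pd

df = pd.DataFrame({
    'color': ['red', 'red', 'blue', 'blue', 'blue', 'blue'],
    'type': ['A', 'B', 'B', 'C', 'C', 'C'],
    'quantity': [10, 15, 12, 10, 10, 20]
})

# pd.Series.mode is called once per color group, just like the lambda version.
modes = df.groupby('color')['quantity'].apply(pd.Series.mode)

# The inner index level (0, 1, ...) only numbers the tied modes within a group.
# Dropping it leaves color as the only index, with repeated labels for ties.
flat = modes.reset_index(level=1, drop=True)
print(flat)
```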

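Intuitively I'd like the output to look like what mean gives, but maybe there is no obvious way to decide what to do when several values have the same count.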
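If one row per group is what you want, the tie handling has to be decided by hand. The sketch below shows two possible choices, assuming the same sample df: keeping the smallest of the tied modes, or keeping all of them as a list. Both rules are assumptions for illustration, not pandas behaviour, and the names first_mode and all_modes are made up.

```python
import pandas as pd

df = pd.DataFrame({
    'color': ['red', 'red', 'blue', 'blue', 'blue', 'blue'],
    'type': ['A', 'B', 'B', 'C', 'C', 'C'],
    'quantity': [10, 15, 12, 10, 10, 20]
})

grouped = df.groupby('color')['quantity']

# Choice 1: Series.mode() returns the tied modes in sorted order,
# so .iloc[0] keeps the smallest one and yields one value per group.
first_mode = grouped.agg(lambda s: s.mode().iloc[0])

# Choice 2: keep every tied mode, packed into one list per group.
all_modes = grouped.apply(lambda s: s.mode().tolist())

print(first_mode)
print(all_modes)
```

Choice 1 has the same shape as the mean output but silently hides ties (red has both 10 and 15), while choice 2 keeps them at the cost of a column of lists.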

Reference: 'mode' not recognized by df.groupby().agg(), but pd.Series.mode works #11562
