How to flatten the multi-level columns produced by a Pandas groupby into single-level columns

Last updated at 2020-09-23. Posted at 2020-04-29.

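Overview

When a Pandas groupby is followed by .agg() with several statistics such as [max, min], the returned DataFrame has multi-level (MultiIndex) columns. This article shows a simple way to convert them back to single-level columns.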

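First, create a sample DataFrame with 5 rows and 2 columns that contains only 0 and 1. The values are random, so the output differs from run to run.

input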

import numpy as np
import pandas as pd

# Draw uniform random numbers in [0, 1) and binarize them at 0.5
mat = np.random.rand(5, 2)
mat[mat > 0.5] = 1
mat[mat <= 0.5] = 0
df = pd.DataFrame(mat, columns=['A', 'B'])
df  # display the sample frame

output

     A    B
0  0.0  1.0
1  1.0  0.0
2  0.0  1.0
3  0.0  1.0
4  0.0  0.0

Specifying [min, max] in .agg() produces multi-level columns.

input

df.groupby('A').agg({'B': [min, max]}).columns

output

MultiIndex([('B', 'min'),
            ('B', 'max')],
           )
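Because the sample frame above is random, the structure of these columns is easier to check on fixed data. The following is a minimal sketch under that assumption: df_fixed and its values are illustrative and not taken from the article. It also passes the string names "min" and "max" rather than the builtins, since recent pandas versions may emit a FutureWarning when builtin functions are passed to .agg().

```python
import pandas as pd

# Fixed values instead of random ones so the result is reproducible (illustrative data).
df_fixed = pd.DataFrame({"A": [0, 1, 0, 0, 1], "B": [1, 0, 1, 0, 1]})

agg = df_fixed.groupby("A").agg({"B": ["min", "max"]})

# Level 0 holds the source column name, level 1 the aggregation name.
n_levels = agg.columns.nlevels
column_tuples = list(agg.columns)
print(n_levels)       # 2
print(column_tuples)  # [('B', 'min'), ('B', 'max')]
```

Each column label is a tuple, which is what makes the tuple unpacking approach possible.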

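Iterating over the MultiIndex yields one tuple per column. Just as when looping over zip in a for statement, unpack each tuple into variables (level1 and level2 in the example below) and join them into a single string with an f-string.

input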

[f'{level1}__{level2}' for level1, level2 in df.groupby('A').agg({'B': [min, max]}).columns]

output

['B__min', 'B__max']
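The list built above only contains the new names. To actually flatten a frame, the list has to be assigned back to its columns. A minimal sketch of that step follows, assuming the input always has MultiIndex columns. The helper flatten_columns, the frame df_fixed, and the extra column C are hypothetical names introduced for illustration.

```python
import pandas as pd

def flatten_columns(frame, sep="__"):
    """Return a copy of frame with each MultiIndex column tuple joined into one string.

    Assumes frame.columns is a MultiIndex (hypothetical helper, not a pandas API).
    """
    out = frame.copy()
    out.columns = [sep.join(str(level) for level in col) for col in frame.columns]
    return out

# Illustrative fixed data with two aggregated columns.
df_fixed = pd.DataFrame({"A": [0, 1, 0, 0], "B": [1, 0, 1, 0], "C": [5, 6, 7, 8]})
agg = df_fixed.groupby("A").agg({"B": ["min", "max"], "C": ["mean"]})

flat = flatten_columns(agg)
print(list(flat.columns))  # ['B__min', 'B__max', 'C__mean']
```

Joining every level with a separator keeps the source column in the name, so the result stays unambiguous when several columns are aggregated at once.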

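Besides the above, there are the following approaches, which I leave here as a note. Both use droplevel() to remove the multi-level columns: the first names each aggregation with a (name, function) tuple, and the second uses add_prefix(), which prefixes the labels on both levels, before dropping the top level.

input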

df.groupby('A').agg({'B': [("B__min", min), ("B__max", max)]}).columns.droplevel()

output

Index(['B__min', 'B__max'], dtype='object')
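Called without arguments, droplevel() removes level 0, the source column name, so whatever is left on level 1 must already be unique. That is why the names are set explicitly here. The sketch below, using illustrative fixed data (df_fixed and column C are assumptions, not from the article), shows what can happen when two columns are aggregated without such names.

```python
import pandas as pd

# Illustrative fixed data: two value columns aggregated with the same functions.
df_fixed = pd.DataFrame({"A": [0, 1, 0, 0], "B": [1, 0, 1, 0], "C": [5, 6, 7, 8]})
agg = df_fixed.groupby("A").agg({"B": ["min", "max"], "C": ["min", "max"]})

# droplevel() with no argument drops level 0 and keeps only the aggregation names.
dropped = agg.columns.droplevel()
has_duplicates = dropped.has_duplicates
print(list(dropped))    # ['min', 'max', 'min', 'max']
print(has_duplicates)   # True
```

With a single aggregated column, as in the article, this collision does not arise.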

input

df.groupby('A').agg({'B': [min, max]}).add_prefix("B__").columns.droplevel()

output

Index(['B__min', 'B__max'], dtype='object')
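As a further alternative that is not part of the original note, pandas 0.25 and later support named aggregation, where each keyword argument of .agg() becomes an output column name, so no MultiIndex is created in the first place. A minimal sketch on illustrative fixed data (df_fixed is an assumption):

```python
import pandas as pd

# Illustrative fixed data.
df_fixed = pd.DataFrame({"A": [0, 1, 0, 0], "B": [1, 0, 1, 0]})

# keyword = (source column, aggregation); the keyword becomes the column name.
named = df_fixed.groupby("A").agg(B__min=("B", "min"), B__max=("B", "max"))
print(list(named.columns))  # ['B__min', 'B__max']
```

This needs the output names written out by hand, so the f-string approach may still be more convenient when many columns and statistics are involved.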