@Kenta-K posted at 2021-08-28

Adding a column to a pandas DataFrame from a value computed by a function

Q&A

Closed

pandas BMI adding-a-column-to-a-DataFrame

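What I want to solve

Suppose I have a pandas DataFrame df holding height and weight data.
Suppose I also have a Python function that calculates BMI, def calcbmi(Height,Weight).

I would like to compute BMI with calcbmi() and add the result to df as a new column,
but I don't know how to do that.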

What I tried

Original DataFrame

df = pd.read_csv('500_Person_Gender_Height_Weight_Index.csv', header = 0)

Create a new DataFrame for computing BMI

df_calcbmi = df[["Height","Weight"]]

BMI calculation

def calcbmi(Height,Weight):
    return Weight / ((Height/100) ** 2)

Merge the new DataFrame into the original DataFrame

df2 = pd.concat([df,df_calcbmi],index=["BMI"],axis=1)
df2.columns=["Gender","Height","Weight","Index","BMI"] # column headers
df.append(df2)
df.head() # show the first 5 rows

Problem

TypeError Traceback (most recent call last)
in
7 return Weight / ((Height/100) ** 2)
8 # Merge the new DataFrame into the original DataFrame
----> 9 df2 = pd.concat([df,df_calcbmi],index=["BMI"],axis=1)
10 df2.columns=["Gender","Height","Weight","Index","BMI"] # column headers
11 df.append(df2)

~\anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
309 stacklevel=stacklevel,
310 )
--> 311 return func(*args, **kwargs)
312
313 return wrapper

TypeError: concat() got an unexpected keyword argument 'index'
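
For anyone who lands here with the same TypeError: pd.concat takes no index keyword, which is what the message is saying. Below is a minimal sketch of one way around it, assuming the goal is a single BMI column appended to df. It calls calcbmi on the two columns to get a Series, names it, and concatenates along axis=1. The four-column dummy frame is an assumption standing in for the CSV, and the variable name bmi is introduced only for illustration.

```python
import pandas as pd

# Dummy data standing in for the CSV (assumed columns: Gender, Height, Weight, Index).
df = pd.DataFrame({
    "Gender": ["Male", "Female"],
    "Height": [170, 150],
    "Weight": [63, 48],
    "Index": [2, 2],
})

def calcbmi(Height, Weight):
    return Weight / ((Height / 100) ** 2)

# Arithmetic on whole columns returns a Series; rename() sets the future column name.
bmi = calcbmi(df["Height"], df["Weight"]).rename("BMI")

# concat has no `index` keyword; the column name comes from the Series itself.
df2 = pd.concat([df, bmi], axis=1)
print(df2.head())
```

Because the Series already carries the name BMI, there is no need to overwrite df2.columns afterwards.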


2 Answers

@mamo3gr posted at 2021-08-28

How about using pandas.DataFrame.apply?

>>> import pandas as pd
>>> df = pd.DataFrame({'Height': [170, 150], 'Weight': [63, 48]})
>>> df
   Height  Weight
0     170      63
1     150      48
>>> def calcbmi(height, weight):
... 	return weight / ((height / 100) ** 2)
...
>>> df['BMI'] = df.apply(lambda row: calcbmi(row['Height'], row['Weight']), axis='columns')
>>> df
   Height  Weight        BMI
0     170      63  21.799308
1     150      48  21.333333
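
As a side note to the apply approach, here is a sketch of a column-wise variant. It assumes calcbmi only uses arithmetic operators, as it does here, so it can be called directly on the two columns instead of once per row. The name bmi_apply is introduced only to compare the two results.

```python
import pandas as pd

df = pd.DataFrame({'Height': [170, 150], 'Weight': [63, 48]})

def calcbmi(height, weight):
    return weight / ((height / 100) ** 2)

# Row-by-row version, as in the answer above.
bmi_apply = df.apply(lambda row: calcbmi(row['Height'], row['Weight']), axis='columns')

# Column-wise version: /, ** and friends operate element-wise on whole Series.
df['BMI'] = calcbmi(df['Height'], df['Weight'])

# Both approaches should agree up to floating-point rounding.
print((bmi_apply - df['BMI']).abs().max())
```

This only works when the function body is expressible as whole-column operations; a function with if statements on individual values would still need apply or a loop.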


Comments

@Kenta-K
Questioner
Thank you.

@simonritchie posted at 2021-08-28

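Someone taught me this a while back: unless the row count reaches the tens of millions, looping with zip is what I'd recommend, both for speed and for how simple the code is to write!

Reference:

Here is some sample code!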

import pandas as pd

# Dummy data; please replace this with read_csv or similar.
df = pd.DataFrame(
    data=[{
        'Height': 172,
        'Weight': 65,
    }, {
        'Height': 163,
        'Weight': 53,
    }])

def calcbmi(Height, Weight):
    return Weight / ((Height/100) ** 2)

height_list = df['Height'].tolist()
weight_list = df['Weight'].tolist()
bmi_list = []

for height, weight in zip(height_list, weight_list):
    bmi = calcbmi(Height=height, Weight=weight)
    bmi_list.append(bmi)

df['bmi'] = bmi_list
print(df)

Output of print:

   Height  Weight        bmi
0     172      65  21.971336
1     163      53  19.948060

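The key points:

The tolist method on a DataFrame column (a Series) gives you a list of that column's values (the df['Height'].tolist() part).
zip lets you loop over several lists in step, one row at a time (the for height, weight in zip(height_list, weight_list): part).
bmi_list.append(bmi) adds each computed bmi value to the list. When the loop finishes, the list has as many elements as the DataFrame has rows.
Assigning a list whose length equals the DataFrame's row count, as in df['bmi'] = bmi_list, sets a new column.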
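
The same points can be sketched more compactly with a list comprehension, which folds the loop and the append into one expression. This is only a condensed variant of the loop above, not a different technique. The 'bad' column and the length_error_raised flag are hypothetical, added to show what happens when the list length does not match the row count.

```python
import pandas as pd

df = pd.DataFrame({'Height': [172, 163], 'Weight': [65, 53]})

def calcbmi(Height, Weight):
    return Weight / ((Height / 100) ** 2)

# zip pairs the two columns row by row; the comprehension collects one BMI per row.
df['bmi'] = [calcbmi(Height=h, Weight=w)
             for h, w in zip(df['Height'].tolist(), df['Weight'].tolist())]
print(df)

# A list with the wrong number of elements is rejected with a ValueError.
length_error_raised = False
try:
    df['bad'] = [1.0]  # 1 value for a 2-row DataFrame
except ValueError:
    length_error_raised = True
print(length_error_raised)
```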

Besides looping, you can also apply a function to a DataFrame with methods such as apply!