
Pandas: About DataFrames--06: Computing Statistics

Last updated at 2022-06-22, posted at 2022-06-20

Computing statistics

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [1.2, 3.4, 5.6, 7.8, np.nan]
})
df

	a	b
0	1	1.2
1	2	3.4
2	3	5.6
3	4	7.8
4	5	NaN

1. Computing the main descriptive statistics: `describe`

describe(percentiles=None, include=None, exclude=None, datetime_is_numeric=False)

Computes the valid sample size (excluding NaN¹), mean, standard deviation, minimum, first quartile, median, third quartile, and maximum.

df.describe()

	a	b
count	5.000000	4.000000
mean	3.000000	4.500000
std	1.581139	2.840188
min	1.000000	1.200000
25%	2.000000	2.850000
50%	3.000000	4.500000
75%	4.000000	6.150000
max	5.000000	7.800000
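The `percentiles` argument in the signature above controls which percentile rows appear in the result. The following is a minimal sketch, assuming the same `df` as in this article; the choice of 10%, 50%, and 90% is only an illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [1.2, 3.4, 5.6, 7.8, np.nan]
})

# Ask for the 10th, 50th and 90th percentiles instead of the default quartiles.
# The percentile values (0.1, 0.5, 0.9) are illustrative, not from the article.
summary = df.describe(percentiles=[0.1, 0.5, 0.9])
print(summary)
```

The rows for count, mean, std, min, and max are unchanged; only the percentile rows differ from the default output.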

1.1. Computing individual descriptive statistics

print(df.count()) # sample size (number of non-NaN observations)

a    5
b    4
dtype: int64

print(df.mean()) # mean

a    3.0
b    4.5
dtype: float64
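The statistics in this article skip NaN by default, which is why column `b` is summarized from four values. As a small sketch (the variable names are only for illustration), passing `skipna=False` shows what happens when NaN is not excluded:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [1.2, 3.4, 5.6, 7.8, np.nan]
})

# Default behaviour: NaN is dropped before the mean is computed.
means_default = df.mean()

# With skipna=False, a column containing NaN yields NaN.
means_strict = df.mean(skipna=False)

print(means_default)
print(means_strict)
print(len(df), df['b'].count())  # number of rows vs. number of non-NaN values in b
```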

print(df.std()) # sample standard deviation (square root of the unbiased variance), ddof=1

a    1.581139
b    2.840188
dtype: float64

print(df.std(ddof=0)) # population standard deviation (not unbiased), ddof=0

a    1.414214
b    2.459675
dtype: float64

print(df.min()) # minimum

a    1.0
b    1.2
dtype: float64

Computing percentile values

quantile(q=0.5, axis: 'Axis' = 0, numeric_only: 'bool' = True,
         interpolation: 'str' = 'linear')

print(df.quantile(0.25)) # first quartile

a    2.00
b    2.85
Name: 0.25, dtype: float64

print(df.quantile(0.50)) # median (second quartile)

a    3.0
b    4.5
Name: 0.5, dtype: float64

print(df.quantile(0.75)) # third quartile

a    4.00
b    6.15
Name: 0.75, dtype: float64

print(df.quantile([0, 0.25, 0.5, 0.75, 1])) # several quantiles given as a list

        a     b
0.00  1.0  1.20
0.25  2.0  2.85
0.50  3.0  4.50
0.75  4.0  6.15
1.00  5.0  7.80

print(df.max()) # maximum

a    5.0
b    7.8
dtype: float64

print(df.var()) # unbiased variance, ddof=1

a    2.500000
b    8.066667
dtype: float64

print(df.var(ddof=0)) # population variance (not unbiased), ddof=0

a    2.00
b    6.05
dtype: float64
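The `ddof` argument is the value subtracted from the number of valid observations in the divisor: the sum of squared deviations is divided by n - ddof. The following sketch recomputes the variance and standard deviation of column `b` by hand and checks them against pandas; the intermediate variable names are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [1.2, 3.4, 5.6, 7.8, np.nan]
})

b = df['b'].dropna()                 # pandas also excludes NaN before computing
n = len(b)                           # number of valid observations
squared_dev = ((b - b.mean()) ** 2).sum()

var_unbiased = squared_dev / (n - 1)   # corresponds to ddof=1 (the default)
var_population = squared_dev / n       # corresponds to ddof=0

# The hand-computed values agree with pandas.
assert np.isclose(var_unbiased, df['b'].var())
assert np.isclose(var_population, df['b'].var(ddof=0))
# The standard deviation is the square root of the variance with the same ddof.
assert np.isclose(np.sqrt(var_unbiased), df['b'].std())
assert np.isclose(np.sqrt(var_population), df['b'].std(ddof=0))

print(var_unbiased, var_population)
```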

print(df.median()) # median (second quartile)

a    3.0
b    4.5
dtype: float64

2. Computing arbitrary descriptive statistics: `aggregate`

aggregate(func=None, axis: 'Axis' = 0, *args, **kwargs) # aggregate() == agg()
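agg is an alias of aggregate. Using the alias agg is recommended.

Specify the function either as np.xxx or as the string 'xxx'.

Some functions are accepted without being written as strings, but specifying every function as a string works consistently without problems.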

df.agg(np.mean)

a    3.0
b    4.5
dtype: float64

df.agg('mean')

a    3.0
b    4.5
dtype: float64

To apply several functions at once, pass them as a list.

df.agg(['count', 'sum', 'min', 'max', 'median', 'mean', 'var', 'std'])

	a	b
count	5.000000	4.000000
sum	15.000000	18.000000
min	1.000000	1.200000
max	5.000000	7.800000
median	3.000000	4.500000
mean	3.000000	4.500000
var	2.500000	8.066667
std	1.581139	2.840188
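Besides a single function or a list, `agg` also accepts a dictionary that maps column names to functions, which is handy when each column needs different statistics. A minimal sketch, assuming the same `df`; the particular statistics chosen per column are only an example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [1.2, 3.4, 5.6, 7.8, np.nan]
})

# Different statistics for each column, all given as strings.
per_column = df.agg({
    'a': ['min', 'max'],
    'b': ['mean', 'std'],
})
print(per_column)
```

Cells for statistics that were not requested for a column are filled with NaN.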

How to specify a function for agg when that function takes arguments.

result = df.agg('quantile', q=[0, 0.25, 0.5, 0.75, 1]) # min, q1, median, q3, max
result.index = ['min', 'q1', 'median', 'q3', 'max']
result

	a	b
min	1.0	1.20
q1	2.0	2.85
median	3.0	4.50
q3	4.0	6.15
max	5.0	7.80
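A user-defined function can be passed to `agg` as well; it receives each column as a Series. The sketch below uses a hypothetical helper, `value_range`, which is not part of pandas and is defined here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [1.2, 3.4, 5.6, 7.8, np.nan]
})

def value_range(column):
    """Hypothetical helper: range (max - min) of one column; NaN is skipped."""
    return column.max() - column.min()

# agg calls value_range once per column and collects the results in a Series.
ranges = df.agg(value_range)
print(ranges)
```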

3. Computing percentile values (quantiles): `quantile`

quantile(q=0.5, axis: 'Axis' = 0, numeric_only: 'bool' = True,
         interpolation: 'str' = 'linear')

df.quantile() # q=0.5

a    3.0
b    4.5
Name: 0.5, dtype: float64

df.quantile(q=[0, 0.25, 0.5, 0.75, 1])

	a	b
0.00	1.0	1.20
0.25	2.0	2.85
0.50	3.0	4.50
0.75	4.0	6.15
1.00	5.0	7.80
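With the default `interpolation='linear'`, a quantile that falls between two sorted observations is obtained by linear interpolation. The sketch below reproduces the first quartile of column `b` by hand and compares it with a few other `interpolation` options; the intermediate variable names are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [1.2, 3.4, 5.6, 7.8, np.nan]
})

# Sorted valid values of b: [1.2, 3.4, 5.6, 7.8]
b = df['b'].dropna().sort_values().to_numpy()
q = 0.25

position = (len(b) - 1) * q          # fractional index into the sorted values
lower = int(np.floor(position))      # index of the observation just below
frac = position - lower              # how far to move toward the next observation
manual = b[lower] + frac * (b[lower + 1] - b[lower])

# The hand-computed value agrees with pandas' default (linear) result.
assert np.isclose(manual, df['b'].quantile(q))

# Other interpolation rules pick or combine the two neighbouring observations.
for method in ['lower', 'higher', 'midpoint']:
    print(method, df['b'].quantile(q, interpolation=method))
```

For column `a`, the same position calculation lands exactly on an observation for each quartile, so the choice of interpolation rule makes no difference there.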

For those who want to know more

help(df.describe) # Help on method describe in module pandas.core.generic
help(df.count)    # Help on method count in module pandas.core.frame
help(df.mean)     # Help on method mean in module pandas.core.generic
help(df.std)      # Help on method std in module pandas.core.generic
help(df.min)      # Help on method min in module pandas.core.generic
help(df.quantile) # Help on method quantile in module pandas.core.frame
help(df.max)      # Help on method max in module pandas.core.generic
help(df.var)      # Help on method var in module pandas.core.generic
help(df.median)   # Help on method median in module pandas.core.generic

¹ pandas also provides pd.NA as a missing-value marker, but the value treated as missing here is np.nan.
