
[pandas] Getting statistics for a Series or DataFrame

Last updated at 2023-03-23, posted at 2022-12-30

A summary of how to compute statistics for a pandas Series or DataFrame.

Official documentation

User Guide -- pandas 1.4.4 documentation

Environment

Item	Version
MacBook Air	macOS Ventura 13.0.1
Python	3.9.6
Jupyter Notebook	6.5.2
pandas	1.5.2

First, import the package

import pandas as pd

By convention, pandas is imported under the alias pd.

About the DataFrame used here

The examples use the DataFrame created in the earlier article "[python]pandas read_csvの備忘録" (notes on pandas read_csv).

print(df_SN.head(10))
#      year   num  std  spot  certenty
# 0  1700.5   8.3 -1.0    -1         1
# 1  1701.5  18.3 -1.0    -1         1
# 2  1702.5  26.7 -1.0    -1         1
# 3  1703.5  38.3 -1.0    -1         1
# 4  1704.5  60.0 -1.0    -1         1
# 5  1705.5  96.7 -1.0    -1         1
# 6  1706.5  48.3 -1.0    -1         1
# 7  1707.5  33.3 -1.0    -1         1
# 8  1708.5  16.7 -1.0    -1         1
# 9  1709.5  13.3 -1.0    -1         1

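In order, the columns are:
year: observation year, num: sunspot number, std: standard deviation, obs spot: number of observation sites, certainty: confirmed or unconfirmed
Note that the headers have been changed to English from those in the earlier article. In the head(10) output above, the last two columns appear as spot and certenty; the describe() output later in this article shows them as obs spot and certainty.
All of the operations below use the "num" column (global sunspot number) as a Series, but the same processing also works on a DataFrame (covered in the last section).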
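For readers who do not have the CSV from the earlier article, a small stand-in can be built by hand. The sketch below is only an assumption for illustration: df_sample is a hypothetical name, and it contains just the ten rows printed by head(10) above, so any statistics computed from it will differ from the 322-row results shown in this article.

```python
import pandas as pd

# Hypothetical stand-in for df_SN: only the ten rows printed by head(10) above.
df_sample = pd.DataFrame({
    "year": [1700.5, 1701.5, 1702.5, 1703.5, 1704.5,
             1705.5, 1706.5, 1707.5, 1708.5, 1709.5],
    "num": [8.3, 18.3, 26.7, 38.3, 60.0, 96.7, 48.3, 33.3, 16.7, 13.3],
    "std": [-1.0] * 10,
    "obs spot": [-1] * 10,
    "certainty": [1] * 10,
})

# Selecting a single column with [] returns a Series.
num = df_sample["num"]
print(type(num).__name__)
print(len(num))
```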

Mean

To get the mean of a Series, call .mean() on it.

print(df_SN['num'].mean())
# 78.36521739130436
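As a rough check of what .mean() computes, the sketch below compares it with sum divided by count on a few made-up values (not the article's data). It also shows that missing values (NaN) are skipped by default, which is controlled by the skipna parameter.

```python
import pandas as pd

# Made-up values for illustration; None is stored as NaN.
s = pd.Series([8.3, 18.3, 26.7, None])

mean_default = s.mean()            # NaN is skipped by default
mean_manual = s.sum() / s.count()  # count() counts non-NaN values only
mean_strict = s.mean(skipna=False) # NaN propagates into the result

print(mean_default)
print(mean_manual)
print(mean_strict)
```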

Variance

To get the variance of a Series, call .var() on it.

print(df_SN['num'].var())
# 3850.7780387376397
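One detail worth knowing: pandas' .var() uses ddof=1 by default, i.e. it divides by n - 1 (the unbiased sample variance). The sketch below, on made-up values rather than the article's data, compares both conventions against a manual calculation.

```python
import pandas as pd

# Made-up values for illustration.
s = pd.Series([8.3, 18.3, 26.7, 38.3, 60.0])
n = len(s)

# Sum of squared deviations from the mean.
squared_dev = ((s - s.mean()) ** 2).sum()

sample_var = squared_dev / (n - 1)  # what s.var() returns (ddof=1)
population_var = squared_dev / n    # what s.var(ddof=0) returns

print(s.var(), sample_var)
print(s.var(ddof=0), population_var)
```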

Standard deviation

To get the standard deviation of a Series, call .std() on it.

print(df_SN['num'].std())
# 62.054637528049746
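The standard deviation is the square root of the variance, and like .var() it defaults to ddof=1. NumPy's np.std defaults to ddof=0 instead, so the two libraries can give slightly different numbers for the same data. A minimal sketch on made-up values:

```python
import numpy as np
import pandas as pd

# Made-up values for illustration.
s = pd.Series([8.3, 18.3, 26.7, 38.3, 60.0])

pandas_std = s.std()               # ddof=1 by default
sqrt_of_var = np.sqrt(s.var())     # same value as s.std()
numpy_std = np.std(s.to_numpy())   # ddof=0 by default
pandas_std_ddof0 = s.std(ddof=0)   # matches the NumPy default

print(pandas_std, sqrt_of_var)
print(numpy_std, pandas_std_ddof0)
```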

Maximum and minimum

To get the maximum and minimum of a Series, call .max() and .min() on it.

print(df_SN['num'].max())
print(df_SN['num'].min())
# 269.3
#   0.0
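If the question is also where the extreme value occurs, .idxmax() and .idxmin() return the index label of the maximum and minimum. The sketch below uses a hypothetical df_sample built from the first six rows of the head(10) output above, not the full data.

```python
import pandas as pd

# Hypothetical sample: the first six rows of the head(10) output above.
df_sample = pd.DataFrame({
    "year": [1700.5, 1701.5, 1702.5, 1703.5, 1704.5, 1705.5],
    "num": [8.3, 18.3, 26.7, 38.3, 60.0, 96.7],
})

i_max = df_sample["num"].idxmax()  # index label of the largest value
i_min = df_sample["num"].idxmin()  # index label of the smallest value

# Look up the year in the same row as each extreme value.
print(df_sample.loc[i_max, "year"], df_sample["num"].max())  # 1705.5 96.7
print(df_sample.loc[i_min, "year"], df_sample["num"].min())  # 1700.5 8.3
```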

Displaying summary statistics

Calling .describe() displays the main statistics all at once.

print(df_SN['num'].describe())
# count    322.000000
# mean      78.365217
# std       62.054638
# min        0.000000
# 25%       24.325000
# 50%       65.150000
# 75%      115.075000
# max      269.300000
# Name: num, dtype: float64

From top to bottom: count: number of (non-missing) elements, mean: mean, std: standard deviation, min: minimum, 25%: first quartile, 50%: median, 75%: third quartile, max: maximum.
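The result of .describe() is itself a Series, so individual entries can be looked up by label. The sketch below, using the ten num values from the head(10) output as sample data, checks that the 50% and 25% rows agree with .median() and .quantile(0.25), and passes the percentiles parameter to request other percentiles.

```python
import math

import pandas as pd

# Sample data: the ten "num" values from the head(10) output above.
s = pd.Series([8.3, 18.3, 26.7, 38.3, 60.0, 96.7, 48.3, 33.3, 16.7, 13.3])

summary = s.describe()

# Compare with a tolerance, since these are floating-point results.
median_matches = math.isclose(summary["50%"], s.median())
q1_matches = math.isclose(summary["25%"], s.quantile(0.25))
print(median_matches, q1_matches)

# Request the 10th and 90th percentiles instead of the default quartiles.
custom = s.describe(percentiles=[0.1, 0.9])
print(custom)
```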

Processing a DataFrame

Applying the same method to a DataFrame gives the following.

print(df_SN.describe())
#               year         num         std      obs spot  certainty
# count   322.000000  322.000000  322.000000    322.000000      322.0
# mean   1861.000000   78.365217    4.633540   1113.546584        1.0
# std      93.097619   62.054638    5.274608   2574.732653        0.0
# min    1700.500000    0.000000   -1.000000     -1.000000        1.0
# 25%    1780.750000   24.325000   -1.000000     -1.000000        1.0
# 50%    1861.000000   65.150000    4.150000    365.000000        1.0
# 75%    1941.250000  115.075000    8.850000    365.000000        1.0
# max    2021.500000  269.300000   19.100000  15233.000000        1.0

As shown, the statistics are displayed for each column.
Detailed usage of the describe method will be covered... another day...
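The individual methods from the earlier sections also work on a DataFrame, returning one value per column, and .agg() can combine several of them into a single table. A minimal sketch, again on a hypothetical df_sample taken from the first six rows of the head(10) output:

```python
import pandas as pd

# Hypothetical sample: the first six rows of the head(10) output above.
df_sample = pd.DataFrame({
    "year": [1700.5, 1701.5, 1702.5, 1703.5, 1704.5, 1705.5],
    "num": [8.3, 18.3, 26.7, 38.3, 60.0, 96.7],
})

# Called on a DataFrame, mean() returns a Series indexed by column name.
col_means = df_sample.mean()
print(col_means)

# agg() with a list of method names returns one row per statistic.
stats = df_sample.agg(["mean", "var", "std", "max", "min"])
print(stats)
```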

