
More than 3 years have passed since last update.

Aggregating and visualizing cumulative values


We will use the following data.

x = [i for i in range(1,11)]
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

The goal is to build a cumulative distribution function for the values stored in the variable x.
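For comparison, the empirical CDF in the usual statistical sense gives, for each value v, the fraction of observations less than or equal to v. This is a different quantity from the cumulative-sum ratio computed with pandas below, which weights each value by its size. A minimal NumPy sketch; the helper name `ecdf` is hypothetical, not part of NumPy:

```python
import numpy as np


def ecdf(values):
    """Hypothetical helper: return (sorted values, F(v) = fraction of observations <= v)."""
    v = np.sort(np.asarray(values, dtype=float))
    # searchsorted with side="right" counts the observations <= each value,
    # so tied values all receive the same (largest) CDF value
    f = np.searchsorted(v, v, side="right") / v.size
    return v, f


x = [i for i in range(1, 11)]
xs, fx = ecdf(x)
print(xs)
print(fx)  # each of the 10 distinct values adds 1/10
```

With ten distinct values the curve rises in equal steps of 0.1, whereas the cumulative-sum ratio rises faster for larger values.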

Using pandas' cumsum function

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


x = [i for i in range(1,11)]

df = pd.DataFrame(x, columns=['x'])
df["cumsum"] = df.x.cumsum()  # add the running (cumulative) sum
df["cumsum_ratio"] = df["cumsum"] / df.x.sum()  # share of the total sum reached so far
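The same two columns can also be computed without pandas. A minimal NumPy sketch, assuming the values should be sorted in ascending order first so that the curve rises monotonically along x (for this x, which is already sorted, the sort changes nothing):

```python
import numpy as np

x = np.array([i for i in range(1, 11)])

x_sorted = np.sort(x)                   # assumption: sort so the curve rises left to right
cumsum = np.cumsum(x_sorted)            # running total: 1, 3, 6, 10, ...
cumsum_ratio = cumsum / x_sorted.sum()  # share of the grand total (55) reached so far

# first five rows, matching the DataFrame shown below
for value, c, r in zip(x_sorted[:5], cumsum[:5], cumsum_ratio[:5]):
    print(value, c, round(float(r), 6))
```

Dividing by the total makes the last entry exactly 1, which is what lets the ratio be read as a cumulative curve.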

This gives df the following structure (index not shown):

x cumsum cumsum_ratio
1 1 0.018182
2 3 0.054545
3 6 0.109091
4 10 0.181818
5 15 0.272727
... ... ...

All that remains is to plot it.

fig, ax = plt.subplots(figsize=(4, 4))
ax.set_xlabel('Value')
ax.set_ylabel('Cumulative Frequency') 
ax.set_xlim(0,10)
ax.scatter(df.x, df.cumsum_ratio, color="blue",s=10) 
ax.plot(df.x, df.cumsum_ratio, color="blue", marker='o', markersize=1)
plt.show()  # display the figure (not needed inside Jupyter)


Using scipy's stats.cumfreq function

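Strictly speaking, this does not produce a cumulative distribution function (it returns cumulative counts per histogram bin), but it can be used as follows.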

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

x = [i for i in range(1,11)]

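res = stats.cumfreq(x, numbins=10)
# x positions at which to plot the cumulative counts
x_ = res.lowerlimit + np.linspace(0, res.binsize*res.cumcount.size, res.cumcount.size)

# alternative (unused below): the left edge of each bin
x_1 = res.lowerlimit + np.arange(res.cumcount.size) * res.binsize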

fig, ax = plt.subplots(figsize=(4, 4))
ax.plot(x_, res.cumcount, 'ro')
ax.set_title('Cumulative histogram')
ax.set_xlim([x_.min(), x_.max()])
plt.show()  # display the figure (not needed inside Jupyter)
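To see what stats.cumfreq computes here, and where the plotted points end up, the sketch below re-creates its default behaviour with NumPy alone. It assumes scipy's documented default range, (a.min() - s, a.max() + s) with s = (a.max() - a.min()) / (2 * (numbins - 1)); the helper name `cumfreq_like` is hypothetical. For x = 1..10 and numbins=10 this gives lowerlimit 0.5, binsize 1.0 and cumulative counts 1 through 10.

```python
import numpy as np


def cumfreq_like(a, numbins=10):
    """Hypothetical NumPy re-creation of scipy.stats.cumfreq with its default limits.

    Returns (cumcount, lowerlimit, binsize). Requires numbins >= 2.
    """
    a = np.asarray(a, dtype=float)
    # assumed default range: widen [min, max] by half a bin on each side
    s = (a.max() - a.min()) / (2.0 * (numbins - 1))
    lower, upper = a.min() - s, a.max() + s
    counts, edges = np.histogram(a, bins=numbins, range=(lower, upper))
    # running count: values falling in this bin or any earlier one
    cumcount = np.cumsum(counts)
    return cumcount, lower, edges[1] - edges[0]


x = [i for i in range(1, 11)]
cumcount, lowerlimit, binsize = cumfreq_like(x, numbins=10)
n = cumcount.size

# the formula used for x_ above: n points spread evenly over n * binsize
x_linspace = lowerlimit + np.linspace(0, binsize * n, n)
# left and right (upper) edge of each bin
x_left = lowerlimit + np.arange(n) * binsize
x_right = x_left + binsize

print(cumcount, lowerlimit, binsize)
print(np.diff(x_linspace)[:1], x_left[:3], x_right[:3])
```

The linspace formula spreads the 10 points over [0.5, 10.5], so neighbouring points are 10/9 apart rather than exactly one bin apart. Because each cumulative count includes every value up to the end of its bin, plotting it at the bin's upper edge (x_right) is arguably the most faithful choice.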

