Comparing Methods for Computing a Moving Average in Python

Python

Last updated at 2024-05-27, posted at 2024-05-26

This article compares the processing speed of the following four methods.

#1. Using numpy.convolve
#2. Using numpy's sliding_window_view
#3. Using pandas' rolling
#4. Using numpy.cumsum

from time import time
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
import pandas as pd

def moving_average_1(data, window_size):
    # Convolve with a uniform kernel; mode='valid' keeps only full windows.
    weights = np.ones(window_size) / window_size
    moving_avg = np.convolve(data, weights, mode='valid')
    return moving_avg

def moving_average_2(data, window_size):
    # Build a strided view with one row per window, then average each row.
    windows = sliding_window_view(data, window_size)
    moving_avg = np.mean(windows, axis=1)
    return moving_avg

def moving_average_3(data, window_size):
    data = pd.Series(data)
    moving_avg = data.rolling(window_size).mean()
    # The first window_size - 1 entries are NaN (incomplete windows); drop them.
    moving_avg = moving_avg.to_numpy()[window_size - 1:]
    return moving_avg

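def moving_average_4(data, window_size):
    # Accumulate in float64 so the running sum does not overflow an integer dtype.
    cumsum = np.cumsum(data, dtype=np.float64)
    # Prepend 0 so that cumsum[i] is the sum of the first i elements.
    cumsum = np.insert(cumsum, 0, 0)
    # Each window sum is the difference of two prefix sums.
    moving_avg = (cumsum[window_size:] - cumsum[:-window_size]) / window_size
    return moving_avg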

data = np.arange(300000)
window_size = 30000

"""1. Using numpy.convolve"""
t = time()
result = moving_average_1(data, window_size)
print(result, len(result))
print(time() - t, "sec")
# Result 1
# [ 14999.5  15000.5  15001.5 ... 284997.5 284998.5 284999.5] 270001
# 6.458694696426392 sec

"""2. Using numpy's sliding_window_view"""
t = time()
result = moving_average_2(data, window_size)
print(result, len(result))
print(time() - t, "sec")
# Result 2
# [ 14999.5  15000.5  15001.5 ... 284997.5 284998.5 284999.5] 270001
# 4.69207239151001 sec

"""3. Using pandas' rolling"""
t = time()
result = moving_average_3(data, window_size)
print(result, len(result))
print(time() - t, "sec")
# Result 3
# [ 14999.5  15000.5  15001.5 ... 284997.5 284998.5 284999.5] 270001
# 0.014004230499267578 sec

"""4. Using numpy.cumsum"""
t = time()
result = moving_average_4(data, window_size)
print(result, len(result))
print(time() - t, "sec")
# Result 4
# [ 14999.5  15000.5  15001.5 ... 284997.5 284998.5 284999.5] 270001
# 0.003000020980834961 sec

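With this data size (300,000 data points, window of 30,000),
#3 (pandas' rolling) and #4 (numpy.cumsum) were fast enough,
at roughly 0.014 s and 0.003 s,
whereas #1 (numpy.convolve) and #2 (numpy's sliding_window_view)
were noticeably slow, at roughly 6.5 s and 4.7 s.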
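
One plausible explanation for the gap is the amount of work done per output value: #1 and #2 effectively process all window_size elements for every output, while the cumulative-sum approach of #4 needs only one subtraction per output once the prefix sums exist. Below is a minimal sketch of that difference on a tiny array. The function names naive_moving_average and cumsum_moving_average are made up for this illustration and are not part of the benchmark above.

```python
import numpy as np

def naive_moving_average(data, window_size):
    # Re-sums every window from scratch: about n * window_size additions.
    n_out = len(data) - window_size + 1
    return np.array(
        [data[i:i + window_size].sum() / window_size for i in range(n_out)]
    )

def cumsum_moving_average(data, window_size):
    # One pass builds the prefix sums, then one subtraction per output.
    prefix = np.concatenate(([0.0], np.cumsum(data, dtype=np.float64)))
    return (prefix[window_size:] - prefix[:-window_size]) / window_size

small = np.array([1, 2, 3, 4, 5], dtype=np.int64)
naive = naive_moving_average(small, 3)
fast = cumsum_moving_average(small, 3)
print(naive)  # [2. 3. 4.]
print(fast)   # [2. 3. 4.]
```

Both functions return the same values here; they differ only in how much arithmetic is needed as the window grows.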

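With 10 million data points and a window of 3 million, #1 and #2 showed
no sign of finishing on my PC no matter how long I waited.
Meanwhile, #3 and #4 took longer, around 0.2 s and 0.1 s, but ran without problems.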
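
The timings in this article come from a single run of time() that also includes the print call. As a hedged alternative, the sketch below times only the function call with time.perf_counter and keeps the best of several repeats. The helper name best_time and the repeat count are assumptions made for illustration, and moving_average_cumsum simply restates method #4 so that the block is self-contained.

```python
from time import perf_counter

import numpy as np

def best_time(func, *args, repeats=5):
    # Hypothetical helper: run func several times and keep the fastest run.
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = perf_counter()
        result = func(*args)
        best = min(best, perf_counter() - start)
    return result, best

def moving_average_cumsum(data, window_size):
    # Same idea as method #4: difference of prefix sums, accumulated in float64.
    cumsum = np.cumsum(data, dtype=np.float64)
    cumsum = np.insert(cumsum, 0, 0)
    return (cumsum[window_size:] - cumsum[:-window_size]) / window_size

data = np.arange(300_000)
result, seconds = best_time(moving_average_cumsum, data, 30_000)
print(len(result), f"{seconds:.6f} sec")
```

Taking the minimum over repeats reduces the influence of one-off delays, but absolute numbers will still vary by machine and library version.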

For data with a very large number of points, such as audio data, methods like #3 and #4 are the better choice.

Note that for #4, the results were wrong unless I specified a dtype such as "dtype=np.float64".

# Result 4 (without specifying dtype)
# [14999.5        15000.5        15001.5        ... -1333.65306667
#  -1332.65306667 -1331.65306667] 270001
# 0.0020008087158203125 秒
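
The wrong values above look like integer overflow: the total of 0 to 299,999 is about 4.5e10, which does not fit in a 32-bit integer, and on some platforms (for example Windows with NumPy versions before 2.0) np.arange produces 32-bit integers by default. The sketch below forces int32 explicitly to reproduce the effect on any platform. Forcing int32 is an assumption made for demonstration, not something the original code does.

```python
import numpy as np

data = np.arange(300_000, dtype=np.int32)  # assumption: mimic a 32-bit default integer
window_size = 30_000

def moving_average_from_cumsum(cumsum, window_size):
    # Difference of prefix sums, as in method #4.
    cumsum = np.insert(cumsum, 0, 0)
    return (cumsum[window_size:] - cumsum[:-window_size]) / window_size

# Accumulating in int32 wraps around silently once the sum exceeds 2**31 - 1.
wrapped = np.cumsum(data, dtype=np.int32)
# Accumulating in float64 stays exact for sums of this size.
safe = np.cumsum(data, dtype=np.float64)

bad = moving_average_from_cumsum(wrapped, window_size)
good = moving_average_from_cumsum(safe, window_size)
print(bad[-1], good[-1])
```

If the input is already a wide integer or floating-point type, the explicit dtype may be unnecessary, but passing dtype=np.float64 keeps the result independent of the platform default.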

Please let me know if you spot any problems.
