
Optimizing Parallel Filtering of Time-Series Data

Last updated at 2023-11-04, posted at 2023-11-04

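What follows is the output I received after a long back-and-forth with ChatGPT, when I asked: "Please organize and summarize the whole exchange in this chat. Output it in markdown so that it can be published as a Qiita article."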

Optimizing Parallel Filtering of Time-Series Data

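Removing noise is an essential step in time-series analysis. In the earth sciences in particular, filters are applied to observational data to extract the useful signal. This article shows how to run such filtering efficiently with Python's numpy and scipy.signal, using parallel processing along the way.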

Introduction

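Datasets come in various dimensionalities; for example, you may need to handle 3-D or 4-D arrays. Here we consider applying a single filtering pass to 3-D data, and a filtering pass that runs in parallel along each dimension to 4-D data.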
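One way to handle arrays of different dimensionality with the same per-series code is to flatten the leading axes first. The sketch below assumes that time is the last axis, which the article does not state, and `to_series_rows` / `from_series_rows` are hypothetical helper names introduced only for illustration.

```python
import numpy as np

def to_series_rows(data):
    """Flatten all leading axes so that each row is one time series.

    Assumption: time is the LAST axis of `data`.
    Returns the 2-D array and the leading shape needed to restore it.
    """
    lead_shape = data.shape[:-1]
    return data.reshape(-1, data.shape[-1]), lead_shape

def from_series_rows(rows, lead_shape):
    """Undo to_series_rows once the rows have been filtered."""
    return rows.reshape(*lead_shape, rows.shape[-1])

# Example: a 4-D array with hypothetical axes (level, lat, lon, time)
data4d = np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5)
rows, lead = to_series_rows(data4d)      # 24 series of 5 samples each
restored = from_series_rows(rows, lead)  # back to the 4-D layout
```

With this layout a 3-D and a 4-D dataset both reduce to the same 2-D "one series per row" form, so a single per-row filter can serve both.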

Defining the filter functions

FFT filtering function

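import numpy as np
from numpy.fft import fft, ifft

def fft_filter(time_series, b):
    # Apply the FIR kernel b to a 1-D series by multiplying spectra
    N = len(time_series)
    F = fft(time_series, n=2*N)  # zero-pad to 2*N to avoid wrap-around (needs len(b) <= N + 1)
    H = fft(b, n=2*N)
    return np.real(ifft(F * H)[:N])  # keep the first N samples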
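As a cross-check, the hand-written FFT product can be compared against library routines that compute the same linear convolution. The sketch below uses synthetic data and an arbitrary low-pass kernel (both assumptions made for illustration), and relies on `scipy.signal.fftconvolve` and `np.convolve` as references.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
x = rng.standard_normal(200)      # synthetic time series
b = signal.firwin(31, 0.1)        # example low-pass FIR taps (default Hamming window)

N = len(x)
# FFT-based filtering written out by hand, zero-padded to 2*N
manual = np.real(np.fft.ifft(np.fft.fft(x, n=2 * N) * np.fft.fft(b, n=2 * N))[:N])

# Library equivalents: full linear convolution, truncated to the input length
via_fftconvolve = signal.fftconvolve(x, b, mode="full")[:N]
via_direct = np.convolve(x, b)[:N]

agree = np.allclose(manual, via_fftconvolve) and np.allclose(manual, via_direct)
```

If `agree` is true, the library call may be a simpler drop-in for the manual FFT code, though which one is faster depends on the series and kernel lengths.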

Lanczos filter

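from scipy import signal

def lanczos_filter(time_series, delta_t, f_cut, numtaps):
    # Low-pass FIR design; the 'lanczos' window is available only in newer SciPy releases
    b = signal.firwin(numtaps, f_cut, window='lanczos', fs=1/delta_t, pass_zero=True)
    return fft_filter(time_series, b)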
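Because the FIR output is truncated to the first N samples of the convolution, it is a causal result: for a symmetric kernel it lags the input by (numtaps - 1) / 2 samples. The sketch below illustrates that lag and one way to remove it. It uses the default Hamming window instead of 'lanczos' so that it runs on any SciPy version; the tap count and cutoff are arbitrary choices.

```python
import numpy as np
from scipy import signal

numtaps = 51                       # odd, so the delay is a whole number of samples
delay = (numtaps - 1) // 2         # group delay of a symmetric FIR kernel
b = signal.firwin(numtaps, 0.05)   # illustrative low-pass taps (default Hamming window)

t = np.arange(400)
x = np.sin(2 * np.pi * t / 100.0)  # slow test sine

causal = np.convolve(x, b)[: len(x)]        # what truncating to [:N] returns
centered = np.convolve(x, b, mode="same")   # same filter with the delay removed

# The centered output is the causal output shifted back by `delay` samples
shift_ok = np.allclose(centered[: len(x) - delay], causal[delay:])
```

When the filtered series is compared with the raw one sample by sample, the centered form is usually the one wanted; the samples within `delay` of either end are still affected by the zero padding.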

Butterworth filter

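from scipy import signal

def butterworth_filter(time_series, delta_t, f_cut, order):
    # Low-pass Butterworth design; f_cut is in the same units as fs = 1/delta_t
    b, a = signal.butter(order, f_cut, btype='low', fs=1/delta_t)
    return signal.filtfilt(b, a, time_series)  # forward-backward pass, zero phase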
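For the Butterworth case, `signal.filtfilt` accepts an `axis` argument, so a whole multi-dimensional array can be filtered in one call without an explicit loop. The sketch below assumes a (lat, lon, time) layout and arbitrary parameter values, none of which come from the article.

```python
import numpy as np
from scipy import signal

delta_t = 1.0      # hypothetical sampling interval
f_cut = 0.05       # hypothetical cutoff frequency (cycles per unit time)
order = 4

rng = np.random.default_rng(1)
data = rng.standard_normal((6, 8, 256))   # hypothetical (lat, lon, time) array

b, a = signal.butter(order, f_cut, btype="low", fs=1 / delta_t)

# One vectorized call filters every series along the time axis
vectorized = signal.filtfilt(b, a, data, axis=-1)

# Reference: loop over the individual series
looped = np.empty_like(data)
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        looped[i, j] = signal.filtfilt(b, a, data[i, j])

same_result = np.allclose(vectorized, looped)
```

The vectorized call may already be fast enough that no thread pool is needed for this filter; that is worth timing on real data before adding parallelism.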

Filtering with parallel processing

FFT filtering function

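import numpy as np
from numpy.fft import fft, ifft

def fft_filter(time_series, b):
    N = len(time_series)
    pad_to = N + len(b) - 1  # length of the full linear convolution, so nothing wraps around
    F = fft(time_series, n=pad_to)  # zero-pad both inputs to the same FFT size
    H = fft(b, n=pad_to)
    result = ifft(F * H)
    return np.real(result[:N])  # trim back to the original length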
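The FFT length matters here: multiplying spectra at a length shorter than N + len(b) - 1 yields a circular convolution, in which the end of the series leaks into its beginning. A toy sketch with made-up values shows the effect; `fft_conv` is a hypothetical helper that takes the FFT length as a parameter.

```python
import numpy as np

x = np.zeros(8)
x[-1] = 1.0                      # impulse at the very end of the series
b = np.array([0.5, 0.3, 0.2])    # toy 3-tap kernel

def fft_conv(x, b, n_fft):
    """Multiply spectra at FFT length n_fft and keep the first len(x) samples."""
    y = np.fft.ifft(np.fft.fft(x, n=n_fft) * np.fft.fft(b, n=n_fft))
    return np.real(y[: len(x)])

linear = np.convolve(x, b)[: len(x)]                  # reference: linear convolution
circular = fft_conv(x, b, n_fft=len(x))               # too short: the tail wraps to the start
padded = fft_conv(x, b, n_fft=len(x) + len(b) - 1)    # long enough: no wrap-around
```

In `circular`, the kernel taps 0.3 and 0.2 that should fall past the end of the series reappear in the first two samples, while `padded` matches the linear reference.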

Optimizing the parallel filtering function

from concurrent.futures import ThreadPoolExecutor
import numpy as np

def batch_filter(start, end, value, filter_func, *args, **kwargs):
    # Filter value[start:end] one series at a time; results are stored as float32
    output_array = np.zeros_like(value[start:end], dtype=np.float32)
    for i in range(end - start):
        output_array[i] = filter_func(value[start+i], *args, **kwargs)
    return output_array

def filtering(timeseries_array, filter_func, *args, batch_size=100, **kwargs):
    # Run the filter in parallel, one batch of series per task
    with ThreadPoolExecutor() as executor:
        futures = []
        for start in range(0, len(timeseries_array), batch_size):
            end = min(start + batch_size, len(timeseries_array))  # clamp the last batch
            futures.append(executor.submit(batch_filter, start, end, timeseries_array, filter_func, *args, **kwargs))
        # Collect in submission order so the output rows line up with the input
        result = np.concatenate([future.result() for future in futures])
    return result
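A quick way to gain confidence in the batching logic is to compare a threaded run with a plain serial loop on synthetic data. The sketch below is a compact variant built on `executor.map`, not the code above verbatim; `moving_average` is a stand-in filter and `filter_rows_threaded` a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def moving_average(series, width=5):
    """Stand-in per-series filter (hypothetical; any 1-D filter would do)."""
    kernel = np.ones(width) / width
    return np.convolve(series, kernel, mode="same")

def filter_rows_threaded(rows, filter_func, batch_size=100, **kwargs):
    """Filter each row of a 2-D array, one batch of rows per submitted task."""
    def run_batch(start):
        batch = rows[start:start + batch_size]
        return np.stack([filter_func(r, **kwargs) for r in batch])

    starts = range(0, len(rows), batch_size)
    with ThreadPoolExecutor() as executor:
        # executor.map yields results in submission order
        return np.concatenate(list(executor.map(run_batch, starts)))

rng = np.random.default_rng(2)
rows = rng.standard_normal((250, 64))     # 250 series of 64 samples each

threaded = filter_rows_threaded(rows, moving_average, batch_size=100, width=5)
serial = np.stack([moving_average(r, width=5) for r in rows])
```

Whether threads actually give a speedup depends on how much of the filter's work runs outside Python's global interpreter lock, so timing both versions on real data is advisable.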

Summary

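The functions presented in this article give earth science researchers a tool for processing time-series data efficiently and removing noise. With parallel processing, filtering of high-dimensional datasets can be sped up.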
