
Adjusting features in tsfresh (adding, removing, and adding custom features)

Posted at 2021-10-28

Introduction

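When applying machine learning to time-series data, a common approach is to split the data into short segments (for example with a sliding window) and extract features from each segment. One of the libraries frequently used for feature exploration in Python is tsfresh. This article shows how to add and remove the features that tsfresh extracts, and how to add custom features that you define and implement yourself.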
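As a rough illustration of the windowing step, the sketch below converts a single series into the long format that extract_features expects: the window number becomes the id (used as column_id) and the position inside the window becomes the time (used as column_sort). `to_windows` is a hypothetical helper written for this illustration; tsfresh also provides `roll_time_series` in `tsfresh.utilities.dataframe_functions` for rolling windows, which this sketch does not use.

```python
def to_windows(values, window, step):
    """Hypothetical helper: split a 1-D series into fixed-length windows.

    Returns (id, time, value) rows: id is the window number (for column_id)
    and time is the position inside the window (for column_sort).
    Trailing values that do not fill a whole window are dropped.
    """
    rows = []
    window_id = 0
    for start in range(0, len(values) - window + 1, step):
        for offset, value in enumerate(values[start:start + window]):
            rows.append((window_id, offset, value))
        window_id += 1
    return rows


rows = to_windows([10, 11, 12, 13, 14, 15], window=3, step=2)
print(rows)
# [(0, 0, 10), (0, 1, 11), (0, 2, 12), (1, 0, 12), (1, 1, 13), (1, 2, 14)]
```

With window=3 and step=2 the windows overlap by one value, and the trailing value 15 is dropped because it cannot fill a complete window.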

The code used in this article is available here.

Loading the sample data

First, download the sample data bundled with tsfresh to use throughout this walkthrough.

from tsfresh.examples.robot_execution_failures import (
    download_robot_execution_failures,
    load_robot_execution_failures,
)

download_robot_execution_failures()
timeseries, _ = load_robot_execution_failures()
timeseries.head(5)

	id	time	F_x	F_y	F_z	T_x	T_y
0	1	0	-1	-1	63	-3	-1
1	1	1	0	0	62	-3	-1
2	1	2	-1	-1	61	-3	0
3	1	3	-1	-1	63	-2	-1
4	1	4	-1	-1	63	-3	-1

Extracting features with tsfresh

First, extract features with the default settings and check how many features are produced.

from tsfresh.feature_extraction import extract_features

features = extract_features(
    timeseries,
    column_id="id",
    column_sort="time",
).reset_index(drop=True)

Feature Extraction: 100%|██████████| 88/88 [00:01<00:00, 60.17it/s]

print("{} features were extracted.".format(len(features.columns)))

4722 features were extracted.

Using a preset feature set

The default settings extract a very large number of features, so let's adjust them.
tsfresh provides several predefined parameter sets, listed below, so that users can easily choose which features to extract.

Available preset parameter sets

ComprehensiveFCParameters
EfficientFCParameters
IndexBasedFCParameters
MinimalFCParameters
TimeBasedFCParameters

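As an example, we extract features here with MinimalFCParameters. MinimalFCParameters is a collection of simple features such as the sum and the mean, and it extracts 9 features from each column.
Passing the settings to the default_fc_parameters argument of extract_features extracts the preset features.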

from tsfresh.feature_extraction import (
    MinimalFCParameters,
    extract_features,
)

settings = MinimalFCParameters()

features = extract_features(
    timeseries,
    column_id="id",
    column_sort="time",
    default_fc_parameters = settings,
).reset_index(drop=True)

Feature Extraction: 100%|██████████| 88/88 [00:00<00:00, 768.93it/s]

print("{} features were extracted.".format(len(features.columns)))
print("The following features were extracted for each variable:")
[c for c in features.columns[features.columns.str.startswith('F_x')]]

54 features were extracted.
The following features were extracted for each variable:

['F_x__sum_values',
 'F_x__median',
 'F_x__mean',
 'F_x__length',
 'F_x__standard_deviation',
 'F_x__variance',
 'F_x__root_mean_square',
 'F_x__maximum',
 'F_x__minimum']

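The settings passed to extract_features are a dict whose keys are the names of feature calculator functions and whose values are the parameters passed to them (a list of parameter dicts, or None when the function takes no parameters). Adding or removing entries gives finer control over which features are extracted.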
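To see how dict entries map onto output columns, here is a sketch of a hypothetical helper `feature_names` that predicts the column names. The naming rule it uses (`<column>__<feature>`, plus `__<key>_<value>` for each parameter with keys sorted) is inferred from the outputs shown in this article, such as `F_x__ar_coefficient__coeff_0__k_4`. It is an assumption rather than a documented guarantee, and combiner-type features may be named differently.

```python
def feature_names(columns, settings):
    """Hypothetical helper: predict the feature column names for a settings dict.

    Assumed rule: "<column>__<feature>" for parameterless features, and
    "<column>__<feature>__<key>_<value>[__...]" (keys sorted) for each
    parameter dict in the list.
    """
    names = []
    for column in columns:
        for feature, params in settings.items():
            if params is None:
                names.append(f"{column}__{feature}")
            else:
                for param in params:
                    suffix = "__".join(f"{k}_{v}" for k, v in sorted(param.items()))
                    names.append(f"{column}__{feature}__{suffix}")
    return names


settings = {
    "maximum": None,
    "ar_coefficient": [{"coeff": 0, "k": 4}, {"coeff": 1, "k": 4}],
}
print(feature_names(["F_x"], settings))
# ['F_x__maximum', 'F_x__ar_coefficient__coeff_0__k_4', 'F_x__ar_coefficient__coeff_1__k_4']
```

Under this rule, 6 variables × 9 parameterless features gives the 54 columns reported for MinimalFCParameters.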

The features available here are defined in tsfresh.feature_extraction.feature_calculators. Check the documentation for what each one computes, then add or remove the features you need.

settings = MinimalFCParameters()
print('settings is a dictionary.')
settings

settings is a dictionary.

{'sum_values': None,
 'median': None,
 'mean': None,
 'length': None,
 'standard_deviation': None,
 'variance': None,
 'root_mean_square': None,
 'maximum': None,
 'minimum': None}

# Remove features that are not needed
del settings['sum_values']
del settings['median']
del settings['mean']

# Add the features you need (key: feature name, value: parameters; pass a list to supply several parameter sets)
settings["ar_coefficient"] = [
    {"coeff": 0, "k": 4},
    {"coeff": 1, "k": 4},
    {"coeff": 2, "k": 4},
    {"coeff": 3, "k": 4},
]
settings

{'length': None,
 'standard_deviation': None,
 'variance': None,
 'root_mean_square': None,
 'maximum': None,
 'minimum': None,
 'ar_coefficient': [{'coeff': 0, 'k': 4},
  {'coeff': 1, 'k': 4},
  {'coeff': 2, 'k': 4},
  {'coeff': 3, 'k': 4}]}
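The same settings can also be built without mutating the preset, which keeps the kept, dropped, and added features visible in one place. In this sketch a plain dict copied from the printed contents of MinimalFCParameters() stands in for the preset so that the block runs without tsfresh; in practice you would start from MinimalFCParameters() itself.

```python
# Stand-in for MinimalFCParameters(), copied from the printed output above
minimal = {
    "sum_values": None, "median": None, "mean": None,
    "length": None, "standard_deviation": None, "variance": None,
    "root_mean_square": None, "maximum": None, "minimum": None,
}

# Keep everything except the dropped features, preserving the original order
drop = {"sum_values", "median", "mean"}
settings = {name: params for name, params in minimal.items() if name not in drop}

# Add ar_coefficient with coeff 0..3 and a fixed k=4
settings["ar_coefficient"] = [{"coeff": c, "k": 4} for c in range(4)]
print(settings)
```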

features = extract_features(
    timeseries,
    column_id="id",
    column_sort="time",
    default_fc_parameters = settings,
).reset_index(drop=True)

Feature Extraction: 100%|██████████| 88/88 [00:00<00:00, 701.41it/s]

print("{} features were extracted.".format(len(features.columns)))
print("The following features were extracted for each variable:")
[c for c in features.columns[features.columns.str.startswith('F_x')]]

60 features were extracted.
The following features were extracted for each variable:

['F_x__length',
 'F_x__standard_deviation',
 'F_x__variance',
 'F_x__root_mean_square',
 'F_x__maximum',
 'F_x__minimum',
 'F_x__ar_coefficient__coeff_0__k_4',
 'F_x__ar_coefficient__coeff_1__k_4',
 'F_x__ar_coefficient__coeff_2__k_4',
 'F_x__ar_coefficient__coeff_3__k_4']

Adding custom features

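tsfresh also lets you implement your own features and add them, not only use the predefined ones. You could compute such features separately without tsfresh, but fitting them into the tsfresh framework brings benefits such as the following: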

Features are computed for each variable (and named automatically)
Features can be pruned with statistical hypothesis tests
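For comparison, here is roughly what computing features by hand without tsfresh looks like: group the rows by id, build one series per variable, apply each function, and name the results. `manual_extract` is a hypothetical plain-Python helper, and the rows are the first five rows (F_x, F_y, F_z only) of the sample table shown earlier. tsfresh takes care of this bookkeeping for you, and its `select_features` function can then filter the resulting features using hypothesis tests.

```python
from collections import defaultdict

# First rows of the sample data shown earlier: (id, time, F_x, F_y, F_z)
rows = [
    (1, 0, -1, -1, 63),
    (1, 1, 0, 0, 62),
    (1, 2, -1, -1, 61),
    (1, 3, -1, -1, 63),
    (1, 4, -1, -1, 63),
]
value_columns = ["F_x", "F_y", "F_z"]


def manual_extract(rows, value_columns, funcs):
    """Hypothetical helper: apply each function to every (id, column) series.

    Rows are sorted by (id, time) first; results are named "<column>__<name>".
    """
    series = defaultdict(lambda: defaultdict(list))
    for row_id, _time, *values in sorted(rows, key=lambda r: (r[0], r[1])):
        for column, value in zip(value_columns, values):
            series[row_id][column].append(value)
    return {
        row_id: {
            f"{column}__{name}": func(vals)
            for column, vals in per_column.items()
            for name, func in funcs.items()
        }
        for row_id, per_column in series.items()
    }


features = manual_extract(rows, value_columns, {"maximum": max, "minimum": min})
print(features[1])
# {'F_x__maximum': 0, 'F_x__minimum': -1, 'F_y__maximum': 0, 'F_y__minimum': -1, 'F_z__maximum': 63, 'F_z__minimum': 61}
```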

A custom feature can be added in three main steps:

Create a function that computes the feature
Add it as an attribute of tsfresh.feature_extraction.feature_calculators
Pass the settings to extract_features to add the feature

Creating a function that computes the feature

To add a custom feature to tsfresh, write a function decorated with @set_property. The decorator arguments differ depending on whether the function returns a single feature or multiple features:

Returning a single feature: @set_property("fctype", "simple")
Returning multiple features: @set_property("fctype", "combiner")
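For the combiner case, the function receives the series together with a list of parameter dicts and returns a list of (name suffix, value) pairs, which is the pattern followed by tsfresh's built-in combiner calculators such as `linear_trend`. The function `value_range_stats` and its `stat` parameter are hypothetical, and the fallback `set_property` is an assumption that only reproduces the attribute-setting behavior so that the sketch also runs without tsfresh installed.

```python
try:
    from tsfresh.feature_extraction.feature_calculators import set_property
except ImportError:
    # Fallback (assumption): attach the property as an attribute of the function
    def set_property(key, value):
        def decorate(func):
            setattr(func, key, value)
            return func
        return decorate


@set_property("fctype", "combiner")
def value_range_stats(x, param):
    """Hypothetical combiner feature computing several statistics in one pass.

    param is a list of dicts such as [{"stat": "min"}, {"stat": "max"}];
    returns a list of (name_suffix, value) pairs, one per dict.
    """
    low, high = min(x), max(x)
    available = {"min": low, "max": high, "range": high - low}
    return [(f"stat_{p['stat']}", available[p["stat"]]) for p in param]


# F_z values for id=1 from the sample table
print(value_range_stats([63, 62, 61, 63, 63], [{"stat": "min"}, {"stat": "range"}]))
# [('stat_min', 61), ('stat_range', 2)]
```

Once registered, it would presumably be configured like any parameterized feature, for example `{"value_range_stats": [{"stat": "min"}, {"stat": "range"}]}`, with the returned suffixes appended to the column and function names.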

Here, as an example of a function that returns a single feature, we add the difference between the maximum and the minimum, defined as follows.

from tsfresh.feature_extraction.feature_calculators import set_property

@set_property("fctype", "simple")
def amplitude(x):
    # Return the difference between the maximum and minimum values
    return max(x) - min(x)
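Because a simple feature is just a function of a one-dimensional sequence, it can be checked on its own before registering it with tsfresh. The sketch below calls amplitude directly on the F_x and F_z values for id=1 from the sample table. The fallback `set_property` is an assumption that only mimics the attribute-setting behavior, so the block also runs without tsfresh installed.

```python
try:
    from tsfresh.feature_extraction.feature_calculators import set_property
except ImportError:
    # Fallback (assumption): attach the property as an attribute of the function
    def set_property(key, value):
        def decorate(func):
            setattr(func, key, value)
            return func
        return decorate


@set_property("fctype", "simple")
def amplitude(x):
    # Return the difference between the maximum and minimum values
    return max(x) - min(x)


print(amplitude([-1, 0, -1, -1, -1]))  # F_x for id=1 -> 1
print(amplitude([63, 62, 61, 63, 63]))  # F_z for id=1 -> 2
print(amplitude.fctype)                 # simple
```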

Adding the attribute to `tsfresh.feature_extraction.feature_calculators`

Inside tsfresh, the feature calculators are defined in feature_calculators, so the newly created function must also be added to feature_calculators, as follows.

from tsfresh.feature_extraction import feature_calculators

setattr(feature_calculators, amplitude.__name__, amplitude)
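Why is this setattr step needed? The settings dict holds only names as strings, so each name has to be resolved to a function. The following is a heavily simplified model of that lookup, not tsfresh's actual code: `calculators` is a hypothetical stand-in for the feature_calculators module, and `compute` is a hypothetical helper. It shows that a name fails with AttributeError until the function is attached with setattr, and how simple, parameterized, and combiner features would be named under the convention assumed here.

```python
import types


def amplitude(x):
    # Difference between the maximum and minimum values
    return max(x) - min(x)


amplitude.fctype = "simple"  # what @set_property("fctype", "simple") sets

# Hypothetical stand-in for the feature_calculators module
calculators = types.SimpleNamespace()


def compute(series, column, settings, namespace):
    """Simplified sketch: resolve each settings key by name with getattr."""
    out = {}
    for name, params in settings.items():
        func = getattr(namespace, name)  # AttributeError if never registered
        if getattr(func, "fctype", "simple") == "combiner":
            for suffix, value in func(series, params):
                out[f"{column}__{name}__{suffix}"] = value
        elif params:
            for param in params:
                suffix = "__".join(f"{k}_{v}" for k, v in sorted(param.items()))
                out[f"{column}__{name}__{suffix}"] = func(series, **param)
        else:
            out[f"{column}__{name}"] = func(series)
    return out


try:
    compute([1, 3, 2], "F_x", {"amplitude": None}, calculators)
except AttributeError:
    print("amplitude is not registered yet")

setattr(calculators, amplitude.__name__, amplitude)
print(compute([1, 3, 2], "F_x", {"amplitude": None}, calculators))
# {'F_x__amplitude': 2}
```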

Passing the settings to `extract_features` to add the feature

Pass a dict containing the name of the defined function as default_fc_parameters and extract the features.

# Prepare a dict keyed by the defined function's name; use None as the value when it takes no parameters
settings = {'amplitude': None}

# Extract the features
features = extract_features(
    timeseries,
    column_id="id",
    column_sort="time",
    default_fc_parameters = settings,
).reset_index(drop=True)

Feature Extraction: 100%|██████████| 88/88 [00:00<00:00, 823.76it/s]

print("The following features were extracted:")
[c for c in features.columns]

The following features were extracted:
['F_x__amplitude',
 'F_y__amplitude',
 'F_z__amplitude',
 'T_x__amplitude',
 'T_y__amplitude',
 'T_z__amplitude']

Summary

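This article covered feature extraction from time-series data with tsfresh: adding and removing the features to extract, and adding features you define yourself. Fitting your own features into the tsfresh framework lets you use its other functionality, such as per-variable computation and feature selection, and makes the features easier to share within a team. Give it a try!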
