Visualizing ranges where a flag is set in time-series data

Last updated at 2019-08-01 / Posted at 2019-07-29

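Notes on how to plot this kind of graph from time-series data: a line plot with the periods where a boolean flag is True shaded in the background.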

Tokyo weather data for July 2019

July 2019 in Tokyo apparently had a lot of rainy days.
Temperatures were also lower than usual, and reportedly … sold poorly as a result.

Here we use Tokyo weather data from 2019-07-01 through 2019-07-28.

import pandas as pd

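# Columns: 最高気温 = daily max temperature (°C), 最低気温 = daily min temperature (°C), 雨が降った = whether it rained
df = pd.DataFrame(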
    {
        '最高気温': [24.3, 28.1, 29.1, 25.2, 24.5, 23.7, 20.8, 24.8, 21.8, 24.8, 23.6, 21.9, 27.3, 22.5, 25.0, 22.1, 28.7, 29.7, 31.4, 29.5, 28.2, 24.0, 29.3, 31.6, 32.4, 33.1, 31.4, 32.3],
        '最低気温': [21.0, 21.5, 22.2, 23.4, 19.0, 19.2, 18.7, 17.7, 18.5, 18.5, 18.7, 17.9, 20.0, 20.2, 19.8, 19.0, 20.3, 22.2, 23.0, 24.8, 24.0, 21.6, 22.4, 23.7, 24.4, 25.6, 25.1, 25.0],
        '雨が降った': [bool(v) for v in [1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0]],
    },
    index=pd.date_range(start='2019-07-01', end='2019-07-28')
)

	最高気温	最低気温	雨が降った
2019-07-01	24.3	21.0	True
2019-07-02	28.1	21.5	False
2019-07-03	29.1	22.2	False
2019-07-04	25.2	23.4	True
2019-07-05	24.5	19.0	False

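Besides 最高気温 (daily maximum temperature) and 最低気温 (daily minimum temperature), there is a bool column, 雨が降った, that records whether it rained that day.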

Plotting just 最高気温 and 最低気温 first gives a chart like this:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(df['最低気温'], label='最低気温')
ax.plot(df['最高気温'], label='最高気温')
ax.legend()
ax.set_title('2019年7月の東京の気候')
plt.show()

Shading the ranges where the flag is set

Using the 雨が降った column, we want to add shading to the chart above that marks stretches like "it rained from this day through that day".

Skipping the details and jumping straight to the conclusion: a function like the following is handy to have.

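import pandas as pd

def fill_flag_area(ax, flags, label=None, freq=None, **kwargs):
    """ Shade the regions where a flag is set
    params:
        ax: Matplotlib Axes object
        flags: pandas.Series whose index is a DatetimeIndex and whose dtype is bool
        label: legend label; attached to the first shaded span only
        freq: one time step of the series. If omitted, flags.index.freq is used.
              Must be given when flags.index.freq is None
              (e.g. for daily data) pd.offsets.Day(1)
        **kwargs: passed through to ax.axvspan (e.g. alpha, color)
    return:
        Matplotlib Axes object
    """
    assert flags.dtype == bool
    assert isinstance(flags.index, pd.DatetimeIndex)
    if freq is None:
        freq = flags.index.freq
    assert freq is not None
    # Pad both ends with 0 so runs touching the first/last element still get a +1 and a -1
    padded = pd.Series([0] + list(flags.astype(int)) + [0])
    diff = padded.diff().dropna().to_numpy()  # length len(flags) + 1
    starts = flags.index[diff[:-1] == 1]  # first True of each run
    ends = flags.index[diff[1:] == -1]    # last True of each run
    for start, end in zip(starts, ends):
        # end is the last flagged step, so extend by one step to cover that whole interval
        ax.axvspan(start, end + freq, label=label, **kwargs)
        label = None  # avoid duplicate legend entries
    return ax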

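The core idea is to find the boundary indices with diff(), but the function also quietly sidesteps a couple of gotchas. Padding the sequence with a 0 at each end ensures that runs reaching the very first or last element are not missed. And because end is the timestamp of the last flagged step, which stands for a whole time unit (here, a whole day), axvspan needs end plus one time step (flags.index.freq) as its right edge; otherwise a single-day run would be drawn with zero width.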
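To make the boundary detection concrete, here is a small sketch that runs the same padding-and-diff logic on the rain flags from this post and prints each span that would be shaded. flag_runs is a hypothetical helper introduced for illustration; it mirrors the loop inside fill_flag_area but returns the (start, end) pairs instead of drawing them, so it can be checked without matplotlib.

```python
import pandas as pd


def flag_runs(flags):
    """Hypothetical helper: return (start, end) timestamp pairs for each run of True.

    end is the last timestamp that is still True, i.e. the value fill_flag_area
    passes to axvspan before adding one time step.
    """
    # Pad with 0 on both sides so runs touching either edge still produce a +1 and a -1.
    padded = pd.Series([0] + list(flags.astype(int)) + [0])
    diff = padded.diff().dropna().to_numpy()  # length len(flags) + 1
    # diff[i] == 1      -> flags[i] is the first True of a run
    # diff[i + 1] == -1 -> flags[i] is the last True of a run
    starts = flags.index[diff[:-1] == 1]
    ends = flags.index[diff[1:] == -1]
    return list(zip(starts, ends))


rained = pd.Series(
    [bool(v) for v in [1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0]],
    index=pd.date_range(start='2019-07-01', end='2019-07-28'),
)

for start, end in flag_runs(rained):
    # axvspan would shade the half-open interval [start, end + 1 day)
    print(start.date(), '->', (end + rained.index.freq).date())
```

With this data the loop lists seven rainy stretches; for example, 11 through 16 July form a single run, whose shaded span extends up to (but not including) 17 July.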

Using it looks like this:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(df['最低気温'], label='最低気温')
ax.plot(df['最高気温'], label='最高気温')
fill_flag_area(ax, df['雨が降った'], label='雨の日', alpha=0.2)  # added: shade the rainy days
ax.legend()
ax.set_title('2019年7月の東京の気候')
plt.show()
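The docstring says freq must be passed when flags.index.freq is None, which happens, for example, when a DatetimeIndex is built from a plain list of timestamps. Below is a minimal sketch, assuming the data are regularly spaced, of resolving the step with pandas' infer_freq and to_offset before calling fill_flag_area. resolve_freq is a hypothetical helper, not part of the original function.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def resolve_freq(index, freq=None):
    """Hypothetical helper: pick the time step to pass as fill_flag_area's freq."""
    if freq is not None:
        return to_offset(freq)  # accepts strings like 'D' as well as offset objects
    if index.freq is not None:
        return index.freq
    inferred = pd.infer_freq(index)  # returns None if the spacing is irregular
    if inferred is None:
        raise ValueError('could not infer a regular frequency; pass freq explicitly')
    return to_offset(inferred)


# Building the index from plain timestamps leaves freq unset.
idx = pd.DatetimeIndex(['2019-07-01', '2019-07-02', '2019-07-03', '2019-07-04'])
print(idx.freq)  # None
step = resolve_freq(idx)
print(idx[-1] + step)  # 2019-07-05 00:00:00
```

With the df from this post, something like fill_flag_area(ax, flags, freq=resolve_freq(flags.index), alpha=0.2) would then work whether or not the index still carries its freq.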

And that's it.
