pandas cannot convert dates on or after 2262-04-12

Last updated at 2018-09-10, posted at 2018-09-10

What does this mean?

For example, suppose you have a file with the following contents:

input.txt

Reading it with pandas.read_table or pandas.read_csv as shown below does not convert the column to dates.

import pandas as pd
df = pd.read_table('input.txt', parse_dates=['date'])
print(df)

       date
0  20180910
1  20171228
2  20190702
3  23000501

Why?

Because the value falls outside the range of dates that pandas.Timestamp can represent.

In fact, running pd.to_datetime(df['date']) right after the code above raises this error:

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2300-05-01 00:00:00

Searching for the error message turns up the following Q&A:
python - pandas out of bounds nanosecond timestamp after offset rollforward plus adding a month offset - Stack Overflow
It appears that future dates can be converted only up to pd.Timestamp.max, which is Timestamp('2262-04-11 23:47:16.854775807').

Trying it out, pd.to_datetime('2262-04-11') converts successfully,
but pd.to_datetime('2262-04-12') raises OutOfBoundsDatetime.
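
The cutoff follows from how the value is stored. As the error message suggests, the timestamp is a count of nanoseconds, and pandas keeps that count in a signed 64-bit integer measured from 1970-01-01. The sketch below reproduces the limit with the standard library only; the names INT64_MAX, seconds, nanos, and limit are chosen here for illustration.

```python
from datetime import datetime, timedelta

# Largest value a signed 64-bit integer can hold.
INT64_MAX = 2**63 - 1

# Split the nanosecond count into whole seconds and leftover nanoseconds.
seconds, nanos = divmod(INT64_MAX, 10**9)

# Add the whole seconds to the Unix epoch (1970-01-01 00:00:00).
limit = datetime(1970, 1, 1) + timedelta(seconds=seconds)

print(limit)   # 2262-04-11 23:47:16
print(nanos)   # 854775807
```

Taken together, the two printed values match pd.Timestamp.max, which is why 2262-04-12 is the first day that cannot be represented.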

What should I do?

Give up on converting via the parse_dates option and do something like the following instead.
Dates that are too far in the future are simply set to NaT.

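import numpy as np
import pandas as pd

# Read the column as strings
df = pd.read_table('input.txt', dtype={'date': str})
# Replace dates later than pd.Timestamp.max with NaN
# (%m is the zero-padded month, so the cutoff string is '22620411')
df.loc[df['date'] > f'{pd.Timestamp.max:%Y%m%d}', 'date'] = np.nan
# NaN becomes NaT
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
print(df)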

Adapt this to other date formats as needed.
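
If it is acceptable for every unconvertible value to become NaT, pd.to_datetime also accepts errors='coerce', which avoids the string comparison. This is a sketch rather than the method used above. It assumes a pandas version whose default resolution is nanoseconds, as described in this article, and it uses an in-memory Series (the variable names are illustrative) in place of input.txt. Note that coerce also turns malformed values into NaT, not only out-of-range ones.

```python
import pandas as pd

# Same values as the date column shown earlier, kept as strings.
dates = pd.Series(['20180910', '20171228', '20190702', '23000501'])

# errors='coerce' replaces values that cannot be converted with NaT
# instead of raising an exception.
converted = pd.to_datetime(dates, format='%Y%m%d', errors='coerce')
print(converted)
```

The explicit comparison against pd.Timestamp.max is the better choice when malformed input should still raise an error.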
