Environment
- Python 3.13.1
- pandas 2.2.3
What I want to do
I want to compute the difference between datetimes given as ISO 8601 strings in Japan time (UTC+09:00).
The values may be NA.
In [10]: df = pandas.DataFrame({"before":["2025-01-01T00:00+09:00", "2025-01-01T06:00+09:00"],
"after":["2025-01-01T02:00+09:00",None]}, dtype="string")
In [11]: df
Out[11]:
                   before                   after
0  2025-01-01T00:00+09:00  2025-01-01T02:00+09:00
1  2025-01-01T06:00+09:00                    <NA>
In [12]: df.dtypes
Out[12]:
before string[python]
after string[python]
dtype: object
In [19]: dt_after = pandas.to_datetime(df["after"], format="ISO8601")
In [20]: dt_after
Out[20]:
0 2025-01-01 02:00:00+09:00
1 NaT
Name: after, dtype: datetime64[ns, UTC+09:00]
In [21]: dt_before = pandas.to_datetime(df["before"], format="ISO8601")
In [14]: dt_before
Out[14]:
0 2025-01-01 00:00:00+09:00
1 2025-01-01 06:00:00+09:00
Name: before, dtype: datetime64[ns, UTC+09:00]
In [22]: diff = dt_after - dt_before
In [23]: diff
Out[23]:
0 0 days 02:00:00
1 NaT
dtype: timedelta64[ns]
The pitfall
When every value in the after column is NA, computing the difference raises a TypeError.
In [3]: df2 = pandas.DataFrame({"before":["2025-01-01T00:00+09:00", "2025-01-01T06:00+09:00"],
"after":[None,None]}, dtype="string")
In [16]: df2
Out[16]:
                   before  after
0  2025-01-01T00:00+09:00   <NA>
1  2025-01-01T06:00+09:00   <NA>
In [17]: df2.dtypes
Out[17]:
before string[python]
after string[python]
dtype: object
In [4]: dt_before2 = pandas.to_datetime(df2["before"], format="ISO8601")
In [5]: dt_after2 = pandas.to_datetime(df2["after"], format="ISO8601")
In [6]: dt_before2
Out[6]:
0 2025-01-01 00:00:00+09:00
1 2025-01-01 06:00:00+09:00
Name: before, dtype: datetime64[ns, UTC+09:00]
In [7]: dt_after2
Out[7]:
0 NaT
1 NaT
Name: after, dtype: datetime64[ns]
In [8]: dt_after2 - dt_before2
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
File ~/.pyenv/versions/3.13.1/lib/python3.13/site-packages/pandas/core/arrays/datetimelike.py:1165, in DatetimeLikeArrayMixin._sub_datetimelike(self, other)
1164 try:
-> 1165 self._assert_tzawareness_compat(other)
1166 except TypeError as err:
File ~/.pyenv/versions/3.13.1/lib/python3.13/site-packages/pandas/core/arrays/datetimes.py:782, in DatetimeArray._assert_tzawareness_compat(self, other)
781 if other_tz is not None:
--> 782 raise TypeError(
783 "Cannot compare tz-naive and tz-aware datetime-like objects."
784 )
785 elif other_tz is None:
TypeError: Cannot compare tz-naive and tz-aware datetime-like objects.
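Because the after column is entirely NA, pandas.to_datetime has no UTC offset to carry over, so the result's dtype is timezone-naive (datetime64[ns]) instead of timezone-aware. A timezone-naive value cannot be compared with, or subtracted from, a timezone-aware one, hence the TypeError.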
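The same rule exists outside pandas: the standard library's datetime also refuses to subtract an offset-aware value from an offset-naive one. The sketch below reproduces both cases and checks .dt.tz on each parsed Series, which is a quick way to spot the mismatch before subtracting. The pandas TypeError is the behaviour seen in the session above (pandas 2.2.3); other versions may differ.

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

JST = timezone(timedelta(hours=9))

# Standard library: subtracting an aware datetime from a naive one fails.
naive = datetime(2025, 1, 1, 2, 0)
aware = datetime(2025, 1, 1, 0, 0, tzinfo=JST)
try:
    naive - aware
    stdlib_error = None
except TypeError as err:
    stdlib_error = err

# pandas: a column with offsets parses to a tz-aware dtype,
# while an all-NA column parses to a tz-naive dtype.
before = pd.to_datetime(
    pd.Series(["2025-01-01T00:00+09:00", "2025-01-01T06:00+09:00"], dtype="string"),
    format="ISO8601",
)
after = pd.to_datetime(pd.Series([None, None], dtype="string"), format="ISO8601")

# Checking .dt.tz reveals the mismatch before any arithmetic.
print("before tz:", before.dt.tz)
print("after tz:", after.dt.tz)

try:
    after - before
    pandas_error = None
except TypeError as err:
    pandas_error = err
```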
Solution
Set the time zone to UTC
I passed utc=True to pandas.to_datetime. The pandas documentation describes the utc parameter as follows:
Control timezone-related parsing, localization and conversion.
If True, the function always returns a timezone-aware UTC-localized Timestamp, Series or DatetimeIndex. To do this, timezone-naive inputs are localized as UTC, while timezone-aware inputs are converted to UTC.
If False (default), inputs will not be coerced to UTC. Timezone-naive inputs will remain naive, while timezone-aware ones will keep their time offsets. Limitations exist for mixed offsets (typically, daylight savings), see Examples section for details.
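To see the rules in the quoted description side by side, here is a small sketch with made-up inputs: a string with a +09:00 offset is converted to UTC (the instant is unchanged, so 00:00 JST becomes 15:00 UTC on the previous day), a string without an offset is simply labelled UTC, and an all-NA column still comes back with a UTC-aware dtype.

```python
import pandas as pd

# Made-up sample inputs for illustration.
aware_input = pd.Series(["2025-01-01T00:00+09:00"], dtype="string")
naive_input = pd.Series(["2025-01-01T00:00"], dtype="string")
all_na_input = pd.Series([None, None], dtype="string")

# Offset-aware input: converted to UTC (same instant, different wall-clock time).
converted = pd.to_datetime(aware_input, format="ISO8601", utc=True)
# Offset-naive input: localized as UTC (wall-clock time kept as-is).
localized = pd.to_datetime(naive_input, format="ISO8601", utc=True)
# All-NA input: still returns a UTC-aware dtype.
all_na = pd.to_datetime(all_na_input, format="ISO8601", utc=True)

print(converted)
print(localized)
print(all_na)
```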
In [18]: dt_after2 = pandas.to_datetime(df2["after"], format="ISO8601", utc=True)
In [19]: dt_after2
Out[19]:
0 NaT
1 NaT
Name: after, dtype: datetime64[ns, UTC]
In [20]: dt_before2 = pandas.to_datetime(df2["before"], format="ISO8601", utc=True)
In [21]: dt_before2
Out[21]:
0 2024-12-31 15:00:00+00:00
1 2024-12-31 21:00:00+00:00
Name: before, dtype: datetime64[ns, UTC]
In [22]: dt_after2 - dt_before2
Out[22]:
0 NaT
1 NaT
dtype: timedelta64[ns]
If dt_before and dt_after do not have to be in Japan time, passing utc=True as a rule seems like the safest choice.
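Put together, the utc=True approach can be wrapped in a small helper. The name parse_utc is my own and not part of pandas. If Japan time is wanted for display, .dt.tz_convert can be applied afterwards; it only changes how each instant is shown, so it does not affect the differences.

```python
from datetime import timedelta, timezone

import pandas as pd

JST = timezone(timedelta(hours=9))


def parse_utc(s: pd.Series) -> pd.Series:
    """Hypothetical helper: parse ISO 8601 strings into a UTC-aware Series.
    NA values become NaT, and the dtype is UTC-aware even if every value is NA."""
    return pd.to_datetime(s, format="ISO8601", utc=True)


df = pd.DataFrame(
    {
        "before": ["2025-01-01T00:00+09:00", "2025-01-01T06:00+09:00"],
        "after": ["2025-01-01T02:00+09:00", None],
    },
    dtype="string",
)
df2 = pd.DataFrame(
    {
        "before": ["2025-01-01T00:00+09:00", "2025-01-01T06:00+09:00"],
        "after": [None, None],
    },
    dtype="string",
)

# Both sides are UTC-aware, so neither subtraction raises.
diff = parse_utc(df["after"]) - parse_utc(df["before"])
diff2 = parse_utc(df2["after"]) - parse_utc(df2["before"])

# Convert to Japan time only for display; the instants themselves do not change.
before_jst = parse_utc(df["before"]).dt.tz_convert(JST)

print(diff)
print(diff2)
print(before_jst)
```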
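If there is no time zone, set Japan time
If the values must stay in Japan time, another option is to check whether the parsed Series is timezone-naive and, if so, localize it to +09:00 with Series.dt.tz_localize.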
In [51]: if dt_after2.dt.tz is None:
...: dt_after2 = dt_after2.dt.tz_localize("+09:00")
...:
In [52]: dt_after2
Out[52]:
0 NaT
1 NaT
Name: after, dtype: datetime64[ns, UTC+09:00]
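This second approach can also be wrapped in a helper, sketched below. parse_jst is a hypothetical name, and it passes a datetime.timezone object for +09:00 instead of the "+09:00" string used in the session above. Note the assumption it bakes in: any input that parses as timezone-naive, including strings without an offset, is treated as Japan time.

```python
from datetime import timedelta, timezone

import pandas as pd

# Fixed +09:00 offset (Japan Standard Time; Japan does not currently use DST).
JST = timezone(timedelta(hours=9))


def parse_jst(s: pd.Series) -> pd.Series:
    """Hypothetical helper: parse ISO 8601 strings and, if the result is
    tz-naive (for example, because every value was NA), localize it to +09:00."""
    dt = pd.to_datetime(s, format="ISO8601")
    if dt.dt.tz is None:
        dt = dt.dt.tz_localize(JST)
    return dt


df2 = pd.DataFrame(
    {
        "before": ["2025-01-01T00:00+09:00", "2025-01-01T06:00+09:00"],
        "after": [None, None],
    },
    dtype="string",
)

dt_before2 = parse_jst(df2["before"])
dt_after2 = parse_jst(df2["after"])

# Both Series are now tz-aware, so the subtraction no longer raises.
diff2 = dt_after2 - dt_before2
print(diff2)
```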