pandas: I forgot that converting an all-NA column with `pandas.to_datetime` yields a timezone-naive datetime, and got a TypeError


Environment

  • Python 3.13.1
  • pandas 2.2.3

What I want to do

I want to compute the difference between two datetimes given as ISO 8601 strings in Japan time (UTC+09:00).
Either value may be NA.

In [1]: import pandas

In [10]: df = pandas.DataFrame({"before":["2025-01-01T00:00+09:00", "2025-01-01T06:00+09:00"],
    "after":["2025-01-01T02:00+09:00",None]}, dtype="string")


In [11]: df
Out[11]:
                   before                   after
0  2025-01-01T00:00+09:00  2025-01-01T02:00+09:00
1  2025-01-01T06:00+09:00                    <NA>


In [12]: df.dtypes
Out[12]:
before    string[python]
after     string[python]
dtype: object


In [19]: dt_after = pandas.to_datetime(df["after"], format="ISO8601")


In [20]: dt_after
Out[20]:
0   2025-01-01 02:00:00+09:00
1                         NaT
Name: after, dtype: datetime64[ns, UTC+09:00]


In [21]: dt_before = pandas.to_datetime(df["before"], format="ISO8601")


In [14]: dt_before
Out[14]:
0   2025-01-01 00:00:00+09:00
1   2025-01-01 06:00:00+09:00
Name: before, dtype: datetime64[ns, UTC+09:00]


In [22]: diff = dt_after - dt_before

In [23]: diff
Out[23]:
0   0 days 02:00:00
1               NaT
dtype: timedelta64[ns]

Where I got stuck

When every value in the after column is NA, computing the difference raises a TypeError.

In [3]: df2 = pandas.DataFrame({"before":["2025-01-01T00:00+09:00", "2025-01-01T06:00+09:00"],
    "after":[None,None]}, dtype="string")

In [16]: df2
Out[16]:
                   before after
0  2025-01-01T00:00+09:00  <NA>
1  2025-01-01T06:00+09:00  <NA>

In [17]: df2.dtypes
Out[17]:
before    string[python]
after     string[python]
dtype: object

In [4]: dt_before2 = pandas.to_datetime(df2["before"], format="ISO8601")

In [5]: dt_after2 = pandas.to_datetime(df2["after"], format="ISO8601")

In [6]: dt_before2
Out[6]:
0   2025-01-01 00:00:00+09:00
1   2025-01-01 06:00:00+09:00
Name: before, dtype: datetime64[ns, UTC+09:00]

In [7]: dt_after2
Out[7]:
0   NaT
1   NaT
Name: after, dtype: datetime64[ns]


In [8]: dt_after2 - dt_before2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File ~/.pyenv/versions/3.13.1/lib/python3.13/site-packages/pandas/core/arrays/datetimelike.py:1165, in DatetimeLikeArrayMixin._sub_datetimelike(self, other)
   1164 try:
-> 1165     self._assert_tzawareness_compat(other)
   1166 except TypeError as err:

File ~/.pyenv/versions/3.13.1/lib/python3.13/site-packages/pandas/core/arrays/datetimes.py:782, in DatetimeArray._assert_tzawareness_compat(self, other)
    781     if other_tz is not None:
--> 782         raise TypeError(
    783             "Cannot compare tz-naive and tz-aware datetime-like objects."
    784         )
    785 elif other_tz is None:

TypeError: Cannot compare tz-naive and tz-aware datetime-like objects.

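Because every value in the after column is NA, pandas has no UTC offset to infer, so the resulting dtype is timezone-naive (datetime64[ns]) rather than timezone-aware. The before column is timezone-aware, and pandas does not allow subtraction between timezone-naive and timezone-aware values, hence the TypeError.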
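The difference between the two cases can be checked directly through the `.dt.tz` accessor. The following is a minimal sketch; the variable names `all_na` and `mixed` are made up for illustration and do not appear in the session above.

```python
import pandas

# Every value is NA, so there is no UTC offset for pandas to infer.
all_na = pandas.Series([None, None], dtype="string", name="after")
# At least one value carries an offset, so pandas can infer a time zone.
mixed = pandas.Series(["2025-01-01T02:00+09:00", None], dtype="string", name="after")

dt_all_na = pandas.to_datetime(all_na, format="ISO8601")
dt_mixed = pandas.to_datetime(mixed, format="ISO8601")

# .dt.tz is None for a timezone-naive Series and a tzinfo object otherwise.
print(dt_all_na.dt.tz is None)  # True: timezone-naive
print(dt_mixed.dt.tz is None)   # False: timezone-aware
```

Checking `.dt.tz` before subtracting is a cheap way to detect this situation instead of waiting for the TypeError.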

Solutions

Convert the time zone to UTC

I passed utc=True to the pandas.to_datetime function. The pandas documentation describes the utc parameter as follows:

Control timezone-related parsing, localization and conversion.
If True, the function always returns a timezone-aware UTC-localized Timestamp, Series or DatetimeIndex. To do this, timezone-naive inputs are localized as UTC, while timezone-aware inputs are converted to UTC.
If False (default), inputs will not be coerced to UTC. Timezone-naive inputs will remain naive, while timezone-aware ones will keep their time offsets. Limitations exist for mixed offsets (typically, daylight savings), see Examples section for details.

In [18]: dt_after2 = pandas.to_datetime(df2["after"], format="ISO8601", utc=True)

In [19]: dt_after2
Out[19]:
0   NaT
1   NaT
Name: after, dtype: datetime64[ns, UTC]

In [20]: dt_before2 = pandas.to_datetime(df2["before"], format="ISO8601", utc=True)

In [21]: dt_before2
Out[21]:
0   2024-12-31 15:00:00+00:00
1   2024-12-31 21:00:00+00:00
Name: before, dtype: datetime64[ns, UTC]

In [22]: dt_after2 - dt_before2
Out[22]:
0   NaT
1   NaT
dtype: timedelta64[ns]

If it does not matter that dt_before and dt_after are not in Japan time, passing utc=True by default seems like a good idea.
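If the values do need to be shown in Japan time afterwards, one option is to parse with utc=True and convert back only for display. This is a sketch under that assumption, not something the session above does; the name `dt_before_jst` is hypothetical, and the "+09:00" offset string mirrors the one used elsewhere in this article.

```python
import pandas

df2 = pandas.DataFrame(
    {
        "before": ["2025-01-01T00:00+09:00", "2025-01-01T06:00+09:00"],
        "after": [None, None],
    },
    dtype="string",
)

# utc=True yields a timezone-aware result even when every value is NA.
dt_before = pandas.to_datetime(df2["before"], format="ISO8601", utc=True)
dt_after = pandas.to_datetime(df2["after"], format="ISO8601", utc=True)

# The difference does not depend on the time zone used to represent the values.
diff = dt_after - dt_before

# Convert back to a +09:00 offset only where the values are displayed.
dt_before_jst = dt_before.dt.tz_convert("+09:00")
print(dt_before_jst)
```

Because tz_convert only changes the representation and not the instant, the subtraction can stay in UTC.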

Set the Japan time zone when there is no time zone

In [51]: if dt_after2.dt.tz is None:
    ...:     dt_after2 = dt_after2.dt.tz_localize("+09:00")
    ...:

In [52]: dt_after2
Out[52]:
0   NaT
1   NaT
Name: after, dtype: datetime64[ns, UTC+09:00]
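The same check can be wrapped in a small function so that both columns go through one code path. The helper name `to_jst_datetime` is hypothetical. The sketch assumes that all non-NA values in a column share a single UTC offset, and that treating offset-less input as Japan time is acceptable.

```python
import pandas


def to_jst_datetime(values: pandas.Series) -> pandas.Series:
    """Hypothetical helper: parse ISO 8601 strings and guarantee a +09:00 offset."""
    parsed = pandas.to_datetime(values, format="ISO8601")
    if parsed.dt.tz is None:
        # No offset could be inferred (for example, every value is NA),
        # so attach +09:00 without shifting the wall-clock time.
        return parsed.dt.tz_localize("+09:00")
    # An offset was inferred, so normalise it to +09:00.
    return parsed.dt.tz_convert("+09:00")


df2 = pandas.DataFrame(
    {
        "before": ["2025-01-01T00:00+09:00", "2025-01-01T06:00+09:00"],
        "after": [None, None],
    },
    dtype="string",
)

# Both sides are timezone-aware now, so the subtraction no longer raises.
diff2 = to_jst_datetime(df2["after"]) - to_jst_datetime(df2["before"])
print(diff2)
```

Unlike utc=True, this keeps the values in Japan time, at the cost of an explicit branch on the parsed result.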