[python / pandas] Creating a datetime column from a DataFrame whose year, month, day, hour, and minute are stored in separate columns

Posted at 2024-05-19

1. Overview

1-1. Motivation

I searched quite a bit but couldn't find a function that seemed to do this, so I'm writing it up here ((φ(>ω<*)

1-2. What I want to do

Given a DataFrame like the following,

colaboratory

import pandas as pd

df_datetime: pd.DataFrame = pd.DataFrame({
    "Year": [2024] * 10,
    "Month": [1, 1] + [2] * 4 + [4] * 4,
    "Day": [10, 13, 1, 2, 14, 15, 19, 20, 21, 24],
    "Hour": [11, 15, 18, 21, 1, 6, 21, 23, 2, 5],
    "Minute": [20, 21, 3, 35, 15, 59, 8, 2, 52, 26]
})

df_datetime

# Output
   Year  Month  Day  Hour  Minute
0  2024      1   10    11      20
1  2024      1   13    15      21
2  2024      2    1    18       3
3  2024      2    2    21      35
4  2024      2   14     1      15
5  2024      2   15     6      59
6  2024      4   19    21       8
7  2024      4   20    23       2
8  2024      4   21     2      52
9  2024      4   24     5      26

I want to create a new column like this:

colaboratory

   Year  Month  Day  Hour  Minute                   dt
0  2024      1   10    11      20  2024-01-10 11:20:00
1  2024      1   13    15      21  2024-01-13 15:21:00
2  2024      2    1    18       3  2024-02-01 18:03:00
3  2024      2    2    21      35  2024-02-02 21:35:00
4  2024      2   14     1      15  2024-02-14 01:15:00
5  2024      2   15     6      59  2024-02-15 06:59:00
6  2024      4   19    21       8  2024-04-19 21:08:00
7  2024      4   20    23       2  2024-04-20 23:02:00
8  2024      4   21     2      52  2024-04-21 02:52:00
9  2024      4   24     5      26  2024-04-24 05:26:00

2. How to do it

Written as follows, the desired datetime values can be generated!

colaboratory

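# Convert the values to str first (int columns can be converted as-is;
# a float column would turn 1.0 into "1.0" and break parsing, so cast it to int beforehand)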
df_tmp = df_datetime.astype({
    "Year": str,
    "Month": str,
    "Day": str,
    "Hour": str,
    "Minute": str
})

# This works as long as every column is str (object) dtype
series_cat = (
    df_tmp["Year"]
    .str.cat(df_tmp["Month"], sep="/")
    .str.cat(df_tmp["Day"], sep="/")
    .str.cat(df_tmp["Hour"], sep=" ")
    .str.cat(df_tmp["Minute"], sep=":")
)
pd.to_datetime(series_cat)

# Output
0   2024-01-10 11:20:00
1   2024-01-13 15:21:00
2   2024-02-01 18:03:00
3   2024-02-02 21:35:00
4   2024-02-14 01:15:00
5   2024-02-15 06:59:00
6   2024-04-19 21:08:00
7   2024-04-20 23:02:00
8   2024-04-21 02:52:00
9   2024-04-24 05:26:00
Name: Year, dtype: datetime64[ns]
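To actually attach the result as the `dt` column from 1-2, assigning the converted Series back to the DataFrame should be enough. The sketch below does that, and additionally passes `format="%Y/%m/%d %H:%M"` to `pd.to_datetime` (my own assumption, not part of the code above) so pandas uses a fixed layout instead of guessing one from the data. It also shortens the conversion to `astype(str)` on the whole DataFrame, which is fine here only because every column holds ints.

```python
import pandas as pd

# Same sample data as df_datetime above
df_datetime = pd.DataFrame({
    "Year": [2024] * 10,
    "Month": [1, 1] + [2] * 4 + [4] * 4,
    "Day": [10, 13, 1, 2, 14, 15, 19, 20, 21, 24],
    "Hour": [11, 15, 18, 21, 1, 6, 21, 23, 2, 5],
    "Minute": [20, 21, 3, 35, 15, 59, 8, 2, 52, 26],
})

# All columns are int, so converting the whole DataFrame to str is safe here
df_tmp = df_datetime.astype(str)

# Join the parts into strings like "2024/2/1 18:3"
series_cat = (
    df_tmp["Year"]
    .str.cat(df_tmp["Month"], sep="/")
    .str.cat(df_tmp["Day"], sep="/")
    .str.cat(df_tmp["Hour"], sep=" ")
    .str.cat(df_tmp["Minute"], sep=":")
)

# Explicit format (assumption: added for this sketch); %m, %d, %H and %M
# also accept values without a leading zero
df_datetime["dt"] = pd.to_datetime(series_cat, format="%Y/%m/%d %H:%M")
print(df_datetime)
```

If the layout of the strings ever changes, an explicit `format` makes `pd.to_datetime` raise an error instead of silently guessing a different interpretation.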

3. Supplement

I was a little worried whether a string without zero padding, such as "2024/2/4 1:2:3", would be converted to datetime correctly, so I tried it.

pd.to_datetime("2024/2/4 1:2:3")

# Output
Timestamp('2024-02-04 01:02:03')

Looks like it works fine ( *˙︶˙*)وｸﾞｯ!
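As another option worth knowing about, `pd.to_datetime` also accepts a DataFrame whose column names are date/time units such as `year`, `month`, `day`, `hour`, and `minute`, and builds the datetimes directly from the numbers without any string concatenation. A minimal sketch, assuming the same `df_datetime`; the lowercase rename and the list name `cols` are my own additions for illustration.

```python
import pandas as pd

# Same sample data as df_datetime above
df_datetime = pd.DataFrame({
    "Year": [2024] * 10,
    "Month": [1, 1] + [2] * 4 + [4] * 4,
    "Day": [10, 13, 1, 2, 14, 15, 19, 20, 21, 24],
    "Hour": [11, 15, 18, 21, 1, 6, 21, 23, 2, 5],
    "Minute": [20, 21, 3, 35, 15, 59, 8, 2, 52, 26],
})

# Hypothetical helper list: the columns that hold the date/time parts
cols = ["Year", "Month", "Day", "Hour", "Minute"]

# Rename to the lowercase unit names (year, month, day, hour, minute)
# and let pd.to_datetime assemble the datetimes from the numeric columns
df_datetime["dt"] = pd.to_datetime(df_datetime[cols].rename(columns=str.lower))
print(df_datetime)
```

This skips the int → str → datetime round trip. At least `year`, `month`, and `day` must be present; otherwise pandas raises a `ValueError`.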
