[Python] Adding an age column to a DataFrame from birth dates

Posted at 2021-11-19

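This took me longer than expected, so I am leaving it here as a note to myself.

First, create a sample DataFrame.
For some reason the dates in my original data were formatted inconsistently, so I reproduce that here as is.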

import pandas as pd

dict1=dict(entry=["20211120","20211020","20201120"], birthday=["1998/10/24","1966/8/8","1957/6/2"])
list1=pd.DataFrame(data=dict1)
list1

Calculating the age

# Convert both columns from strings to datetime64
list1['entry'] = pd.to_datetime(list1['entry'])
list1['birthday'] = pd.to_datetime(list1['birthday'])

# Subtract to get a timedelta, then express it in whole years
list1['age'] = (list1['entry'] - list1['birthday']).astype('timedelta64[Y]')
list1

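The key here is astype('timedelta64[Y]'), which expresses the difference in whole years. Note that this relies on pandas 1.x behavior: from pandas 2.0 onward, timedelta data only supports the s, ms, us, and ns resolutions, so converting to the 'Y' unit raises an error. The conversion also uses an average year length, so the result can be off by one for dates close to a birthday.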
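
On pandas versions that reject the 'Y' unit, one alternative is to compare calendar fields directly. The following is a minimal sketch, not the method used in this article: the sample frame is rebuilt with zero-padded ISO dates so the block runs on its own, and `before_birthday` is a name introduced here for illustration.

```python
import pandas as pd

# Same sample values as the article, written as ISO dates so parsing is unambiguous.
df = pd.DataFrame({
    "entry": pd.to_datetime(["2021-11-20", "2021-10-20", "2020-11-20"]),
    "birthday": pd.to_datetime(["1998-10-24", "1966-08-08", "1957-06-02"]),
})

# True where the birthday has not yet occurred in the entry year.
# month * 100 + day gives a sortable MMDD integer.
before_birthday = (
    (df["entry"].dt.month * 100 + df["entry"].dt.day)
    < (df["birthday"].dt.month * 100 + df["birthday"].dt.day)
)

# Year difference, minus one if the birthday is still ahead.
df["age"] = df["entry"].dt.year - df["birthday"].dt.year - before_birthday.astype(int)
print(df["age"].tolist())  # [23, 55, 63]
```

Because this works on calendar fields rather than an elapsed duration, it does not depend on an average year length.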

Extracting the year, month, day, and weekday from a date and adding them as columns

list1['year']=list1['entry'].dt.year
list1['month']=list1['entry'].dt.month
list1['day']=list1['entry'].dt.day
list1['weekday']=list1['entry'].dt.weekday
# Conversely, rebuild a date from the year, month, and day columns
list1["date"] = pd.to_datetime(list1[["year", "month", "day"]])
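
`dt.weekday` returns integers, with Monday as 0 and Sunday as 6. If a readable label is more convenient, `dt.day_name()` can be used alongside it. A small sketch using the entry dates from this article; the variable names `weekday_number` and `weekday_name` are chosen here for illustration.

```python
import pandas as pd

entry = pd.Series(pd.to_datetime(["2021-11-20", "2021-10-20", "2020-11-20"]))

weekday_number = entry.dt.weekday    # Monday=0 ... Sunday=6
weekday_name = entry.dt.day_name()   # English day names by default

print(weekday_number.tolist())  # [5, 2, 4]
print(weekday_name.tolist())    # ['Saturday', 'Wednesday', 'Friday']
```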

When the date data contains erroneous entries

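For example, if the data contains 1000/11/20, pandas raises the following error:

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1000-12-04 00:00:00

Nanosecond-resolution timestamps only cover roughly the years 1677 to 2262, so a date this old is out of range.
In that case, pass errors = 'coerce' so that invalid dates become NaT instead of raising.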

list1['entry']=pd.to_datetime(list1['entry'], errors = 'coerce')

Then find the invalid dates and either correct or delete the affected rows.

list1[list1['entry'].isna()]
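
The coerce-then-inspect flow can be sketched on a small standalone frame. The value `2021/02/30` is a made-up impossible date standing in for a typo, and the names `bad_rows` and `cleaned` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"entry": ["2021/11/20", "2021/02/30", "2020/11/20"]})

# Unparseable values become NaT instead of raising.
df["entry"] = pd.to_datetime(df["entry"], format="%Y/%m/%d", errors="coerce")

# Rows whose date could not be parsed.
bad_rows = df[df["entry"].isna()]
print(bad_rows.index.tolist())  # [1]

# One option: drop them once they have been reviewed.
cleaned = df.dropna(subset=["entry"])
print(len(cleaned))  # 2
```

Keeping the original strings in a separate column before coercing makes it easier to see what the bad value was, since NaT no longer carries that information.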

Calculating the age another way

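list1['entry'] = pd.to_datetime(list1['entry'])
list1['birthday'] = pd.to_datetime(list1['birthday'])

age = []

for entry_date, birth_date in zip(list1['entry'], list1['birthday']):
  # Treat each date as a YYYYMMDD integer; the difference floor-divided by 10000 is the age in full years
  ages = (int(entry_date.strftime("%Y%m%d")) - int(birth_date.strftime("%Y%m%d"))) // 10000
  age.append(ages)

list1['age'] = age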
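
The same YYYYMMDD subtraction can also be written without an explicit loop. This is a sketch under the assumption that both columns are already datetime64 and contain no NaT (the integer conversion would fail on missing values); `as_yyyymmdd` is a helper introduced here, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "entry": pd.to_datetime(["2021-11-20", "2021-10-20", "2020-11-20"]),
    "birthday": pd.to_datetime(["1998-10-24", "1966-08-08", "1957-06-02"]),
})

def as_yyyymmdd(dates: pd.Series) -> pd.Series:
    """Format each date as an integer such as 20211120."""
    return dates.dt.strftime("%Y%m%d").astype(int)

# 20211120 - 19981024 = 230096, and 230096 // 10000 = 23
df["age"] = (as_yyyymmdd(df["entry"]) - as_yyyymmdd(df["birthday"])) // 10000
print(df["age"].tolist())  # [23, 55, 63]
```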