
Data cleansing for the Ministry of Health, Labour and Welfare's open data on the outbreak situation

Posted at 2020-10-05 (last updated 2020-10-05)

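The CSV contains literal \n newline sequences, and the change from the previous day (前日比) sits in the same cell as the cumulative value, so the data has to be cleaned before use.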
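To make the cell layout concrete, here is a small sketch on made-up values. It assumes each cell looks like 1,234\n(+56): a cumulative total and a parenthesised day-over-day change joined by a literal backslash followed by "n" (two characters, not a real line break). The values and the row labels row1/row2 are hypothetical, not taken from the real file.

```python
import pandas as pd

# Hypothetical cells: "cumulative\n(change)" with a literal backslash-n.
col = pd.Series(["1,234\\n(+56)", "789\\n(+0)"], index=["row1", "row2"])

# The regex r"\\n" matches a backslash followed by "n".
parts = col.str.split(r"\\n", expand=True)
parts.columns = ["累計", "前日比"]

# Drop thousands separators and the parentheses, then convert to int.
cleaned = parts.apply(
    lambda c: c.str.replace(",", "", regex=False).str.strip("()")
).astype(int)
print(cleaned)
```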

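import pandas as pd

# Read the CSV; the first column holds the row labels.
df = pd.read_csv("https://www.mhlw.go.jp/content/current_situation.csv", index_col=0)

# Remove footnote markers (※1, ※2, ...), commas and literal "\n" sequences from the labels.
# regex=True is needed on pandas >= 2.0, where str.replace matches literally by default.
df.index = df.index.str.replace(r"※\d", "", regex=True).str.replace(",", "").str.replace(r"\\n", "", regex=True)
df.columns = df.columns.str.replace(r"※\d", "", regex=True).str.replace(r"\\n", "", regex=True).str.strip()

# Remove footnote markers from every cell (missing values are left untouched).
df = df.replace(r"※\d", "", regex=True)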
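A regex-based DataFrame.replace removes the ※N footnote markers from every cell at once and, unlike calling re.sub on each cell, skips missing values instead of raising a TypeError. A quick check on a toy frame with made-up values and hypothetical column names A and B:

```python
import numpy as np
import pandas as pd

# Toy frame (made-up values): footnote markers plus one missing cell.
df = pd.DataFrame({"A": ["100※1", np.nan], "B": ["2※3\\n(+1)", "5"]})

# Strip "※" followed by one digit from every string cell; NaN stays NaN.
cleaned = df.replace(r"※\d", "", regex=True)
print(cleaned)
```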

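dfs = []

# Split every column into the cumulative total (累計) and the day-over-day change (前日比).
for name, col in df.items():  # DataFrame.iteritems() was removed in pandas 2.0

    df_tmp = col.str.split(r"\\n", expand=True).rename(columns={0: "累計", 1: "前日比"})
    # Put the original column name on top: (name, 累計), (name, 前日比)
    df_tmp.columns = pd.MultiIndex.from_product([[name], df_tmp.columns])

    dfs.append(df_tmp)

# Cells that had no change value come out as missing; treat them as 0.
df = pd.concat(dfs, axis=1).fillna(0)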
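On a toy frame (made-up values, hypothetical column names A and B), the split-and-concat step yields two-level columns. A group whose cells never contain a change value gets only a 累計 column, and a row that lacks the change inside a group is filled with 0:

```python
import pandas as pd

# Made-up data: column "A" has a change value in only one row, "B" in none.
df = pd.DataFrame(
    {"A": ["10\\n(+2)", "7"], "B": ["1", "0"]},
    index=["row1", "row2"],
)

dfs = []
for name, col in df.items():
    tmp = col.str.split(r"\\n", expand=True).rename(columns={0: "累計", 1: "前日比"})
    tmp.columns = pd.MultiIndex.from_product([[name], tmp.columns])
    dfs.append(tmp)

# Missing change values (row2 of "A") become 0.
wide = pd.concat(dfs, axis=1).fillna(0)
print(wide)
```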

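# Drop commas, whitespace and the parentheses around the change, then convert to int.
# (Column-wise .str methods instead of applymap, which is deprecated in recent pandas.)
df = df.astype(str).apply(lambda c: c.str.replace(",", "").str.strip().str.strip("()")).astype(int)

# utf_8_sig writes a BOM so that Excel recognizes the file as UTF-8.
df.to_csv("current_situation.csv", encoding="utf_8_sig")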
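Because the saved file has two header rows, reading it back needs header=[0, 1] to rebuild the two column levels. A minimal round-trip sketch on a made-up frame with the same layout (the groups A and B, the row labels and the temporary path are hypothetical):

```python
import os
import tempfile

import pandas as pd

# Made-up frame with the same two-level column layout as the cleaned data.
cols = pd.MultiIndex.from_product([["A", "B"], ["累計", "前日比"]])
df = pd.DataFrame([[100, 5, 20, 1], [50, 0, 8, 2]], index=["x", "y"], columns=cols)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "current_situation.csv")
    df.to_csv(path, encoding="utf_8_sig")
    # header=[0, 1] rebuilds both column levels; index_col=0 restores the row labels.
    restored = pd.read_csv(path, header=[0, 1], index_col=0, encoding="utf_8_sig")

print(restored)
```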

Cumulative totals (累計) only:

# Take the 累計 column of every group, then drop the second column level.
df.loc[:, (slice(None), "累計")].droplevel(level=1, axis=1)
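DataFrame.xs with level=1 is an alternative spelling of the same selection that drops the second level in one step. A check on a toy frame with hypothetical groups A and B and made-up values:

```python
import pandas as pd

cols = pd.MultiIndex.from_product([["A", "B"], ["累計", "前日比"]])
df = pd.DataFrame([[100, 5, 20, 1], [50, 0, 8, 2]], index=["x", "y"], columns=cols)

# Selection as written in the article.
via_loc = df.loc[:, (slice(None), "累計")].droplevel(level=1, axis=1)

# Cross-section on the second column level; that level is dropped by default.
via_xs = df.xs("累計", axis=1, level=1)
print(via_xs)
```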
