

Cleansing the Ministry of Health, Labour and Welfare's open data on the outbreak situation

Posted at 2020-10-05

Ministry of Health, Labour and Welfare: Domestic outbreak situation, etc.
Open data

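The CSV contains literal \n newline sequences, and the day-over-day change is stored in the same cell as the cumulative total, so the data needs cleansing.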
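To make the problem concrete, here is a minimal sketch of how one such cell could be taken apart in plain Python. The sample values and the helper name `parse_cell` are hypothetical, and the cell layout (a total, a literal `\n`, then a parenthesized change) is an assumption inferred from the cleanup code, not a description of the actual file.

```python
import re


def parse_cell(cell: str) -> tuple[int, int]:
    """Split a hypothetical 'total\\n(change)' cell into two integers."""
    # Drop footnote markers such as "※1".
    cell = re.sub(r"※\d", "", cell)
    # Split on the literal two-character sequence: backslash followed by "n".
    parts = cell.split("\\n")
    total = int(parts[0].replace(",", "").strip())
    # A missing change is treated as 0 (an assumption for this sketch).
    if len(parts) > 1:
        change = int(parts[1].replace(",", "").strip().strip("()"))
    else:
        change = 0
    return total, change


print(parse_cell(r"1,234※1\n(+56)"))  # (1234, 56)
print(parse_cell("789"))              # (789, 0)
```

The pandas code does the same thing column by column instead of cell by cell.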

import re
import pandas as pd

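# Use the first column (the row labels) as the index.
df = pd.read_csv("https://www.mhlw.go.jp/content/current_situation.csv", index_col=0)

# Strip footnote markers (※1, ※2, ...), commas, and literal "\n" sequences from the labels.
# regex=True is required on pandas 2.0+, where str.replace defaults to literal matching.
df.index = df.index.str.replace(r"※\d", "", regex=True).str.replace(",", "").str.replace(r"\\n", "", regex=True)
df.columns = df.columns.str.replace(r"※\d", "", regex=True).str.replace(r"\\n", "", regex=True).str.strip()

# Remove footnote markers from every cell (DataFrame.map is the preferred name on pandas 2.1+).
df = df.applymap(lambda s: re.sub(r"※\d", "", s))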

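dfs = []

# Split each column on the literal "\n" into a cumulative total ("累計") and a day-over-day change ("前日比").
# DataFrame.iteritems was removed in pandas 2.0; items() works on old and new versions.
for name, col in df.items():

    df_tmp = col.str.split(r"\\n", expand=True).rename(columns={0: "累計", 1: "前日比"})
    df_tmp.columns = pd.MultiIndex.from_product([[name], df_tmp.columns])

    dfs.append(df_tmp)

# Cells without a day-over-day part come back empty from the split; treat them as 0.
df = pd.concat(dfs, axis=1).fillna(0)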

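# Drop thousands separators and the parentheses around the change, then convert to integers.
df = df.applymap(lambda s: str(s).replace(",", "").strip().strip("()")).astype(int)

# utf_8_sig writes a UTF-8 BOM, which helps Excel detect the encoding.
df.to_csv("current_situation.csv", encoding="utf_8_sig")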
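Because the frame has two-level columns at this point, the saved file carries two header rows. The following is a minimal sketch of reading such a file back with pandas' `header=[0, 1]` option. The one-row frame and its labels are made up for illustration, and an in-memory buffer stands in for the real file.

```python
import io

import pandas as pd

# Hypothetical frame with the same two-level column layout ("累計" = cumulative, "前日比" = day-over-day).
columns = pd.MultiIndex.from_product([["tests"], ["累計", "前日比"]])
toy = pd.DataFrame([[100, 5]], index=["A"], columns=columns)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
toy.to_csv(buffer)
buffer.seek(0)

# header=[0, 1] tells pandas that the first two rows together form the column labels.
restored = pd.read_csv(buffer, index_col=0, header=[0, 1])
print(restored)
```

When reading the real output file, passing `encoding="utf_8_sig"` to `pd.read_csv` should consume the BOM written by the save step.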

Cumulative totals only

df.loc[:, (slice(None), "累計")].droplevel(level=1, axis=1)
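As a rough illustration of what this selection does, the sketch below builds a small two-level column frame by hand and applies the same expression. The top-level column names and all numbers are made up for the example.

```python
import pandas as pd

# Hypothetical data: two measures, each with "累計" (cumulative) and "前日比" (day-over-day).
columns = pd.MultiIndex.from_product([["positives", "deaths"], ["累計", "前日比"]])
toy = pd.DataFrame(
    [[100, 5, 10, 1], [200, 8, 20, 2]],
    index=["A", "B"],
    columns=columns,
)

# slice(None) keeps every first-level label; "累計" picks the second-level label.
# droplevel then removes the now-constant second level from the column index.
cumulative = toy.loc[:, (slice(None), "累計")].droplevel(level=1, axis=1)
print(cumulative)
```

The result has plain single-level columns, one per measure, holding only the cumulative values.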