
Converting the PDF on the status of vaccination implementation plan preparation to CSV

Posted at 2021-03-18

Converted xlsx file

Corrections

※ Municipality name is long
426 ひたちなか市
435 かすみがうら市
440 つくばみらい市
813 南アルプス市
831 富士河口湖町
1369 山陽小野田市
1673 いちき串木野市

※ Plan status changed from 「接種会場など」 to 「接種会場等、」
795 鯖江市

※ Publication date is long
640 目黒区

※ Simulation date is long
1219 田原本町

wget https://www.mhlw.go.jp/content/000754414.pdf -O data.pdf
pip install pdfplumber

import pdfplumber
import pandas as pd

# Detect table cells only from lines drawn in the PDF (rectangle edges are ignored)
table_settings = {
    # Vertical strategy
    "vertical_strategy": "lines_strict",
    # Horizontal strategy
    "horizontal_strategy": "lines_strict",
}

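with pdfplumber.open("data.pdf") as pdf:
    dfs = []
    for page in pdf.pages:
        # Extract the table on this page as a list of rows
        table = page.extract_table(table_settings)
        df_tmp = pd.DataFrame(table)
        dfs.append(df_tmp)

# Combine the per-page tables into a single DataFrame
df = pd.concat(dfs)

# Replace missing cells with empty strings and treat every cell as text
df1 = df.copy().fillna("").astype(str)

# Remove all whitespace, including line breaks inside wrapped cells
# (on pandas 2.1 and later, DataFrame.map replaces the deprecated applymap)
df2 = df1.applymap(lambda s: "".join(s.split()))

# Write UTF-8 with a BOM so that Excel detects the encoding
df2.to_csv("data.csv", index=False, header=False, encoding="utf_8_sig")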
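
The whitespace-removal step above can be tried without the PDF. The following is a minimal sketch in plain Python, assuming rows shaped like the list of lists that `page.extract_table()` returns. The helper names `clean_cell` and `clean_rows` and the sample rows (including where the line breaks fall) are hypothetical and only for illustration.

```python
def clean_cell(cell):
    """Return the cell text with all whitespace removed; None becomes ""."""
    if cell is None:
        return ""
    # str.split() with no arguments splits on any Unicode whitespace,
    # so newlines, ASCII spaces and full-width spaces are all dropped.
    return "".join(str(cell).split())


def clean_rows(rows):
    """Apply clean_cell to every cell of a list-of-lists table."""
    return [[clean_cell(cell) for cell in row] for row in rows]


# Hypothetical rows: wrapped cell text arrives with embedded line breaks,
# and empty cells arrive as None. These values are made up for the example.
sample_rows = [
    ["426", "ひたちな\nか市", None],
    ["813", "南アルプス\u3000市", "sample text"],
]

cleaned = clean_rows(sample_rows)
print(cleaned)
```

Because every whitespace character is removed, any spaces that are meaningful inside a cell are lost as well, which may be one reason some rows still need manual correction afterwards.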