Converting Nagoya City's PDF of Registered COVID-19 Vaccination Medical Institutions to CSV

Posted at 2021-06-17

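The previous PDF was created with "Microsoft: Print To PDF", so the extracted text came out garbled.

This time the PDF was created with "eDocument Library", so it could be converted without any garbled text.

In Excel or Word, you can convert a document to PDF simply by choosing "PDF" as the file type in the save dialog.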

Program

# Download
wget "https://www.city.nagoya.jp/kenkofukushi/cmsfiles/contents/0000136/136137/iryoukikan(R30615).pdf" -O data.pdf

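# Install the system packages Camelot depends on (Tk and Ghostscript)
apt update

apt install python3-tk ghostscript
# Install Camelot with its OpenCV extras
pip install camelot-py[cv]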

import camelot
import pandas as pd

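# Extract the tables from every page; strip_text removes spaces and newlines inside cells
tables = camelot.read_pdf(
    "data.pdf", pages="all", split_text=True, strip_text=" \n", line_scale=40
)

# Drop the first row of each table (the header) and assign the three column names
dfs = [
    pd.DataFrame(table.data[1:], columns=["登録医療機関名", "住所１", "住所２"]) for table in tables
]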

df = pd.concat(dfs).reset_index(drop=True)

df
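The header-dropping and concatenation step above can be tried without the PDF. The following is only a minimal sketch: `page1`, `page2`, the clinic rows, and the helper `combine_pages` are hypothetical stand-ins for what `table.data` returns, assuming each page's table repeats the header row.

```python
import pandas as pd

COLUMNS = ["登録医療機関名", "住所１", "住所２"]

# Hypothetical sample data shaped like Camelot's table.data:
# a list of rows per page, where row 0 is the repeated header.
page1 = [
    ["header 1", "header 2", "header 3"],
    ["Clinic A", "Naka-ku", "1-1"],
    ["Clinic B", "Higashi-ku", "2-2"],
]
page2 = [
    ["header 1", "header 2", "header 3"],
    ["Clinic C", "Kita-ku", "3-3"],
]


def combine_pages(pages, columns):
    """Drop each page's header row, then stack the pages into one DataFrame."""
    frames = [pd.DataFrame(rows[1:], columns=columns) for rows in pages]
    # reset_index(drop=True) replaces the per-page indexes with one running index
    return pd.concat(frames).reset_index(drop=True)


combined = combine_pages([page1, page2], COLUMNS)
print(combined)
```

Without `reset_index(drop=True)`, the index would restart at 0 for every page, which makes the rows harder to refer to afterwards.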

df.to_csv("output.csv", encoding="utf_8_sig")
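The `utf_8_sig` encoding writes a UTF-8 byte order mark (BOM) at the start of the file, which helps Excel recognize the file as UTF-8 so that the Japanese text displays correctly. The sketch below illustrates this with the standard library only; `to_csv_bytes` and the sample rows are hypothetical and are not part of the original program.

```python
import csv
import io


def to_csv_bytes(rows, encoding="utf_8_sig"):
    """Serialize rows as CSV text, then encode it with the given codec."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue().encode(encoding)


# Hypothetical sample rows
rows = [["登録医療機関名", "住所１"], ["Clinic A", "Naka-ku"]]

with_bom = to_csv_bytes(rows)
without_bom = to_csv_bytes(rows, encoding="utf-8")

# The only difference between the two encodings is the 3-byte BOM prefix
print(with_bom[:3])
print(with_bom[3:] == without_bom)
```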
