Converting the Kumamoto Prefecture Go To Eat participating-store list from PDF to CSV

Posted at 2020-10-15

Convert the PDF list of participating stores for the Go To Eat Campaign Kumamoto into CSV.

Commands

wget https://gotoeat-kumamoto.jp/pdf/shoplist.pdf -O data.pdf

apt install python3-tk ghostscript
pip install "camelot-py[cv]"

camelot -p all -o data.csv -f csv -split lattice -scale 40 data.pdf

Python

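import camelot
import pandas as pd

# Extract the tables from every page (lattice is Camelot's default flavor)
tables = camelot.read_pdf("data.pdf", pages="all", split_text=True, strip_text=" \n", line_scale=40)

# Collect the per-page tables and stack them into one DataFrame
dfs = [table.df for table in tables]

df_tmp = pd.concat(dfs)

# Use the first row as the column names and the remaining rows as data
df = df_tmp.iloc[1:].set_axis(df_tmp.iloc[0].to_list(), axis=1).reset_index(drop=True)
# Sort by postal code (郵便番号), then by street address (町域、番地)
df.sort_values(by=["郵便番号", "町域、番地"], inplace=True)

# utf_8_sig writes a BOM so that Excel detects the encoding of the Japanese text
df.to_csv("kumamoto.csv", encoding="utf_8_sig")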
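
The header-promotion, sort, and BOM-encoding steps can be sketched with the standard library alone, which may help when checking the logic without the PDF at hand. This is only an illustrative sketch: the sample rows are made up, the 店舗名 (store name) column is a hypothetical stand-in for whatever columns the real PDF has, and `promote_header` and `to_csv_with_bom` are helper names introduced here. Dropping repeated header rows is an extra step, assuming the header may recur on later pages; the pandas code above does not do this.

```python
import csv
import io


def promote_header(rows):
    """Use the first row as column names and drop later copies of it."""
    header = rows[0]
    # Assumption: a header repeated on later PDF pages appears as an identical row.
    body = [row for row in rows[1:] if row != header]
    return [dict(zip(header, row)) for row in body]


def to_csv_with_bom(records, fieldnames):
    """Serialize records to CSV bytes encoded as UTF-8 with a leading BOM."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue().encode("utf-8-sig")


# Made-up rows standing in for tables extracted from two pages.
# 店舗名 is a hypothetical column; 郵便番号 and 町域、番地 come from the article.
sample_rows = [
    ["店舗名", "郵便番号", "町域、番地"],
    ["Shop B", "860-0002", "Example 2"],
    ["店舗名", "郵便番号", "町域、番地"],
    ["Shop A", "860-0001", "Example 1"],
]

records = promote_header(sample_rows)
# Same sort keys as the pandas version: postal code, then street address.
records.sort(key=lambda r: (r["郵便番号"], r["町域、番地"]))
csv_bytes = to_csv_with_bom(records, fieldnames=sample_rows[0])
```

Decoding `csv_bytes` with `utf-8-sig` strips the BOM again, so the file can be read back in Python without a stray character in the first column name.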