Converting the GoToEat Campaign Kumamoto member-store list from PDF to CSV
Commands
wget https://gotoeat-kumamoto.jp/pdf/shoplist.pdf -O data.pdf
apt install python3-tk ghostscript
pip install "camelot-py[cv]"
camelot -p all -o data.csv -f csv -split lattice -scale 40 data.pdf
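Note that the CLI command does not produce one combined data.csv. As far as we know, camelot writes one CSV per detected table and names each one `data-page-<page>-table-<order>.csv`. Treat that naming as an assumption and check your output directory. A plain string sort puts page 10 before page 2, so the minimal sketch below sorts the files numerically before stacking them. The helpers `page_table_key` and `merge_camelot_csvs` are hypothetical names, not part of camelot.

```python
import re
from pathlib import Path

import pandas as pd

# Assumption: camelot's CLI names each exported table
# "<root>-page-<page>-table-<order>.csv", e.g. data-page-10-table-1.csv.
_NAME_RE = re.compile(r"-page-(\d+)-table-(\d+)\.csv$")


def page_table_key(path):
    """Return (page, table) as integers so page 10 sorts after page 2."""
    match = _NAME_RE.search(str(path))
    if match is None:
        raise ValueError(f"unexpected file name: {path}")
    return int(match.group(1)), int(match.group(2))


def merge_camelot_csvs(directory=".", root="data"):
    """Stack the exported per-table CSVs in page/table order."""
    paths = sorted(Path(directory).glob(f"{root}-page-*-table-*.csv"), key=page_table_key)
    if not paths:
        raise FileNotFoundError(f"no {root}-page-*-table-*.csv files in {directory}")
    # The exported files have no header line, so every row is read as data.
    # Cells stay strings so values with leading zeros are not turned into numbers.
    frames = [pd.read_csv(p, header=None, dtype=str, keep_default_na=False) for p in paths]
    return pd.concat(frames, ignore_index=True)
```

For example, `raw = merge_camelot_csvs(".", "data")` returns every row with integer column labels. The first row still holds the PDF's header, so it needs to be promoted to column names, as the Python script does.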
Python
import camelot
import pandas as pd
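# Lattice mode (the default) reads the ruled grid; line_scale=40 (default 15) detects smaller lines
tables = camelot.read_pdf("data.pdf", pages="all", split_text=True, strip_text=" \n", line_scale=40)
# Stack the tables from every page into one DataFrame
df_tmp = pd.concat([table.df for table in tables])
# Use the first row (the PDF's header row) as the column names and drop it from the data
df = df_tmp.iloc[1:].set_axis(df_tmp.iloc[0].to_list(), axis=1).reset_index(drop=True)
# Sort by postal code, then by town/street address
df = df.sort_values(by=["郵便番号", "町域、番地"], ignore_index=True)
# utf_8_sig adds a BOM so Excel detects UTF-8; index=False omits the row-number column
df.to_csv("kumamoto.csv", index=False, encoding="utf_8_sig")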
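The script assumes the header appears only once, in the first row of the first table. If the PDF repeated its header on every page (not verified here), those rows would end up in the data. Tables with different column counts would also be silently padded with NaN by `pd.concat`. A minimal defensive sketch, using a hypothetical helper `combine_tables`:

```python
import pandas as pd


def combine_tables(dfs):
    """Stack camelot table DataFrames and promote the first row to the header.

    Later rows identical to that header (e.g. a header repeated on each page)
    are dropped. Raises ValueError if there are no tables or their column
    counts differ, instead of letting pd.concat pad the gaps with NaN.
    """
    if not dfs:
        raise ValueError("no tables to combine")
    widths = {df.shape[1] for df in dfs}
    if len(widths) != 1:
        raise ValueError(f"tables have differing column counts: {sorted(widths)}")
    stacked = pd.concat(dfs, ignore_index=True)
    header = stacked.iloc[0]
    body = stacked.iloc[1:]
    # A row is a repeated header if every cell matches the header cell in its column.
    is_header = body.eq(header, axis="columns").all(axis=1)
    return body[~is_header].set_axis(header.to_list(), axis=1).reset_index(drop=True)
```

With this helper, `df = combine_tables([table.df for table in tables])` would replace the concat and header lines of the script. The sorting and CSV export stay the same.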