Converting Kagoshima Prefecture's Go To Eat participating-store PDFs to CSV

Convert the participating-store PDFs published by the Kagoshima Chamber of Commerce and Industry to CSV

The PDF files are split by area, so they are combined into a single file.

Scraping

import requests
from bs4 import BeautifulSoup

url = "http://www.kagoshima-cci.or.jp/?p=20375"

r = requests.get(url)
r.raise_for_status()

soup = BeautifulSoup(r.content, "html.parser")

result = []

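# Each matching link in the page body points to the PDF for one area
for a in soup.select("#contents_layer > span > p > a"):

    # Drop "全域" (entire area) and the leading "〇" marker from the link text
    s = a.get_text(strip=True).replace("全域", "").lstrip("〇")

    # Skip district-level links (names ending in "地区")
    if not s.endswith("地区"):

        result.append({"area": s, "link": a.get("href")})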
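
The link-text cleanup in the loop above can be tried in isolation, without fetching the page. The following is a minimal sketch using only the standard library; the helper names `clean_area_name` and `is_district` and the sample link texts are hypothetical and are not taken from the real page. `str.strip()` stands in for `get_text(strip=True)` here.

```python
def clean_area_name(text: str) -> str:
    """Strip whitespace, drop "全域" (entire area) and a leading "〇" marker."""
    return text.strip().replace("全域", "").lstrip("〇")


def is_district(name: str) -> bool:
    """District-level entries end in "地区"; the scraping loop skips them."""
    return name.endswith("地区")


# Hypothetical link texts; the real page may label its links differently.
samples = ["〇鹿児島市全域", "〇中央地区", " 〇霧島市 "]

areas = [name for name in map(clean_area_name, samples) if not is_district(name)]
print(areas)
```

Only the two city-level names remain after filtering; the entry ending in "地区" is dropped.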

Data wrangling

import camelot
import pandas as pd

dfs = []

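for data in result:

    # Extract every ruled (lattice) table from all pages of this area's PDF
    tables = camelot.read_pdf(
        data["link"], pages="all", flavor="lattice", split_text=True, strip_text=" \n"
    )

    for table in tables:

        # Drop the first row (presumably the header) and name the columns:
        # 五十音 (kana-order index), 店舗名 (store name), 所在地 (address)
        df_tmp = table.df.iloc[1:].set_axis(["五十音", "店舗名", "所在地"], axis=1)
        # 地域 (area) records which area's PDF the row came from
        df_tmp["地域"] = data["area"]

        dfs.append(df_tmp)

# Combine the tables from all areas into one DataFrame;
# each table keeps its own row index, so index values repeat
df = pd.concat(dfs)

# utf_8_sig writes a UTF-8 byte order mark at the start of the file
df.to_csv("kagoshima.csv", encoding="utf_8_sig")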
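
The `utf_8_sig` encoding passed to `to_csv` prepends a UTF-8 byte order mark (BOM), which spreadsheet tools such as Excel commonly use to detect UTF-8. That is presumably why it was chosen here, though the article does not say so. A minimal sketch with the standard-library `csv` module shows the effect; the sample row and the file name `sample.csv` are hypothetical.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical sample row; real rows come from the extracted PDF tables.
rows = [{"店舗名": "サンプル食堂", "所在地": "鹿児島市", "地域": "鹿児島市"}]
fieldnames = ["店舗名", "所在地", "地域"]

path = Path(tempfile.mkdtemp()) / "sample.csv"

# Writing with utf_8_sig puts the three BOM bytes at the start of the file.
with path.open("w", encoding="utf_8_sig", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

has_bom = path.read_bytes().startswith(b"\xef\xbb\xbf")

# Reading with utf_8_sig removes the BOM again, so the header is clean.
with path.open(encoding="utf_8_sig", newline="") as f:
    header = next(csv.reader(f))

print(has_bom, header)
```

Reading the same file with plain `utf-8` instead would leave the BOM character attached to the first column name, so the matching encoding matters when loading the CSV back.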