

Converting a PDF list of evacuation facilities to CSV


Description

Download your prefecture's file from the page above and save it as "data.pdf".

Latitude and longitude are extracted from the Yahoo! Map (Yahoo!地図) URIs linked in the PDF.
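As a minimal sketch of this extraction step, the snippet below pulls `lat` and `lon` out of a map URI using only the standard library. The URI is a made-up example, and the sketch assumes (as the program below does) that Yahoo! Map links carry the coordinates in `lat` and `lon` query parameters; the helper name `extract_latlon` is hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def extract_latlon(uri: str) -> tuple[float, float]:
    """Return (lat, lon) from the query string of a map URI.

    Assumes the URI has `lat` and `lon` query parameters; raises KeyError otherwise.
    """
    # parse_qs maps each key to a list of values, so take the first one
    params = parse_qs(urlparse(uri).query)
    return float(params["lat"][0]), float(params["lon"][0])


# Hypothetical URI in the shape the program expects
sample = "https://map.yahoo.co.jp/?lat=35.681236&lon=139.767125&zoom=16"
print(extract_latlon(sample))  # (35.681236, 139.767125)
```

Unlike this sketch, the program below keeps the values as strings, which is fine when they are only written back out to CSV.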

Program

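import urllib.parse

import pandas as pd
import pdfplumber


def uri2latlon(uri):
    # Parse the query string of a Yahoo! Map URI and return its lat/lon as a Series
    # (values are kept as strings, exactly as they appear in the URI)
    qs = urllib.parse.urlparse(uri).query
    qs_d = urllib.parse.parse_qs(qs)
    return pd.Series({"lat": qs_d["lat"][0], "lon": qs_d["lon"][0]})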

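# Column names, kept in Japanese to match the header of the PDF table
col = [
    "名称",  # name
    "市町村名",  # municipality
    "住所",  # address
    "緊急一時避難施設",  # emergency temporary evacuation facility
    "コンクリート造",  # concrete building
    "24時間避難可能",  # accessible 24 hours
    "地下施設",  # underground facility
    "地理院地図",  # GSI Maps link
    "googleマップ",  # Google Maps link
    "Yahoo!地図",  # Yahoo! Map link
]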

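with pdfplumber.open("data.pdf") as pdf:

    dfs = []

    for page in pdf.pages:

        table = page.extract_table()

        # Skip pages that have no table or no links (nothing to pair up)
        if table is None or not page.hyperlinks:
            continue

        # Keep only the Yahoo! Map links and sort them top-to-bottom, so the
        # n-th link lines up with the n-th data row of the table on this page
        df_uri = (
            pd.DataFrame(page.hyperlinks)
            .query('uri.str.startswith("https://map.yahoo.co.jp/")', engine="python")
            .sort_values("top")
            .reset_index(drop=True)
        )

        # One lat/lon row per link
        df_latlon = df_uri["uri"].apply(uri2latlon)

        # Drop the header row and attach the coordinates by row position
        df_tmp = pd.DataFrame(table[1:], columns=col).join(df_latlon)

        dfs.append(df_tmp)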

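df = pd.concat(dfs).reset_index(drop=True)

# Number the facilities from 1 instead of 0
df.index += 1

# Remove whitespace (including line breaks inside PDF cells) from names and addresses
df["名称"] = df["名称"].str.replace(r"\s", "", regex=True)
df["住所"] = df["住所"].str.replace(r"\s", "", regex=True)

df.to_csv("result.csv")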
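The program relies on one assumption worth spelling out: each table row has exactly one Yahoo! Map link, so sorting the links by their vertical position (`top`) puts them in the same order as the rows, and the two can be joined by position. The sketch below reproduces that pairing on small hand-made data (the links, rows, and coordinates are invented for illustration, not taken from the PDF). It filters with a boolean mask instead of `query(..., engine="python")` and adds a length check that the original program does not have.

```python
import pandas as pd

# Hand-made stand-ins for page.hyperlinks and page.extract_table() (invented data)
links = [
    {"uri": "https://map.yahoo.co.jp/?lat=35.2&lon=139.2", "top": 300.0},
    {"uri": "https://example.com/other", "top": 150.0},
    {"uri": "https://map.yahoo.co.jp/?lat=35.1&lon=139.1", "top": 200.0},
]
rows = [["名称", "住所"], ["A小学校", "X市1-1"], ["B公民館", "X市2-2"]]

# Keep only Yahoo! Map links and order them top-to-bottom, as the program does
df_uri = (
    pd.DataFrame(links)
    .loc[lambda d: d["uri"].str.startswith("https://map.yahoo.co.jp/")]
    .sort_values("top")
    .reset_index(drop=True)
)

# Drop the header row, as the program does with table[1:]
df_rows = pd.DataFrame(rows[1:], columns=rows[0])

# A position-based join only makes sense if the counts agree
if len(df_uri) != len(df_rows):
    raise ValueError(f"{len(df_uri)} links vs {len(df_rows)} rows")

result = df_rows.join(df_uri[["uri"]])
print(result)
```

If a page ever has a row without a map link (or an extra link), the check fails loudly instead of silently shifting every coordinate after that point by one row.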