Automatically looking up a postal code from an address using Google Maps

Last updated at 2023-03-27, posted at 2023-03-27

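Background

When you want to go from an address to a postal code, the standard approach is probably to grind through the ken_all CSV with regular expressions, but my impression is that doing this properly involves a lot of pain.
APIs provided by third parties may be shut down at any time, and many of them are paid.
It is quite a compromise, but I thought that automated Google Maps searches might return postal codes with reasonable speed and accuracy.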

Environment

Install BeautifulSoup and selenium

$ pip install beautifulsoup4
$ pip install selenium

Install chromedriver (create a driver folder under the project folder and place chromedriver.exe in it)

Program

yubin.py

# This Python file uses the following encoding: utf-8
import os
import re
import sys
import time
from urllib.parse import quote

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service


# Convert a relative path to an absolute path.
# sys._MEIPASS exists only when running from a PyInstaller bundle;
# otherwise fall back to the folder containing this script.
def resource_path(relative_path):
    base_path = getattr(sys, "_MEIPASS", os.path.dirname(os.path.abspath(__file__)))
    return os.path.join(base_path, relative_path)


# Path to chromedriver
bdriverpath = resource_path('./driver/chromedriver.exe')


# Return the postal code for an address
def searchyubin(address):
    # Address search URL (percent-encode the address so spaces, "/" and "?" are safe)
    url = "https://www.google.com/maps/place/" + quote(address, safe="")

    options = Options()
    options.add_argument('--headless')     # launch Chrome in headless mode
    # Selenium 4 takes the driver path via Service; the executable_path
    # argument is deprecated and has been removed in newer releases.
    driver = webdriver.Chrome(service=Service(bdriverpath), options=options)
    try:
        driver.get(url)
        time.sleep(1)                      # give the page time to render
        resource = driver.page_source
    finally:
        driver.quit()                      # always close the browser, even on errors

    soup = BeautifulSoup(resource, "html.parser")
    # Extract the postal code (3 digits + "-" + 4 digits) from the page text
    obj = re.search(r"(\d{3})-(\d{4})", soup.text)
    if obj is not None:
        return obj.group()
    return "該当なし"                       # "no match"
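
The pattern used in searchyubin matches any three digits, a hyphen, and four digits, so it can also pick up the tail of a phone number such as 03-1234-5678. The following is a minimal sketch of a stricter extractor, not part of the original script. It assumes the page text shows postal codes either after the postal mark (〒123-4567) or as a standalone 123-4567. The helper name `extract_postal_code` is hypothetical.

```python
import re

# Assumption: a code introduced by the postal mark is the most reliable hit.
_MARKED = re.compile(r"〒\s*(\d{3}-\d{4})(?!\d)")
# Fallback: NNN-NNNN that is not glued to other digits or hyphens,
# which rules out fragments of phone numbers.
_BARE = re.compile(r"(?<![\d-])(\d{3}-\d{4})(?![\d-])")


def extract_postal_code(text):
    """Return the first postal-code-like string in text, or None."""
    for pattern in (_MARKED, _BARE):
        match = pattern.search(text)
        if match:
            return match.group(1)
    return None
```

Returning None instead of a message string lets the caller decide how to report a miss. searchyubin could call this on soup.text in place of the single re.search.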

Impressions after trying it

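In most cases the correct postal code is returned, but addresses in very rural areas are often not covered.
Each address is a separate web search, so it naturally takes time (you also need to leave a sufficient interval between searches).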
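
The point about leaving an interval between searches can be sketched as a small batch helper. This is an illustration under assumptions, not the author's code: `lookup_all` is a hypothetical name, the 5-second default is an arbitrary placeholder rather than a known safe value, and `lookup` stands for any function that maps an address to a postal code (for example searchyubin).

```python
import time


def lookup_all(addresses, lookup, interval=5.0, sleep=time.sleep):
    """Look up each distinct address once, pausing between real lookups."""
    addresses = list(addresses)
    cache = {}
    for address in addresses:
        if address in cache:
            continue                 # duplicates cost no extra web search
        if cache:
            sleep(interval)          # pause before every lookup except the first
        cache[address] = lookup(address)
    # Return results in the same order as the input, duplicates included
    return [cache[address] for address in addresses]
```

Caching repeated addresses avoids unnecessary searches, and passing `sleep` as a parameter makes it possible to try the helper without actually waiting.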
