
Fetching Past Weather Data [Python × Scraping]

Last updated at 2022-03-06 / Posted at 2022-03-05

1. Introduction

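Everything here comes from the following article by @Cyber_Hacnosuke:
「GoogleColaboratoryで気象庁の過去気象データをスクレイピングしてみた。」 (Scraping the JMA's past weather data with Google Colaboratory)
I referred to it and used it with a few customizations of my own. Thank you very much!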

・Site being scraped
気象庁 過去の気象データ検索 (Japan Meteorological Agency, past weather data search)

2. Overview (data to retrieve)

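■ Dates: daily historical data for any chosen period
■ Locations: each prefecture (its prefectural capital)
■ Weather data: the following items
・Precipitation (降水量)
・Mean temperature (気温_平均)
・Maximum temperature (気温_最高)
・Minimum temperature (気温_最低)
・Mean humidity (湿度_平均)
・Minimum humidity (湿度_最小)
・Sunshine duration (日照時間)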
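The overview asks for an arbitrary period, while the code in section 3 loops over whole calendar years (`range(2019, 2021)` with months 1 to 12). One way to cover any start and end month is a small generator like the hypothetical `month_range` below, which is a sketch and not part of the original code. Each page still covers a full month, so days outside the exact period would need to be filtered afterwards.

```python
def month_range(start, end):
    """Yield (year, month) pairs from start to end, inclusive.

    start and end are (year, month) tuples. Hypothetical helper, not from the
    original article.
    """
    year, month = start
    # Tuples compare element by element, so (2019, 12) < (2020, 1) holds.
    while (year, month) <= end:
        yield year, month
        month += 1
        if month > 12:  # roll over to January of the next year
            year, month = year + 1, 1


print(list(month_range((2019, 11), (2020, 2))))
# [(2019, 11), (2019, 12), (2020, 1), (2020, 2)]
```

In the section 3 code, the two nested `for year` / `for month` loops could then become a single `for year, month in month_range((2019, 4), (2020, 3)):`.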

■ Example: the data is taken from pages like this one.
https://www.data.jma.go.jp/obd/stats/etrn/view/daily_s1.php?prec_no=44&block_no=47662&year=2019&month=01&day=1&view=p1
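The page is selected entirely by query parameters: `prec_no` (prefecture code), `block_no` (station code), `year` and `month`, while `day=1&view=p1` stay fixed in this article's code. As a sketch, the same URL can be built with the standard library's `urllib.parse.urlencode` instead of `%s` formatting. The function name `build_daily_url` is hypothetical, and zero-padding the month only mirrors the example URL above (the code in section 3 sends `month=1`).

```python
from urllib.parse import urlencode

BASE = "https://www.data.jma.go.jp/obd/stats/etrn/view/daily_s1.php"


def build_daily_url(prec_no, block_no, year, month):
    """Build the daily-data page URL for one station and one month (hypothetical helper)."""
    params = {
        "prec_no": prec_no,       # prefecture code, e.g. 44 for Tokyo
        "block_no": block_no,     # station code, e.g. 47662 for Tokyo
        "year": year,
        "month": f"{month:02d}",  # zero-padded to match the example URL (month=01)
        "day": 1,
        "view": "p1",
    }
    # urlencode keeps the dict's insertion order and converts values with str().
    return BASE + "?" + urlencode(params)


print(build_daily_url(44, 47662, 2019, 1))
# https://www.data.jma.go.jp/obd/stats/etrn/view/daily_s1.php?prec_no=44&block_no=47662&year=2019&month=01&day=1&view=p1
```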

3. Straight to the code

import csv
import requests
from bs4 import BeautifulSoup  # If it is not installed, install it with pip (pip install beautifulsoup4).

place_codeA = [44]  # Prefecture code (prec_no)
place_codeB = [47662]  # Station code (block_no)
place_name = ["東京"]

# The year and month are set in the URL, so %s placeholders let us embed the values.
base_url = "https://www.data.jma.go.jp/obd/stats/etrn/view/daily_s1.php?prec_no=%s&block_no=%s&year=%s&month=%s&day=1&view=p1"

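# Convert a scraped value to float. When data is missing, the JMA shows "/" instead, so return 0.0 for anything that is not a number.
def str2float(value):
  try:
    return float(value)
  except (TypeError, ValueError):  # TypeError covers cells whose .string is None
    return 0.0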

if __name__ == "__main__":
  # Loop over every city
  for index, place in enumerate(place_name):
    # List that collects all of this city's data
    All_list = [['年月日', '降水量', '気温_平均', '気温_最高', '気温_最低', '湿度_平均', '湿度_最小', '日照時間']]
    print(place)

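    # Loop over the target period (2019 and 2020; range() excludes the end value 2021)
    for year in range(2019, 2021):
      print(year)
      # Cover the 12 months from January to December of that year
      for month in range(1, 13):
        # Fill in the two location codes, the year and the month
        r = requests.get(base_url % (place_codeA[index], place_codeB[index], year, month), timeout=30)
        r.encoding = r.apparent_encoding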

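        # Parse the whole page
        soup = BeautifulSoup(r.text, "html.parser")
        # find_all extracts every element that matches the condition.
        # Here the condition is: <tr> tags whose class is "mtx".
        rows = soup.find_all('tr', class_='mtx')

        # The first four rows of the table hold column headers, so slice them off.
        rows = rows[4:]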

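        # One row per day, from the 1st to the last day of the month
        for row in rows:
          # Extract every <td> in the <tr>
          data = row.find_all('td')

          # Each row holds many values, so pick out the ones we need.
          rowData = []  # Reset for this row
          rowData.append(str(year) + "/" + str(month) + "/" + str(data[0].string))  # Date (year/month/day)
          rowData.append(str2float(data[3].string))   # Precipitation total
          rowData.append(str2float(data[6].string))   # Mean temperature
          rowData.append(str2float(data[7].string))   # Maximum temperature
          rowData.append(str2float(data[8].string))   # Minimum temperature
          rowData.append(str2float(data[9].string))   # Mean humidity
          rowData.append(str2float(data[10].string))  # Minimum humidity
          rowData.append(str2float(data[16].string))  # Sunshine duration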

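          # I wanted the weather summary (天気概況) too, but that requires changing the earlier code, so it is not fetched for now.
          # It is text, so it would be appended without str2float, and the header row would need two more columns.
          # rowData.append(data[19].string)  # Weather summary, daytime
          # rowData.append(data[20].string)  # Weather summary, night

          # Append this row to the overall list
          All_list.append(rowData)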

    # Write one CSV file per city (the file name is the city name)
    with open(place + '.csv', 'w', encoding="utf_8_sig") as file:  # Prevents garbled characters
      writer = csv.writer(file, lineterminator='\n')
      writer.writerows(All_list)
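The indices used above (3, 6 to 10, 16) are the positions of precipitation, the temperature and humidity columns, and sunshine duration within each row's `<td>` list. The sketch below names those positions in a hypothetical `COLUMNS` mapping and swaps `str2float` for a hypothetical `parse_value`. The motivation: `str2float` turns every non-numeric cell into 0.0, so a missing temperature becomes indistinguishable from 0 °C. As I understand the JMA's notation (check the legend on the site before relying on it), a value may carry a trailing `)` or `]` quality mark, `--` means the phenomenon did not occur (for example, no rain), and `///` or `×` mark missing data. Missing values become `None`, which `csv.writer` writes as an empty cell.

```python
# Positions within each row's <td> list, taken from the indices in the code above.
COLUMNS = {
    "降水量": 3,      # precipitation total
    "気温_平均": 6,   # mean temperature
    "気温_最高": 7,   # maximum temperature
    "気温_最低": 8,   # minimum temperature
    "湿度_平均": 9,   # mean humidity
    "湿度_最小": 10,  # minimum humidity
    "日照時間": 16,   # sunshine duration
}


def parse_value(text):
    """Convert a cell string to float, keeping values that carry a quality mark.

    Hypothetical replacement for str2float; returns None for missing data.
    """
    if text is None:
        return None
    text = text.strip()
    if text == "--":  # phenomenon did not occur (e.g. no rain): treat as 0
        return 0.0
    text = text.rstrip(")]").strip()  # drop trailing quality marks such as ")" or "]"
    try:
        return float(text)
    except ValueError:  # "///", "×", empty cells, etc.: treat as missing
        return None


def build_row(year, month, cells):
    """Build one CSV row; cells are the strings of one day's <td> elements (index 0 = day)."""
    row = [f"{year}/{month}/{cells[0]}"]
    row.extend(parse_value(cells[i]) for i in COLUMNS.values())
    return row
```

Inside the row loop, this could be called as `All_list.append(build_row(year, month, [td.string for td in data]))`, and `['年月日'] + list(COLUMNS)` reproduces the original header row.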

4. Notes

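4.1 Preventing garbled characters

Reference: PythonでUTF-8 with BOMのCSVファイルを出力する (Writing a UTF-8 CSV file with a BOM in Python)
Opening the file with encoding="utf_8_sig" writes it as UTF-8 with a BOM, which keeps the Japanese text from being garbled in tools that would otherwise guess a different encoding (Excel, for example).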

## Before (text becomes garbled)
with open(place + '.csv', 'w') as file:

## After (garbling fixed)
with open(place + '.csv', 'w', encoding="utf_8_sig") as file:
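To confirm what `utf_8_sig` changes, a quick check is to write a small CSV to a temporary directory and look at its first bytes: the codec prepends the UTF-8 byte order mark `EF BB BF`. The file name and sample rows are placeholders. `newline=""` follows the `csv` module documentation's recommendation for files passed to `csv.writer`; the article's code sets `lineterminator='\n'` instead, which also works.

```python
import csv
import tempfile
from pathlib import Path

rows = [["年月日", "降水量"], ["2019/1/1", 0.0]]  # placeholder sample rows

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.csv"
    # utf_8_sig writes the BOM before the first character of the file.
    with open(path, "w", encoding="utf_8_sig", newline="") as f:
        csv.writer(f).writerows(rows)
    raw = path.read_bytes()

print(raw[:3])  # b'\xef\xbb\xbf' (the UTF-8 BOM)
# Decoding with utf_8_sig strips the BOM again.
print(raw.decode("utf_8_sig").splitlines()[0])  # 年月日,降水量
```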

4.2 To fetch data for every prefecture, change the lists as follows

place_codeA = [14, 31, 32, 33, 35, 34, 36, 54, 56, 55, 48, 41, 57, 42, 43, 40, 52, 51, 49, 45, 53, 50, 46, 68, 69, 61, 60, 67, 66, 63, 65, 64, 73, 72, 74, 71, 81, 82, 85, 83, 84, 86, 88, 87, 91, 62, 44]
place_codeB = [47412, 47575, 47582, 47584, 47588, 47590, 47595, 47604, 47605, 47607, 47610, 47615, 47616, 47624, 47626, 47629, 47632, 47636, 47638, 47648, 47651, 47656, 47670, 47741, 47746, 47759, 47761, 47765, 47768, 47770, 47777, 47780, 47887, 47891, 47893, 47895, 47762, 47807, 47813, 47815, 47817, 47819, 47827, 47830, 47936, 47772, 47662]
place_name = ["札幌","青森", "秋田", "盛岡", "山形", "仙台", "福島", "新潟", "金沢", "富山", "長野", "宇都宮", "福井", "前橋", "熊谷", "水戸", "岐阜", "名古屋", "甲府", "銚子", "津", "静岡", "横浜", "松江", "鳥取", "京都", "彦根", "広島", "岡山", "神戸", "和歌山", "奈良", "松山", "高松", "高知", "徳島", "下関", "福岡", "佐賀", "大分", "長崎", "熊本", "鹿児島", "宮崎", "那覇", "大阪", "東京"]
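The three lists are matched by position only, so one missing or extra entry silently shifts every later city: the name at position i gets paired with codes that belong to a different city, and the CSV is written under the wrong name. A minimal safeguard, sketched with a hypothetical `build_stations` helper, is to check the lengths before pairing the lists.

```python
def build_stations(codes_a, codes_b, names):
    """Pair the three parallel lists; fail loudly if their lengths differ (hypothetical helper)."""
    if not (len(codes_a) == len(codes_b) == len(names)):
        raise ValueError(
            f"length mismatch: place_codeA={len(codes_a)}, "
            f"place_codeB={len(codes_b)}, place_name={len(names)}"
        )
    # Each tuple is (city name, prefecture code, station code).
    return list(zip(names, codes_a, codes_b))


stations = build_stations([44, 62], [47662, 47772], ["東京", "大阪"])
print(stations)  # [('東京', 44, 47662), ('大阪', 62, 47772)]
```

The main loop could then read `for place, code_a, code_b in build_stations(place_codeA, place_codeB, place_name):`, which removes the need for index lookups entirely.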

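5. Aside

I had been trying to pull this data from the web with Power BI, but I couldn't find a master list of prefecture codes (of course, I could simply have looked each one up) and stalled partway. The article I referenced listed them, so I made use of it. I plan to build the same thing in Power BI separately as well.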
