
Downloading Wind Data from the Japan Meteorological Agency

Posted at 2017-06-26

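This post downloads past weather data from the Japan Meteorological Agency (JMA).

This time the target is wind data. The HTML is parsed with Python's BeautifulSoup.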

Script

First, import the modules.

Setup
import requests
import bs4
import pandas as pd
import datetime
import time
import numpy as np
from numpy import nan as NaN  # NumPy 2.0 removed the NaN alias; import nan under the old name

Next is a conversion helper. If a string is numeric, it is converted to float. The symbol /// means that no observation data is available.

Helper
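def convert(item_str):
    """Convert one table cell to float, NaN, or leave it as text."""
    if not item_str:
        return ''  # empty cell
    if item_str == '///':
        return np.nan  # '///' marks "no observation data"
    if item_str.replace('.','').replace('-','').isdigit():
        try:
            return float(item_str)
        except ValueError:
            pass  # e.g. '1-2' passes the digit check but is not a number
    return item_str  # anything else (e.g. wind direction text) stays a string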

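Now the main script. The URL must contain the prefecture number (prec_no) and the station number (block_no). To find them, start from
http://www.data.jma.go.jp/obd/stats/etrn/
and select the location step by step. Then look at the final URL and copy the values from it.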
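As an illustration of how those two numbers end up in the request, here is a minimal sketch that assembles the query string with the standard library. The names `BASE_URL` and `build_url` are hypothetical and not part of the original script; the values 44 and 47662 are the ones used in the script in this post.

```python
from urllib.parse import urlencode

# Endpoint taken from the script in this post (10-minute data page).
BASE_URL = 'http://www.data.jma.go.jp/obd/stats/etrn/view/10min_a1.php'

def build_url(prec_no, block_no, year, month, day):
    """Hypothetical helper: assemble the URL for one station and one day."""
    query = urlencode({
        'prec_no': prec_no,    # prefecture number
        'block_no': block_no,  # station number
        'year': year,
        'month': month,
        'day': day,
        'view': '',            # left empty, as in the original URL
    })
    return BASE_URL + '?' + query

print(build_url(44, 47662, 2015, 1, 1))
```

Building the query from a dict avoids stray spaces or missing `&` separators that are easy to introduce with manual string concatenation.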

The date is specified in the URL. The time arrives in hh:mm format, so it is turned into a JST timestamp by adding a timedelta to that day's datetime.
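The reason for using a timedelta rather than building the time directly can be seen in a small sketch. The helper name `to_jst` is hypothetical; it only mirrors the datetime + timedelta step described above.

```python
import datetime

def to_jst(day, hhmm):
    """Hypothetical helper: add an 'hh:mm' offset to the start of a day."""
    hh, mm = hhmm.split(':')
    return day + datetime.timedelta(hours=int(hh), minutes=int(mm))

day = datetime.datetime(2015, 12, 31)
print(to_jst(day, '00:10'))  # 2015-12-31 00:10:00
print(to_jst(day, '24:00'))  # 2016-01-01 00:00:00
```

Passing hour 24 to `datetime.datetime(...)` directly would raise a ValueError, whereas the timedelta rolls "24:00" over to 00:00 of the next day.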

Main
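# Column names after 'JST' and 'time' (raw hh:mm string): precipitation, temperature,
# mean wind speed, mean wind direction, max gust speed, wind direction at max gust, sunshine duration.
columns = ('JST', 'time', '降水量', '気温', '平均風速', '平均風向', '最大瞬間風速', '最大瞬間風速時風向', '日照時間')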
all_df = []

for year in range(2015, 2017): # 2015 ... 2016
    for month in range(1, 13): # 1 ... 12
        for day in range(1, 32): # 1 ... 31

            try:
                this_day = datetime.datetime(year, month, day)
            except ValueError:
                continue # incorrect date; e.g., 2007/2/31 etc.
            print(this_day)

            url = 'http://www.data.jma.go.jp/obd/stats/etrn/view/10min_a1.php?prec_no=44&block_no=47662&year=' + str(year) + '&month=' + str(month) + '&day=' + str(day) + '&view='
            print(url)
            time.sleep(1) # wait for 1 sec
            res = requests.get(url)

            try:
                res.raise_for_status() # check for error
            except Exception as e:
                print('Error: {}'.format(e))
                continue # go to next if error

            res.encoding = 'utf-8'
            soup = bs4.BeautifulSoup(res.text, "lxml")
            tbl = soup.select("#tablefix1 td") # find the table
            n_rows = len(tbl) // 8


            for r in range(n_rows):
                i = 8 * r

                # JST
                hh, mm = tbl[i + 0].getText().split(":") # '00:10' --> '00', '10'
                row_timedelta = datetime.timedelta(hours=int(hh), minutes=int(mm))

                row_time = this_day + row_timedelta # for converting "24:00" to "00:00" of the next day
                row = [row_time]

                # other data
                row.extend([convert(tbl[i + j].getText()) for j in range(8)])


                row_df = pd.DataFrame(columns=columns)
                row_df.loc[0] = row
                all_df.append(row_df)


df = pd.concat(all_df, ignore_index=True)
df.to_excel('wind_data.xlsx')  # writing .xlsx needs an Excel engine such as openpyxl installed
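The main loop relies on the selected td cells forming one flat list in which every 8 consecutive cells make up one table row. A minimal sketch of that grouping, using plain lists instead of parsed HTML, might look like this. The name `chunk_rows` is hypothetical, and the sample cell values are made-up placeholders, not real observations.

```python
def chunk_rows(cells, width=8):
    """Hypothetical helper: split a flat list of cells into complete rows."""
    n_rows = len(cells) // width  # trailing cells that do not fill a row are dropped
    return [cells[width * r: width * (r + 1)] for r in range(n_rows)]

# Made-up placeholder values, only to show the shape of the data.
cells = ['00:10', '0.0', '5.1', '2.3', '北西', '4.0', '北西', '///',
         '00:20', '0.0', '5.0', '2.1', '北西', '3.8', '北', '///']

for row in chunk_rows(cells):
    print(row)
```

If the page layout ever changes the number of columns, the width of 8 assumed here (and in the main script) would need to be adjusted.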