Qiita全国学生対抗戦 Advent Calendar 2023

grib2フォーマットの気象庁MSM-GPVをcsvファイルに変換する

Last updated at 2024-12-10Posted at 2023-12-04

はじめに

今回はニューラルネットワークで行う波浪予測の下準備として，grib2データをpythonから読み込んで，計算に利用する各グリッド点における風速値をまとめたcsvファイルを作成する．
データの読み込みにpygribライブラリを用いるので，先にインストールしておく．こちらを参考にした．

気象庁MSM-GPVのダウンロード

ニューラルネットワークによる波浪予測には海上風のgridデータが必要になるため，気象庁MSMデータをローカルマシンにダウンロードしてから読み込む．また，ファイル形式はgrib2形式のバイナリデータとして気象庁から提供されている．

京都大学生存圏研究所（RISH）が提供する生存圏データベースからデータをダウンロードしていく．
データのダウンロードは以下のスクリプトを用いて行った．

grib2_shirahama.py

"""
Title       : "grib2_shirahama.py"
Author      : pokimaza, DPRI
Date        : 2023-10-6
Revised     : 2023-12-4
Version     : 1.7
input       : 
output      : grib2 binary data
"""

# Importing Libraries
import os
import subprocess
import datetime
import time

# subprocess library
# It provides a way of spawn processes, connect to their input/output channels, and obtain their return codes.

# pygrib library
# Make it easy to interact with buoyancy-driven atmospheric data

print('\n')
print('**********************************************************************************')
print('Hello, this script downloads MSM-GPV from RISH-database.')
print('**********************************************************************************')
print('\n')

#*******************************************************************************************************************
# USER SETTINGS ****************************************************************************************************
#*******************************************************************************************************************

# path to download
data_dir = '/home/pokimaza/onedrive-mount/grib2data'    

    #
   ###
  #####
 #######
   ###  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   ###  please make hard cooding for your own environment...  ~
    #   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# start date and end date
start_date = datetime.date(2023, 4, 1)
end_date = datetime.date(2023, 4, 10)

#*******************************************************************************************************************
# END OF USER SETTINGS *********************************************************************************************
#*******************************************************************************************************************

# change directory
os.chdir(data_dir)

def scraping(date, hour):
    cwd = os.getcwd()
    dt = date

    # convert year to string
    yr = str(dt.year)

    # convert month to string and store it into variable mh
    mh = str(dt.month)

    # convert day to string and reduce a day from the current date and store it into variable dy
    dy = str(dt.day) 

    # check if month is less than 10 then add leading zero
    if dt.month < 10:
        mh = (f"0{mh}")

    # check if day is less than 10 then add leading zero 
    if dt.day < 10:
        dy = (f"0{dy}")

    # Format hour with leading zero
    hour_str = str(hour).zfill(2)

    # Specify url of wave data (database rish-Kyoto university)
    filename = f'Z__C_RJTD_{yr}{mh}{dy}{hour_str}0000_MSM_GPV_Rjp_Lsurf_FH00-15_grib2.bin'
    surf = f'http://database.rish.kyoto-u.ac.jp/arch/jmadata/data/gpv/original/{yr}/{mh}/{dy}/{filename}'

    # cwd: pass in the current working directory
    # stdout: don't display any output in the terminal
    # stderr: no error messages will be displayed in the terminal
    subprocess.run(['curl', '-O', surf], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, cwd=cwd)

    # Get the basename from the URL
    basename = os.path.basename(surf) 
    # print('Saved as :\n', os.path.basename(surf))

    # Get the name of the file from the URL
    file_surf = os.path.join(cwd, os.path.basename(surf)) 
    print('Location to save :\n', file_surf)

date = start_date

# check the run-time
sttime = time.time()

while date != end_date + datetime.timedelta(1):
    for hour in range(0, 22, 3):
        scraping(date, hour)
        print('time :', time.time() - sttime, 's')
    date += datetime.timedelta(1)

entime = time.time()
totaltime = entime - sttime

print('\n')
print('**********************************************************************************')

print('\nThis is the end of "grib2_shirahama.py". \nTotal time (s):', totaltime)
print("Congratulations! You've successfully run the script and completed the task!")

print('**********************************************************************************')
print('\n')

以上のコードを実行すれば，grib2のバイナリデータをローカルマシンにダウンロードすることができるはずである．

grib2データをcsvに変換する

gridデータ地点の情報を与えるcsvの例は以下の通りである．作成時は，自分でgridのどの番号を指定したらどの緯度経度になるかを確認しながら作成した．ローカルの適当なpathにテキストエディタで貼り付けなどして読み込めるようにしておく．

lons_test.csv

208,224,240,256,272
208,224,240,256,272
208,224,240,256,272
208,224,240,256,272
208,224,240,256,272

lats_test.csv

272,272,272,272,272
292,292,292,292,292
312,312,312,312,312
332,332,332,332,332
352,352,352,352,352

以上のcsvを入れたpathとgrib2バイナリを入れたpathを以下のコードに書いて回すとcsvが完成するはずです．

makecsvfromMSM.py

"""
Title       : "makecsvfromMSM.py"
Author      : pokimaza, DPRI
Date        : 2023-11-27
Revised     : 2023-12-1
Version     : 2.1
input       : MSM-griddata
output      : csv
Description : This script makes a csv file for wave prediction constructed by LSTM. 
              Before you run this script, you have to download MSM-data from RISH-website
              (http://database.rish.kyoto-u.ac.jp/arch/jmadata/gpv-original.html).
"""

import os
import datetime
import csv
import numpy as np
import datetime
import time
import pygrib
from datetime import date, timedelta

print('\n')
print('**********************************************************************************')
print('Hello, this script makes a csv file from MSM-grid data (grib2 format).')
print('**********************************************************************************')

print('\n')
print('**********************************************************************************')
print('Loading user settings...')
print('**********************************************************************************')
print('\n')

#*******************************************************************************************************************
#*******************************************************************************************************************
# USER SETTINGS (only change in this section!)**********************************************************************
#*******************************************************************************************************************
#*******************************************************************************************************************

##########################################################################################
# basedir...path to input csv files (feature, lons, lats)                                #
basedir = '/home/pokimaza/onedrive-mount/lonlatdata'                                     #
# gribdir...path to input grib2 data                                                     #
gribdir = '/home/pokimaza/onedrive-mount/grib2data'                                      #
# otptdir...output path                                                                  #
otptdir = '/home/pokimaza/onedrive-mount/csv_fromMSMgrib'                                #
##########################################################################################

    #
   ###
  #####
 #######
   ###
   ###
   ###  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   ###  please make hard cooding for your own environment...  ~
    #   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

os.chdir(basedir)

##########################################################################################
outputname = 'dataframe_201801201805.csv'                                                #
##########################################################################################


##########################################################################################
# start                                                                                  #
dst = date(2023,4,1)                                                                     #
# end                                                                                    #
den = date(2023,4,10)                                                                     #
##########################################################################################


##########################################################################################
lon_filename = "lons_test.csv"                                                           #
lat_filename = "lats_test.csv"                                                           #
##########################################################################################

#*******************************************************************************************************************
#*******************************************************************************************************************
# END OF USER SETTINGS *********************************************************************************************
#*******************************************************************************************************************
#*******************************************************************************************************************

print('**********************************************************************************')
print('Loading lons and lats files...')
print('**********************************************************************************')
# "lons"
try:
    with open(lon_filename, "r") as lons_file:
        csv_reader = csv.reader(lons_file)
        lons = list(csv_reader)
        if not lons:
            raise ValueError("Oops! It looks like the 'lons' CSV file is empty. Please check and try again.")
        lons = [[int(i) for i in row] for row in lons]
        print(lons)
except FileNotFoundError:
    raise FileNotFoundError(f"Uh-oh! The 'lons' CSV file '{lon_filename}' seems to be missing. Please make sure it exists.")
except Exception as e:
    print(f"Oh no! An unexpected error occurred while loading 'lons': {e}")

# "lats"
try:
    with open(lat_filename, "r") as lats_file:
        csv_reader = csv.reader(lats_file)
        lats = list(csv_reader)
        if not lats:
            raise ValueError("Oops! It looks like the 'lats' CSV file is empty. Please check and try again.")
        lats = [[int(i) for i in row] for row in lats]
        print(lats)
except FileNotFoundError:
    raise FileNotFoundError(f"Uh-oh! The 'lats' CSV file '{lat_filename}' seems to be missing. Please make sure it exists.")
except Exception as e:
    print(f"Oh no! An unexpected error occurred while loading 'lats': {e}")


def datetime_file(dt, hr):

    # convert year to string
    yr = str(dt.year)

    # convert month to string and store it into variable mh
    mh = str(dt.month)

    # convert day to string and reduce a day from the current date and store it into variable dy
    dy = str(dt.day) 

    # check if month is less than 10 then add leading zero
    if dt.month < 10:
        mh = (f"0{mh}")

    # check if day is less than 10(-1) then add leading zero 
    if dt.day < 10:
        dy = (f"0{dy}")
    
    if hr < 10:
        hr = (f"0{hr}")
    else:
        hr = str(hr)

    global basename
    basename = 'Z__C_RJTD_'+yr+mh+dy+hr+'0000_MSM_GPV_Rjp_Lsurf_FH00-15_grib2.bin'


def date_range(start, stop, step = timedelta(1)):
    current = start
    while current < stop:
        yield current
        current += step

def hour_range(start, stop, step = 3):
    current = start
    while current < stop:
        yield current
        current += step


os.chdir(gribdir)

st = time.time()
idx_cal = 0
total_iterations = len(list(date_range(dst, den + datetime.timedelta(days=1)))) * len(list(hour_range(0, 21 + 1)))
print('\n')
print('**********************************************************************************')
print('Starting to make csv...')
print('**********************************************************************************')

for date in date_range(dst, den + datetime.timedelta(days=1)):
    for hour in hour_range(0, 21 + 1):
        datetime_file(date, hour)

        # Retry loop in case of grib2 file loading failure
        max_retries = 3
        for retry in range(max_retries):
            try:
                grbs = pygrib.open(basename)

                U_data = [[] for _ in range(25)]
                V_data = [[] for _ in range(25)]

                for i in range(5):
                    for j in range(5):
                        lon_point = lons
                        lat_point = lats

                        uwin_ini = grbs.select(forecastTime=0)[2].values[lat_point[i][j]][lon_point[i][j]]
                        uwin_1hr = grbs.select(forecastTime=1)[2].values[lat_point[i][j]][lon_point[i][j]]
                        uwin_2hr = grbs.select(forecastTime=2)[2].values[lat_point[i][j]][lon_point[i][j]]
                        vwin_ini = grbs.select(forecastTime=0)[3].values[lat_point[i][j]][lon_point[i][j]]
                        vwin_1hr = grbs.select(forecastTime=1)[3].values[lat_point[i][j]][lon_point[i][j]]
                        vwin_2hr = grbs.select(forecastTime=2)[3].values[lat_point[i][j]][lon_point[i][j]]

                        U_data_tmp = [uwin_ini, uwin_1hr, uwin_2hr]
                        V_data_tmp = [vwin_ini, vwin_1hr, vwin_2hr]

                        idx = i*5 + j
                        U_data[idx].extend(U_data_tmp)
                        V_data[idx].extend(V_data_tmp)

                data_array_tmp = np.vstack((U_data, V_data))

                if idx_cal == 0:
                    data_array = data_array_tmp
                else:
                    data_array = np.hstack((data_array, data_array_tmp))

                idx_cal = idx_cal + 1

                print('Date : ', date)
                print('Hour : ', hour)
                print('Time : ', time.time() - st, '\n')

                progress_percentage = (idx_cal / total_iterations) * 100
                progress_bar_length = int(progress_percentage / 2)
                progress_bar = '=' * progress_bar_length + '-' * (50 - progress_bar_length)
                print(f'\rProgress: [{progress_bar}] {progress_percentage:.2f}%', end='', flush=True)
                print('\n')

                # Break the retry loop if successful
                break

            except Exception as e:
                print(f'Error: {e}')
                if retry < max_retries - 1:
                    print(f'Retrying in 10 seconds... (Retry {retry + 1}/{max_retries})')
                    time.sleep(10)
                else:
                    print('Max retries reached. Stopping...')
                    raise
print('\n')
print('**********************************************************************************')
print('**********************************************************************************')
print('**********************************************************************************')
print('End of making dataarray... ')
print('\n')
print('Saving the csv file... ')
data_array = np.transpose(data_array)
os.chdir(otptdir)
np.savetxt(outputname, data_array, delimiter=',', fmt='%.5f')
print('\nThis is the end of "makecsvfromMSM.py". \nTotal time (s):', time.time() - st)
print("Congratulations! You've successfully run the script and completed the task!")
print('**********************************************************************************')
print('**********************************************************************************')
print('**********************************************************************************')
print('\n')

grib2バイナリを都度開くのは重いので，一度作成しておくと後の作業が楽になると思います．

241210，追記：今回の実装ではすべて対象時刻最近の解析値をつなぎ合わせたデータを利用していますが，コードの軽微な変更でJST9:00 or 18:00発表の予測値のみによるデータセットを作成することも可能です．

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up