Introduction
This time, as preparation for wave prediction with a neural network, we read grib2 data from Python and build a CSV file that collects the wind speed values at each grid point used in the calculation.
We use the pygrib library to read the data, so install it beforehand. I referred to this page for the installation.
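pygrib is available from both PyPI and conda-forge. As a minimal check that the install worked (the install commands are shown as comments; pick whichever matches your environment):

# Typical ways to install pygrib (run one of these in a shell first):
#   pip install pygrib
#   conda install -c conda-forge pygrib
# Minimal import check:
import pygrib
print('pygrib imported successfully')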
Downloading the JMA MSM-GPV data
Wave prediction with a neural network requires gridded sea-surface wind data, so we download the JMA MSM data to the local machine and then read it. The JMA provides these files as binary data in grib2 format.
We download the data from the Sustainable Humanosphere database provided by the Research Institute for Sustainable Humanosphere (RISH), Kyoto University.
The download was done with the following script.
"""
Title : "grib2_shirahama.py"
Author : pokimaza, DPRI
Date : 2023-10-6
Revised : 2023-12-4
Version : 1.7
input :
output : grib2 binary data
"""
# Importing Libraries
import os
import subprocess
import datetime
import time
# subprocess library
# Provides a way of spawning processes, connecting to their input/output channels, and obtaining their return codes.
# pygrib library (used in the conversion script later, not in this one)
# Provides an interface for reading grib2-format meteorological data.
print('\n')
print('**********************************************************************************')
print('Hello, this script downloads MSM-GPV from RISH-database.')
print('**********************************************************************************')
print('\n')
#*******************************************************************************************************************
# USER SETTINGS ****************************************************************************************************
#*******************************************************************************************************************
# path to download
data_dir = '/home/pokimaza/onedrive-mount/grib2data'
#
###
#####
#######
### ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
### please hard-code the paths for your own environment... ~
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# start date and end date
start_date = datetime.date(2023, 4, 1)
end_date = datetime.date(2023, 4, 10)
#*******************************************************************************************************************
# END OF USER SETTINGS *********************************************************************************************
#*******************************************************************************************************************
# change directory
os.chdir(data_dir)
def scraping(date, hour):
    cwd = os.getcwd()
    dt = date
    # convert year to string
    yr = str(dt.year)
    # convert month to string and store it in variable mh
    mh = str(dt.month)
    # convert day to string and store it in variable dy
    dy = str(dt.day)
    # if month is less than 10, add a leading zero
    if dt.month < 10:
        mh = f"0{mh}"
    # if day is less than 10, add a leading zero
    if dt.day < 10:
        dy = f"0{dy}"
    # format hour with a leading zero
    hour_str = str(hour).zfill(2)
    # Build the URL of the MSM data (RISH database, Kyoto University)
    filename = f'Z__C_RJTD_{yr}{mh}{dy}{hour_str}0000_MSM_GPV_Rjp_Lsurf_FH00-15_grib2.bin'
    surf = f'http://database.rish.kyoto-u.ac.jp/arch/jmadata/data/gpv/original/{yr}/{mh}/{dy}/{filename}'
    # cwd: run curl in the current working directory
    # stdout/stderr: suppress curl's output and error messages in the terminal
    subprocess.run(['curl', '-O', surf], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, cwd=cwd)
    # Get the basename from the URL
    basename = os.path.basename(surf)
    # print('Saved as :\n', basename)
    # Build the full path of the saved file
    file_surf = os.path.join(cwd, basename)
    print('Location to save :\n', file_surf)
date = start_date
# check the run-time
sttime = time.time()
while date != end_date + datetime.timedelta(1):
    for hour in range(0, 22, 3):
        scraping(date, hour)
        print('time :', time.time() - sttime, 's')
    date += datetime.timedelta(1)
entime = time.time()
totaltime = entime - sttime
print('\n')
print('**********************************************************************************')
print('\nThis is the end of "grib2_shirahama.py". \nTotal time (s):', totaltime)
print("Congratulations! You've successfully run the script and completed the task!")
print('**********************************************************************************')
print('\n')
Running the code above should download the grib2 binary data to your local machine.
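Before converting anything, it can help to open one of the downloaded files with pygrib and list its messages, since the conversion script below picks the u- and v-wind components by their position in grbs.select(). A minimal sketch, assuming the file for the 2023-04-01 00 UTC initial time sits in the current directory:

import pygrib

# Hypothetical example filename; substitute one of the files you actually downloaded.
grbs = pygrib.open('Z__C_RJTD_20230401000000_MSM_GPV_Rjp_Lsurf_FH00-15_grib2.bin')
# Print each message of the analysis (forecastTime=0) with its list position,
# so you can see which indices hold the 10 m u- and v-wind components.
for idx, grb in enumerate(grbs.select(forecastTime=0)):
    print(idx, grb)
grbs.close()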
Converting the grib2 data to CSV
An example of the CSV files that specify the grid points is shown below (the first five rows are the contents of the lons file, the next five the lats file). When I made them, I checked by hand which grid index corresponds to which latitude and longitude; a sketch of how to look this up programmatically follows the sample. Paste the contents into files at a convenient local path with a text editor so they can be read in.
208,224,240,256,272
208,224,240,256,272
208,224,240,256,272
208,224,240,256,272
208,224,240,256,272
272,272,272,272,272
292,292,292,292,292
312,312,312,312,312
332,332,332,332,332
352,352,352,352,352
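The indices above were read off by hand, but as a rough sketch (my own illustration, not part of the original workflow) you could also look up the nearest grid indices for a target latitude/longitude with pygrib's latlons(). The file name and the target point below are placeholders:

import numpy as np
import pygrib

grbs = pygrib.open('Z__C_RJTD_20230401000000_MSM_GPV_Rjp_Lsurf_FH00-15_grib2.bin')
grb = grbs.select(forecastTime=0)[0]      # any message will do; they share the same grid
lats, lons = grb.latlons()                # 2D arrays of grid-point latitudes and longitudes
target_lat, target_lon = 33.7, 135.3      # placeholder target point
dist2 = (lats - target_lat) ** 2 + (lons - target_lon) ** 2
i, j = np.unravel_index(np.argmin(dist2), lats.shape)
print(i, j)   # i is the row index (goes into the lats csv), j the column index (lons csv)
grbs.close()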
Write the path that holds these CSV files and the path that holds the grib2 binaries into the code below and run it; the output CSV should then be produced.
"""
Title : "makecsvfromMSM.py"
Author : pokimaza, DPRI
Date : 2023-11-27
Revised : 2023-12-1
Version : 2.1
input : MSM-griddata
output : csv
Description : This script makes a csv file for wave prediction with an LSTM.
Before you run this script, you have to download MSM-data from RISH-website
(http://database.rish.kyoto-u.ac.jp/arch/jmadata/gpv-original.html).
"""
import os
import datetime
import csv
import numpy as np
import time
import pygrib
from datetime import date, timedelta
print('\n')
print('**********************************************************************************')
print('Hello, this script makes a csv file from MSM-grid data (grib2 format).')
print('**********************************************************************************')
print('\n')
print('**********************************************************************************')
print('Loading user settings...')
print('**********************************************************************************')
print('\n')
#*******************************************************************************************************************
#*******************************************************************************************************************
# USER SETTINGS (only change in this section!)**********************************************************************
#*******************************************************************************************************************
#*******************************************************************************************************************
##########################################################################################
# basedir...path to input csv files (feature, lons, lats) #
basedir = '/home/pokimaza/onedrive-mount/lonlatdata' #
# gribdir...path to input grib2 data #
gribdir = '/home/pokimaza/onedrive-mount/grib2data' #
# otptdir...output path #
otptdir = '/home/pokimaza/onedrive-mount/csv_fromMSMgrib' #
##########################################################################################
#
###
#####
#######
###
###
### ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
### please hard-code the paths for your own environment... ~
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
os.chdir(basedir)
##########################################################################################
outputname = 'dataframe_201801201805.csv' #
##########################################################################################
##########################################################################################
# start #
dst = date(2023,4,1) #
# end #
den = date(2023,4,10) #
##########################################################################################
##########################################################################################
lon_filename = "lons_test.csv" #
lat_filename = "lats_test.csv" #
##########################################################################################
#*******************************************************************************************************************
#*******************************************************************************************************************
# END OF USER SETTINGS *********************************************************************************************
#*******************************************************************************************************************
#*******************************************************************************************************************
print('**********************************************************************************')
print('Loading lons and lats files...')
print('**********************************************************************************')
# "lons"
try:
    with open(lon_filename, "r") as lons_file:
        csv_reader = csv.reader(lons_file)
        lons = list(csv_reader)
        if not lons:
            raise ValueError("Oops! It looks like the 'lons' CSV file is empty. Please check and try again.")
        lons = [[int(i) for i in row] for row in lons]
        print(lons)
except FileNotFoundError:
    raise FileNotFoundError(f"Uh-oh! The 'lons' CSV file '{lon_filename}' seems to be missing. Please make sure it exists.")
except Exception as e:
    print(f"Oh no! An unexpected error occurred while loading 'lons': {e}")
# "lats"
try:
    with open(lat_filename, "r") as lats_file:
        csv_reader = csv.reader(lats_file)
        lats = list(csv_reader)
        if not lats:
            raise ValueError("Oops! It looks like the 'lats' CSV file is empty. Please check and try again.")
        lats = [[int(i) for i in row] for row in lats]
        print(lats)
except FileNotFoundError:
    raise FileNotFoundError(f"Uh-oh! The 'lats' CSV file '{lat_filename}' seems to be missing. Please make sure it exists.")
except Exception as e:
    print(f"Oh no! An unexpected error occurred while loading 'lats': {e}")
def datetime_file(dt, hr):
    # convert year to string
    yr = str(dt.year)
    # convert month to string and store it in variable mh
    mh = str(dt.month)
    # convert day to string and store it in variable dy
    dy = str(dt.day)
    # if month is less than 10, add a leading zero
    if dt.month < 10:
        mh = f"0{mh}"
    # if day is less than 10, add a leading zero
    if dt.day < 10:
        dy = f"0{dy}"
    # if hour is less than 10, add a leading zero
    if hr < 10:
        hr = f"0{hr}"
    else:
        hr = str(hr)
    global basename
    basename = 'Z__C_RJTD_' + yr + mh + dy + hr + '0000_MSM_GPV_Rjp_Lsurf_FH00-15_grib2.bin'
def date_range(start, stop, step=timedelta(1)):
    current = start
    while current < stop:
        yield current
        current += step

def hour_range(start, stop, step=3):
    current = start
    while current < stop:
        yield current
        current += step
os.chdir(gribdir)
st = time.time()
idx_cal = 0
total_iterations = len(list(date_range(dst, den + datetime.timedelta(days=1)))) * len(list(hour_range(0, 21 + 1)))
print('\n')
print('**********************************************************************************')
print('Starting to make csv...')
print('**********************************************************************************')
for date in date_range(dst, den + datetime.timedelta(days=1)):
    for hour in hour_range(0, 21 + 1):
        datetime_file(date, hour)
        # Retry loop in case of grib2 file loading failure
        max_retries = 3
        for retry in range(max_retries):
            try:
                grbs = pygrib.open(basename)
                U_data = [[] for _ in range(25)]
                V_data = [[] for _ in range(25)]
                for i in range(5):
                    for j in range(5):
                        lon_point = lons
                        lat_point = lats
                        uwin_ini = grbs.select(forecastTime=0)[2].values[lat_point[i][j]][lon_point[i][j]]
                        uwin_1hr = grbs.select(forecastTime=1)[2].values[lat_point[i][j]][lon_point[i][j]]
                        uwin_2hr = grbs.select(forecastTime=2)[2].values[lat_point[i][j]][lon_point[i][j]]
                        vwin_ini = grbs.select(forecastTime=0)[3].values[lat_point[i][j]][lon_point[i][j]]
                        vwin_1hr = grbs.select(forecastTime=1)[3].values[lat_point[i][j]][lon_point[i][j]]
                        vwin_2hr = grbs.select(forecastTime=2)[3].values[lat_point[i][j]][lon_point[i][j]]
                        U_data_tmp = [uwin_ini, uwin_1hr, uwin_2hr]
                        V_data_tmp = [vwin_ini, vwin_1hr, vwin_2hr]
                        idx = i*5 + j
                        U_data[idx].extend(U_data_tmp)
                        V_data[idx].extend(V_data_tmp)
                data_array_tmp = np.vstack((U_data, V_data))
                if idx_cal == 0:
                    data_array = data_array_tmp
                else:
                    data_array = np.hstack((data_array, data_array_tmp))
                idx_cal = idx_cal + 1
                print('Date : ', date)
                print('Hour : ', hour)
                print('Time : ', time.time() - st, '\n')
                progress_percentage = (idx_cal / total_iterations) * 100
                progress_bar_length = int(progress_percentage / 2)
                progress_bar = '=' * progress_bar_length + '-' * (50 - progress_bar_length)
                print(f'\rProgress: [{progress_bar}] {progress_percentage:.2f}%', end='', flush=True)
                print('\n')
                # Break the retry loop if successful
                break
            except Exception as e:
                print(f'Error: {e}')
                if retry < max_retries - 1:
                    print(f'Retrying in 10 seconds... (Retry {retry + 1}/{max_retries})')
                    time.sleep(10)
                else:
                    print('Max retries reached. Stopping...')
                    raise
print('\n')
print('**********************************************************************************')
print('**********************************************************************************')
print('**********************************************************************************')
print('End of making dataarray... ')
print('\n')
print('Saving the csv file... ')
data_array = np.transpose(data_array)
os.chdir(otptdir)
np.savetxt(outputname, data_array, delimiter=',', fmt='%.5f')
print('\nThis is the end of "makecsvfromMSM.py". \nTotal time (s):', time.time() - st)
print("Congratulations! You've successfully run the script and completed the task!")
print('**********************************************************************************')
print('**********************************************************************************')
print('**********************************************************************************')
print('\n')
Opening the grib2 binaries every time is slow, so creating this CSV once should make the later work easier.
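For reference, reloading the generated file later is straightforward; a minimal sketch, using the output name from the script above (from the code, each row is one hourly record and the 50 columns are the 25 grid-point U values followed by the 25 V values):

import numpy as np

data = np.loadtxt('dataframe_201801201805.csv', delimiter=',')
print(data.shape)      # (number of hourly records, 50)
u = data[:, :25]       # U wind components at the 25 grid points
v = data[:, 25:]       # V wind components at the 25 grid points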
Addendum (241210): this implementation uses data stitched together from the analysis values nearest to each target time, but with minor changes to the code you can also build a dataset from only the forecast values issued at 9:00 or 18:00 JST (a rough sketch of one possible variant follows).
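As a rough illustration of that direction (my own sketch, not the exact modification meant above), one could pull a longer forecast sequence from a single issuance instead of stitching FT0-2 from every 3-hourly file; the 9:00 and 18:00 JST issuances correspond to the 00 UTC and 09 UTC initial times (JST = UTC + 9). The sketch assumes the FH00-15 surface file holds hourly forecast steps 0-15.

import numpy as np
import pygrib

def forecast_series(path, lat_idx, lon_idx, max_ft=15):
    """Return U and V wind series for forecast hours 0..max_ft at one grid index."""
    grbs = pygrib.open(path)
    u, v = [], []
    for ft in range(max_ft + 1):
        msgs = grbs.select(forecastTime=ft)
        u.append(msgs[2].values[lat_idx][lon_idx])   # same message positions the main script uses
        v.append(msgs[3].values[lat_idx][lon_idx])
    grbs.close()
    return np.array(u), np.array(v)

# Hypothetical usage for the 00 UTC (9:00 JST) initial time:
# u, v = forecast_series('Z__C_RJTD_20230401000000_MSM_GPV_Rjp_Lsurf_FH00-15_grib2.bin', 272, 208)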