Qiita Engineer Festa20242024年7月17日まで開催中！

Pythonによる気象・気候データ解析I 3章章末問題

Last updated at 2024-07-14Posted at 2024-06-16

Pythonによる気象・気候データ解析I: Pythonの基礎・気候値と偏差・回帰相関分析

の章末問題を一つずつ解いていきます。

A

Google Colaboratolyを用いています。

import numpy as np
import matplotlib.pyplot as plt

tokyo_temp = np.genfromtxt('drive/MyDrive/Tokyo_temp.csv', delimiter=',', usecols=(0, 1, 2))
y = tokyo_temp[:, 0]
m = tokyo_temp[:, 1]
temp = tokyo_temp[:, 2]
tokyo_temp_1890_1919 = temp[(1890 <= y) * (y <=1919)]
tokyo_temp_1990_2019 = temp[(1990 <= y) * (y <=2019)]

plt.figure(figsize=(12, 4))
plt.plot(tokyo_temp_1890_1919, label = '1890_1919')
plt.plot(tokyo_temp_1990_2019, label = '1990_2019')
plt.legend(bbox_to_anchor=(1, 1))
plt.ylabel('Temperature')
plt.show()

考察
- 1890_1919よりも1990_2019の方が全体に2-3℃気温が上がっている。いわゆる「地球温暖化」と言われている現象が考えられる（地球スケールの長期で見るとまた違うよ、という意見まではフォローできる結果ではないが）。

B

import numpy as np
import matplotlib.pyplot as plt
import japanize_matplotlib
from matplotlib.colors import Normalize

loadfile = 'drive/MyDrive/sst_OISST.npz'
sst_dataset = np.load(loadfile)
sst = sst_dataset['sst']
lon2 = sst_dataset['lon2']
lat2 = sst_dataset['lat2']
y = sst_dataset['y']
m = sst_dataset['m']

[imt, jmt, tmt] = sst.shape

sst_clim = np.zeros((imt, jmt, 12))

for mm in range(1, 13):
  sst_clim[:, :, mm - 1] = np.mean(sst[:, :, m == 12], 2)

sst_1982_1999 = sst[:, :, (1982 <= y) * (y <= 1999) * (m == 12)]
sst_2000_2019 = sst[:, :, (2000 <= y) * (y <= 2019) * (m == 12)]

sst_1982_1999_12mean = np.mean(sst_1982_1999[:, :, :], 2)
sst_2000_2019_12mean = np.mean(sst_2000_2019[:, :, :], 2)

vmin = -5
vmax = 35
vint = 5


cm = plt.get_cmap('seismic')
cs = plt.contourf(lon2, lat2, sst_1982_1999_12mean,
                    cmap=cm, norm=Normalize(vmin=vmin, vmax=vmax),
                    levels=np.arange(vmin, vmax + vint, vint), extend='both')
plt.colorbar(cs)
plt.xlabel('Longitude')
plt.ylabel('Latitude')
clim_title = '1982_1999平均 12月'
plt.title(clim_title)
plt.xlim(0, 360)
plt.ylim(-90, 90)
plt.show()

cm = plt.get_cmap('seismic')
cs = plt.contourf(lon2, lat2, sst_2000_2019_12mean,
                    cmap=cm, norm=Normalize(vmin=vmin, vmax=vmax),
                    levels=np.arange(vmin, vmax + vint, vint), extend='both')
plt.colorbar(cs)
plt.xlabel('Longitude')
plt.ylabel('Latitude')
clim_title = '2000_2019平均 12月'
plt.title(clim_title)
plt.xlim(0, 360)
plt.ylim(-90, 90)
plt.show()

2つの時期の差分を取ると

sst_diff_12mean = sst_2000_2019_12mean - sst_1982_1999_12mean
plt.hist(sst_diff_12mean.flatten(), bins=50)
plt.show()

min(sst_diff_12mean.flatten()), max(sst_diff_12mean.flatten())

(-4.157222150017818, 2.273055409060584)

差の最小値は-4.15で最大値は2.27だがヒストグラムを見るとごく稀なので、違いをはっきりさせるために、-2.0から2.0の範囲で０.２℃刻みでプロットした。

vmin = -2.0
vmax = 2.0
vint = 0.2

sst_diff_12mean = sst_2000_2019_12mean - sst_1982_1999_12mean

cm = plt.get_cmap('seismic')
cs = plt.contourf(lon2, lat2, sst_diff_12mean,
                    cmap=cm, norm=Normalize(vmin=vmin, vmax=vmax),
                    levels=np.arange(vmin, vmax + vint, vint), extend='both')
plt.colorbar(cs)
plt.xlabel('Longitude')
plt.ylabel('Latitude')
clim_title = '2000_2019平均 と 1982_1999平均の差分 12月'
plt.title(clim_title)
plt.xlim(0, 360)
plt.ylim(-90, 90)
plt.show()

海洋の全体で気温が上がっているが、細かく見ると沿岸部や南氷洋では下がっている海域もある。

C

札幌としてみた。またテキストではnumpyでcsvファイルを読み込んでいるが、csvファイルの読み込みではより便利なpandas::dataframeを用いた。

import numpy as np
import matplotlib.pyplot as plt
import japanize_matplotlib

import pandas as pd
df = pd.read_csv('drive/MyDrive/sapporo_temp.csv', skiprows=4, encoding='cp932')
df.head()

Unnamed: 0	Unnamed: 1	品質情報	均質番号	Unnamed: 4	現象なし情報	品質情報.1	均質番号.1
0	1872/1	NaN	0	1	NaN	NaN	0	1
1	1872/2	NaN	0	1	NaN	NaN	0	1
2	1872/3	NaN	0	1	NaN	NaN	0	1
3	1872/4	NaN	0	1	NaN	NaN	0	1
4	1872/5	NaN	0	1	NaN	NaN	0	1

必要な列のみに絞り、年月も扱いやすくy, mに分けている

df = df.iloc[:, [0, 1, 4]]
df.columns = ['年月', 'temp', 'rain']
df = df.assign(y = df['年月'].str.slice(0, 4))
df = df.assign(m = df['年月'].str.slice(5, 7))
df.tail()

これ以降はテキストの書式に合うように変数を加工

month = np.arange(1, 13, 1)
temp = df['temp']
rain = df['rain']
m = df['m']

temp_clim = np.zeros(12)
rain_clim = np.zeros(12)

for mm in range(1, 13):
  temp_clim[mm - 1] = np.nanmean(temp[m == str(mm)])
  rain_clim[mm - 1] = np.nanmean(rain[m == str(mm)])

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()

ax1.bar(month, rain_clim, label = '降水量')
ax2.plot(month, temp_clim, color='red', label = '気温')


handler1, label1 = ax1.get_legend_handles_labels()
handler2, label2 = ax2.get_legend_handles_labels()

ax1.set_ylim([0, 150])
ax2.set_ylim([-10, 30])

ax1.legend(handler1 + handler2, label1 + label2, loc='upper left')
ax1.set_xlabel('月')
ax1.set_ylabel('降水量[mm]')
ax2.set_ylabel('気温[℃]')
plt.title('札幌の雨温図')
plt.show()

D

このファイルを用いた
- https://www.e-stat.go.jp/stat-search/files?page=1&layout=datalist&toukei=00500508&tstat=000001036919&cycle=8&tclass1=000001165626&tclass2=000001216640&stat_infid=000040171948&tclass3val=0
今回もPandasを用いた

import numpy as np
import matplotlib.pyplot as plt
import japanize_matplotlib

import pandas as pd


df = pd.read_excel('drive/MyDrive/gyunyu_doko_r4k.xls', sheet_name='5主要乳製品生産量', skiprows=3)
df = df.iloc[34:57, [0, 17]] # 月次データのある行のみ、年月とアイスクリーム生産量の列のみ
df.columns = ['年月', 'アイスクリーム生産量']
df

年月	アイスクリーム生産量
34	令和３年４月	12.151
35	５	11.201
36	６	13.164
37	７	13.75
38	８	13.028
39	９	12.428
40	10	10.939
41	11	12.81
42	12	9.415
43	令和４年１月	9.413
44	２	9.767
45	３	12.404
46	令和４年４月	11.46
47	５	11.954
48	６	13.142
49	７	14.626
50	８	13.306
51	９	13.03
52	10	11.464
53	11	12.692
54	12	8.376
55	令和５年１月	8.372
56	２	10.115

年月を取り扱いやすく加工

from dateutil.relativedelta import relativedelta

li_date = []
current_date = pd.to_datetime('2022-03-01')
for i in range(23):
  next_date = current_date + relativedelta(months=1)
  li_date.append(next_date.strftime('%Y-%m-%d'))
  current_date = next_date

df['年月'] = li_date
df['年月'] = pd.to_datetime(df['年月'])
df

年月	アイスクリーム生産量
34	2022-04-01	12.151
35	2022-05-01	11.201
36	2022-06-01	13.164
37	2022-07-01	13.75
38	2022-08-01	13.028
39	2022-09-01	12.428
40	2022-10-01	10.939
41	2022-11-01	12.81
42	2022-12-01	9.415
43	2023-01-01	9.413
44	2023-02-01	9.767
45	2023-03-01	12.404
46	2023-04-01	11.46
47	2023-05-01	11.954
48	2023-06-01	13.142
49	2023-07-01	14.626
50	2023-08-01	13.306
51	2023-09-01	13.03
52	2023-10-01	11.464
53	2023-11-01	12.692
54	2023-12-01	8.376
55	2024-01-01	8.372
56	2024-02-01	10.115

まずは単純に時系列グラフ

plt.plot(df['年月'], df['アイスクリーム生産量'])
plt.xlabel('年月')
plt.ylabel('アイスクリーム生産量[t]')
plt.show()

月次の違いを見るため、年ごとにプロット


values_2022 = [np.nan, np.nan, np.nan] + df[df['年月'] < '2023-01-01']['アイスクリーム生産量'].to_list()
values_2023 = df[(df['年月'] >= '2023-01-01') * (df['年月'] < '2024-01-01')]['アイスクリーム生産量'].to_list()
values_2024 = df[df['年月'] >= '2024-01-01']['アイスクリーム生産量'].to_list() + [np.nan] * 10

angles = np.linspace(0, 2 * np.pi, len(values_2022)+1) #radian
month = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12']

plt.plot(month, values_2022, label='2022')
plt.plot(month, values_2023, label='2023')
plt.plot(month, values_2024, label='2024')
plt.legend()
plt.xlabel('月')
plt.ylabel('アイスクリーム生産量[t]')
plt.title('月別アイスクリーム生産量')
plt.show()

考察
- 夏多く、冬少ないのは当たり前だが、３月、１１月の多さが気になる。また夏でも８月よりも７月の方が多いのはなぜか。
- 仮説として
  - イベントが関係しているか
    - ５/9がアイスの日、１１/15が冬アイスの日だがこれだと５月が説明つきにくい。
  - 商品投入時期が関係しているか
    - 夏に向け3月に新商品が多い？、冬アイスに向け１１月に新商品が多い？
    - 個人ブログだが新作アイス発売予定表｜毎日アイス生活¹によると2022年の新商品数は下記となっている。わずかだが3月、７月、１１月が前後の月に比べ多く、グラフの上下をサポートする結果となっている。
      - 1月: 17商品
      - 2月: 18商品
      - 3月: 22商品
      - 4月: 16商品
      - 5月: 18商品
      - 6月: 19商品
      - 7月: 24商品
      - 8月: 16商品
      - 9月: 23商品
      - 10月: 24商品
      - 11月: 25商品
      - 12月: 21商品
    - CM投入量、新商品の種類（新シリーズかマイナーチェンジか）等も検討するとより正確な検証になると思われる。

毎日アイス生活. "新作アイス発売予定表". 2024-06-02. https://icelog.net/%E6%96%B0%E4%BD%9C%E3%82%A2%E3%82%A4%E3%82%B9%E7%99%BA%E5%A3%B2%E4%BA%88%E5%AE%9A%E8%A1%A8/#201678, (参照 2024-06-16) ↩

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up