I reshaped Japan's earthquake data to make it easier to read

Last updated at 2021-11-02
Posted at 2021-11-02

Japan's earthquake data

This time we reshape the earthquake data published by the Japan Meteorological Agency (JMA).
https://www.data.jma.go.jp/svd/eqev/data/bulletin/shindo.html

The record format is described in the PDF below, and we follow it as we reshape the data.

https://www.data.jma.go.jp/svd/eqev/data/bulletin/data/shindo/format_j.pdf

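# Load the data
import pandas as pd
# The file has no header row, and each record is one fixed-width line.
# sep=';' keeps every line in one piece (assuming ';' never appears in the data),
# and names=['index'] puts the raw line in a single column called 'index'.
df = pd.read_csv('dat/i2019.dat', sep=';', encoding='shift-jis', header=None, names=['index'])

# Show the DataFrame
df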

A2019010104042854 018 432000 061 1460541 097 512009832V   5111  1 27根室半島南東沖            1K
A2019010106533565 020 354111 057 1404558 082 466514635V   5111  3110千葉県東方沖              2K
A2019010110014516 012 320410 031 1315008 065 340410431V   5111  7300日向灘                    1K
A2019010118455550 012 241154 057 1241409 057 144816542D38V5111  7290石垣島近海                3K
A2019010118551331 014 241096 061 1241460 061 144819642D40V5111  7290石垣島近海                5K
A2019010200095642 011 385340 030 1420462 064 445210036V   5111  2 66宮城県沖                  1K
A2019010207053808 020 354396 049 1401202 072 658116936V   5111  3 92千葉県北西部             12K
A2019010217471388 010 364256 027 1405714 057 321311938D38V5112  3111茨城県沖                 19K
A2019010218442606 011 361049 030 1400676 044 513808839D39V5112  3 88茨城県南部              141K
A2019010221255892 013 400770 027 1422849 068 336211638D40V5111  2 61岩手県沖                  5K
A2019010223243815 011 385129 031 1420287 070 452111336V   5111  2 66宮城県沖                  7K
A2019010317244867 004 332806 013 1321788 013 473103732V   5111  6244愛媛県南予                8K
A2019010317490340 015 365093 040 1411969 087 430613336V   5111  3111茨城県沖                  1K

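The result is hard-to-read data like this 😭

First, looking at rows such as the 根室半島南東沖 one, we can infer that two different kinds of records are stored together in the same file.
According to the format document, rows like the 根室半島南東沖 row start with a letter, while the other kind starts with a digit,
so we keep only the rows whose first character is a letter.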

# Drop the rows whose first character is a digit, keeping the rows that start with a letter.
# reset_index gives the remaining rows consecutive labels 0..n-1 again.
df = df[~df['index'].str[0].str.isdecimal()].reset_index(drop=True)

# Show df
df

A2019010104042854 018 432000 061 1460541 097 512009832V   5111  1 27根室半島南東沖            1K
A2019010106533565 020 354111 057 1404558 082 466514635V   5111  3110千葉県東方沖              2K
A2019010110014516 012 320410 031 1315008 065 340410431V   5111  7300日向灘                    1K
A2019010118455550 012 241154 057 1241409 057 144816542D38V5111  7290石垣島近海                3K
A2019010118551331 014 241096 061 1241460 061 144819642D40V5111  7290石垣島近海                5K
A2019010200095642 011 385340 030 1420462 064 445210036V   5111  2 66宮城県沖                  1K
A2019010207053808 020 354396 049 1401202 072 658116936V   5111  3 92千葉県北西部             12K
A2019010217471388 010 364256 027 1405714 057 321311938D38V5112  3111茨城県沖                 19K
A2019010218442606 011 361049 030 1400676 044 513808839D39V5112  3 88茨城県南部              141K
A2019010221255892 013 400770 027 1422849 068 336211638D40V5111  2 61岩手県沖                  5K
A2019010223243815 011 385129 031 1420287 070 452111336V   5111  2 66宮城県沖                  7K
A2019010317244867 004 332806 013 1321788 013 473103732V   5111  6244愛媛県南予                8K
A2019010317490340 015 365093 040 1411969 087 430613336V   5111  3111茨城県沖                  1K
A2019010318102764 002 330164 008 1303326 009 104403251D49W511C2 7264熊本県熊本地方          522K
A2019010318194870 002 330161 008 1303296 009 106703324V   5111  7264熊本県熊本地方            3K
A2019010318485456 002 330156 007 1303356 008 110903132V   5112  7264熊本県熊本地方           17K
A2019010320282439 015 372166 032 1412666 081 346012138D38V5111  2 69福島県沖                  6K
A2019010409571538 012 321168 030 1315757 055 196014329V   5111  7300日向灘                    1K

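Next, we create the columns and fill them in.
From the fields in each record, we pick the ones that look meaningful:
['レコード種別ヘッダー','時刻','緯度','経度','深さ','マグニチュード','最大震度','被害規模','津波規模','大地域区分番号','小地域区分番号','震央地名','観測点数']
(in order: record type header, time, latitude, longitude, depth, magnitude, maximum seismic intensity, damage scale, tsunami scale, major region number, minor region number, epicenter name, number of observation stations)

We want to store these 13 columns in the DataFrame.

To do that, we follow the format document to decide which character positions belong to each column.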
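An easy mistake at this step is the off-by-one between a format document and Python slicing. The following is a minimal sketch, assuming the format document numbers columns from 1 and gives inclusive ranges. `cols` is a hypothetical helper, and the column numbers passed to it were read off the sample rows above rather than copied from the PDF, so they should be checked against it.

```python
# Sketch: turn 1-based, inclusive column ranges into Python slices.
# `cols` is a hypothetical helper; the column numbers below are
# assumptions inferred from the sample rows, not taken from the PDF.

def cols(first, last):
    """Return the slice covering 1-based columns first..last (inclusive)."""
    return slice(first - 1, last)

# One record copied from the output above (the 熊本県熊本地方 row).
record = "A2019010318102764 002 330164 008 1303326 009 104403251D49W511C2 7264熊本県熊本地方          522K"

fields = {
    "record type": record[cols(1, 1)],
    "time (YYYYMMDDhhmm)": record[cols(2, 13)],
    "latitude": record[cols(23, 28)],
    "longitude": record[cols(34, 40)],
    "depth": record[cols(45, 49)],
    "magnitude": record[cols(53, 54)],
    "max intensity": record[cols(62, 62)],
}

for name, value in fields.items():
    print(f"{name}: {value!r}")
```

Column numbers in a fixed-width file count bytes, while Python slices a decoded string by characters. In Shift-JIS the full-width characters of the place name take two bytes each, so this direct mapping is only safe for the fields that come before the place name.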

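# Slice each field out of the raw record and store it in its own column.
# Python indices start at 0, so a field that starts at column n of the record starts at index n-1.
s = df['index']
df['レコード種別ヘッダー'] = s.str[0]
df['時刻'] = s.str[1:13]             # YYYYMMDDhhmm
df['緯度'] = s.str[22:28]
df['経度'] = s.str[33:40]
df['深さ'] = s.str[44:49]            # 5 characters, so the leading digit of deep events is kept
df['マグニチュード'] = s.str[52:54]   # both digits, e.g. '51' in the 熊本県熊本地方 row
df['最大震度'] = s.str[61:62]         # maximum seismic intensity
df['被害規模'] = s.str[62:63]
df['津波規模'] = s.str[63:64]
df['大地域区分番号'] = s.str[64:65]   # major region number
df['小地域区分番号'] = s.str[65:68]   # minor region number
# The place name uses multi-byte characters, so the character positions after it
# differ from row to row. Split the tail with a pattern instead:
# name, padding, station count, then one trailing flag character.
tail = s.str[68:].str.extract(r'^(.*?)\s*([0-9]+).$')
df['震央地名'] = tail[0]              # epicenter name
df['観測点数'] = tail[1]              # number of observation stations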

This puts each value into its own column.
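The extracted values are still raw strings. As a hedged sketch of how they could be turned into numbers, the block below assumes what the sample rows suggest: latitude and longitude are degrees followed by minutes with two implied decimals, depth is in km with two implied decimals, and magnitude has one implied decimal. It also assumes each column holds the complete field (for example '51' for magnitude). The helper `deg_min_to_decimal` and the output column names are made up for illustration, and the scaling should be confirmed against the format document.

```python
import pandas as pd

# Raw strings for two records from the sample output
# (the 根室半島南東沖 row and the 熊本県熊本地方 row).
raw = pd.DataFrame({
    "緯度": ["432000", "330164"],
    "経度": ["1460541", "1303326"],
    "深さ": [" 5120", " 1044"],
    "マグニチュード": ["32", "51"],
})

def deg_min_to_decimal(s):
    """Assumed layout: degrees, then 4 digits of minutes x 100 (north/east only)."""
    s = s.str.strip()
    degrees = pd.to_numeric(s.str[:-4], errors="coerce")
    minutes = pd.to_numeric(s.str[-4:], errors="coerce") / 100
    return degrees + minutes / 60

out = pd.DataFrame({
    "lat_deg": deg_min_to_decimal(raw["緯度"]),
    "lon_deg": deg_min_to_decimal(raw["経度"]),
    # Blank or non-numeric fields become NaN instead of raising an error.
    "depth_km": pd.to_numeric(raw["深さ"].str.strip(), errors="coerce") / 100,
    "magnitude": pd.to_numeric(raw["マグニチュード"].str.strip(), errors="coerce") / 10,
})
print(out)
```

Using `errors="coerce"` is a deliberate choice here: records with a blank or specially coded field end up as NaN, which is easier to spot and filter later than a crash halfway through the file.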

Finally, we convert the time into a proper datetime, and we are done.

# Convert the YYYYMMDDhhmm digit strings in 時刻 to datetime values
df['dt'] = pd.to_datetime(df['時刻'].astype(str), format='%Y%m%d%H%M')

# Save as a CSV file
df.to_csv('earthquake_full.csv')

The program, the results, and 20 years' worth of CSV files are available on GitHub.
