More than 1 year has passed since last update.

Converting fixed-width data with Pandas read_fwf (seismic intensity data)

Reference

I processed Japan's earthquake data to make it easier to understand
https://qiita.com/T_programming/items/2dae8f40941ff3581036

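Converting fixed-width data with read_fwf

  • read_fwf counts each full-width character as a single character, so the column positions shift
  • The 震央地名 (epicenter name) field contains full-width characters, so the fields after it are misaligned
  • Workaround: read everything from 震央地名 onward as one field, then split it afterwards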
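
The mismatch can be seen by comparing character and byte counts. This is a minimal sketch, assuming the record layout counts bytes in cp932; the place name 茨城県沖 is an illustrative value, not one read from the file.

```python
# Illustrative epicenter name (hypothetical sample, not read from the file).
name = "茨城県沖"

# Python strings count characters, so each full-width character counts once.
char_count = len(name)

# In cp932 each of these full-width characters is encoded as two bytes.
byte_count = len(name.encode("cp932"))

print(char_count, byte_count)  # 4 8
```

A field that occupies 8 bytes in the file is only 4 characters long after decoding, which is why character-based widths drift for everything that follows.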

Download

!wget https://www.data.jma.go.jp/svd/eqev/data/bulletin/data/shindo/i2019.zip
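
pandas can read a zip archive directly only when it contains a single file, so it may be worth checking the archive contents first. A minimal sketch using the standard library; `zip_members` is a hypothetical helper name, not part of pandas.

```python
import zipfile


def zip_members(path):
    """Return the names of the files stored in a zip archive."""
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()
```

Calling `zip_members("i2019.zip")` after the download should list one entry if the archive can be passed straight to read_fwf.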

Program

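import pandas as pd

# Field widths, counted in characters. The final width (28) takes everything
# from 震央地名 (epicenter name) to the end of the record as a single field.
widths = [1, 4, 2, 2, 2, 2, 4, 4, 3, 4, 4, 4, 4, 4, 5, 3, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 3, 28]

names = [
    "ヘッダー",  # record header
    "西暦",  # year
    "月",  # month
    "日",  # day
    "時",  # hour
    "分",  # minute
    "秒",  # second
    "標準誤差(秒)",  # standard error (s)
    "緯度(度)",  # latitude (degrees)
    "緯度(分)",  # latitude (minutes)
    "緯度標準誤差(分)",  # latitude standard error (minutes)
    "経度(度)",  # longitude (degrees)
    "経度(分)",  # longitude (minutes)
    "経度標準誤差(分)",  # longitude standard error (minutes)
    "深さ(km)",  # depth (km)
    "標準誤差(km)",  # depth standard error (km)
    "マグニチュード1",  # magnitude 1
    "マグニチュード1種別",  # magnitude 1 type
    "マグニチュード2",  # magnitude 2
    "マグニチュード2種別",  # magnitude 2 type
    "使用走時表",  # travel-time table used
    "震源評価",  # hypocenter evaluation
    "震源補助情報",  # supplementary hypocenter information
    "最大震度",  # maximum seismic intensity
    "被害規模",  # damage scale
    "津波規模",  # tsunami scale
    "大地域区分番号",  # major region number
    "小地域区分番号",  # minor region number
    "震央地名",  # epicenter name, plus the trailing fields split off below
]

df = pd.read_fwf(
    "i2019.zip",
    encoding="cp932",
    header=None,
    widths=widths,
    names=names,
)

# Extract the rows whose header is A, B, or D
df1 = df[df["ヘッダー"].isin(["A", "B", "D"])].copy().reset_index(drop=True)

# Split 観測点数 (number of stations) and 震源決定フラグ (hypocenter
# determination flag) off the end of 震央地名
df1["観測点数"] = pd.to_numeric(df1["震央地名"].str[-6:-1].str.strip()).astype("Int64")
df1["震源決定フラグ"] = df1["震央地名"].str[-1]
df1["震央地名"] = df1["震央地名"].str[:-6].str.strip()

# Unit adjustment: rescale values stored as integers
# (hundredths, or tenths for the magnitudes)
df1["秒"] = df1["秒"].astype(float) / 100
df1["標準誤差(秒)"] = df1["標準誤差(秒)"].astype(float) / 100

df1["緯度(分)"] = df1["緯度(分)"].astype(float) / 100
df1["緯度標準誤差(分)"] = df1["緯度標準誤差(分)"].astype(float) / 100

df1["経度(分)"] = df1["経度(分)"].astype(float) / 100
df1["経度標準誤差(分)"] = df1["経度標準誤差(分)"].astype(float) / 100

df1["標準誤差(km)"] = df1["標準誤差(km)"].astype(float) / 100

df1["マグニチュード1"] = df1["マグニチュード1"].astype(float) / 10
df1["マグニチュード2"] = df1["マグニチュード2"].astype(float) / 10

# Date conversion: rename the date parts to the column names that
# pd.to_datetime recognizes, then assemble them into one datetime column
df_date = (
    df1[["西暦", "月", "日", "時", "分", "秒"]]
    .copy()
    .set_axis(["year", "month", "day", "hour", "minute", "seconds"], axis=1)
)

df1["datetime"] = pd.to_datetime(df_date)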

df1.to_csv("2019.csv", encoding="utf_8_sig")
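
The split-from-the-right step can be tried on synthetic data without downloading anything. The following is a sketch under stated assumptions: the layout is a shortened, hypothetical one (1-character header, 4-character year, then a tail holding a name padded to 24 cp932 bytes, a 5-character station count, and a 1-character flag), and the place names, counts, and flags are made-up values. `make_record` is a helper invented for this illustration.

```python
import io

import pandas as pd


def make_record(header, year, name, count, flag):
    """Build one synthetic record whose name field is padded to 24 cp932 bytes."""
    pad = 24 - len(name.encode("cp932"))
    return f"{header}{year}{name}{' ' * pad}{count:>5}{flag}"


raw = "\n".join(
    [
        make_record("A", 2019, "茨城県沖", 12, "K"),
        make_record("B", 2019, "福島県沖", 3, "S"),
    ]
)

# Read the trailing part as one wide column; a width that runs past the end
# of the line just takes the characters that remain.
demo = pd.read_fwf(
    io.StringIO(raw),
    header=None,
    widths=[1, 4, 30],
    names=["header", "year", "tail"],
)

# Split from the right, where the field sizes are fixed in characters.
demo["stations"] = pd.to_numeric(demo["tail"].str[-6:-1].str.strip()).astype("Int64")
demo["flag"] = demo["tail"].str[-1]
demo["name"] = demo["tail"].str[:-6].str.strip()

print(demo[["name", "stations", "flag"]])
```

Because the station count and flag contain only single-byte characters, their sizes are the same in bytes and in characters, so slicing from the end of the string is unaffected by the full-width characters in the name.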