Automating GIS Processes 2024 Code-Along: Exercise 4 (Problem 1, First Half)

Posted at 2025-08-09, last updated at 2025-08-09

Problem 1: Join accessibility datasets into a grid and visualise the data on a map (10 points)

(Accessibility here means how easy a place is to reach.)

Your task in problem 1 is to combine data from non-spatial data sets (travel times between places) and a spatial data set (grid cells that represent the places), and plot the combined data set to visualise the travel times to two shopping centres from every other place in the Helsinki metropolitan area.

In particular, this task consists of three major steps:

1. Read the grid cell data set
2. Read a travel time data set,
   - discard unnecessary columns,
   - rename the relevant columns to include a reference to the respective shopping centre,
   - join the relevant columns to the grid data set
3. Classify the travel times for both travel modes (public transport and private car) into five-minute intervals

Repeat the second step for each of the two shopping centres (Itis, Myyrmanni).

a) Read the grid cell data set (1 point)

The grid cells are derived from the ‘YKR’ data set, which is published by the Finnish Environmental Institute (SYKE) and collects a variety of indicators relating to the social and built-up structure of the country. In an effort to harmonise different data products of other institutions, the YKR grid cell data set has become a reference for many data products, including, for instance, the travel time data set produced at the Digital Geography Lab.

You can find the YKR data set in the directory data in GeoPackage format: YKR_grid_EPSG3067.gpkg. It contains a polygon geometry column and an (integer) identifier: YKR_ID.

Now let's put this into practice.
First, define the path to the data directory.

import pathlib
NOTEBOOK_PATH = pathlib.Path().resolve()
DATA_DIRECTORY = NOTEBOOK_PATH / "data"

Place the files in the data directory.

The files can be found here.
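Because every later step depends on the files actually being in the data directory, a quick existence check can save a confusing error further down. The following is only a sketch: `require_file` is a hypothetical helper that is not part of the exercise, and the demonstration runs in a temporary directory with an empty placeholder file rather than the real data.

```python
import pathlib
import tempfile


def require_file(path: pathlib.Path) -> pathlib.Path:
    """Return `path` if it points to an existing file, otherwise raise."""
    if not path.is_file():
        raise FileNotFoundError(f"Expected input file is missing: {path}")
    return path


# Demonstration in a throwaway directory, so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo_data_directory = pathlib.Path(tmp) / "data"
    demo_data_directory.mkdir()
    # Empty placeholder, not a real GeoPackage.
    (demo_data_directory / "YKR_grid_EPSG3067.gpkg").touch()

    found_name = require_file(demo_data_directory / "YKR_grid_EPSG3067.gpkg").name

    try:
        require_file(demo_data_directory / "travel_times_to_5944003_Itis.txt")
        missing_detected = False
    except FileNotFoundError:
        missing_detected = True
```

In the notebook itself, the same helper could be called with `DATA_DIRECTORY / "YKR_grid_EPSG3067.gpkg"` before reading the file.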

Just in case, install geopandas. It is occasionally missing from the environment; I do not know why.

! pip install geopandas

Read the file and display the first five rows.

import geopandas as gpd

grid_base: gpd.GeoDataFrame = gpd.read_file(DATA_DIRECTORY / "YKR_grid_EPSG3067.gpkg")

grid_base.head()

Sure enough, there are a YKR_ID column and a geometry column.
Let's also look at the crs (Coordinate Reference System).

grid_base.crs

The EPSG code is 3067.

Let's also check the number of rows.

len(grid_base)

There are 13231 rows.

b) Read the travel time data sets and join them to the grid cells (2 points)

Inside the data directory, you will find a set of semicolon-separated text files with travel times to each of a set of shopping centres in the Helsinki region (this exercise was conceived before REDI and Tripla started operation).

The individual files have file names following the schema travel_times_to_[XXXXX]_[Shopping_Centre], where [Shopping_Centre] is the name of one of the seven shopping centres included in the data set, and [XXXXX] happens to be the ID of the YKR grid cell in which the shopping centre is located (although you should not need to use this ID in this exercise).
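The naming schema can be taken apart with a regular expression. This is a minimal sketch: the function `parse_travel_time_file_name` is a hypothetical helper, and the `.txt` extension is assumed from the two file names that appear later in this post.

```python
import re

# Pattern for the schema travel_times_to_[XXXXX]_[Shopping_Centre];
# the ".txt" extension is an assumption based on the files used in this post.
FILE_NAME_PATTERN = re.compile(
    r"^travel_times_to_(?P<ykr_id>\d+)_(?P<centre>.+)\.txt$"
)


def parse_travel_time_file_name(file_name: str) -> tuple[int, str]:
    """Split a file name into (YKR_ID of the destination cell, shopping centre name)."""
    match = FILE_NAME_PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"File name does not follow the schema: {file_name}")
    return int(match["ykr_id"]), match["centre"]


itis = parse_travel_time_file_name("travel_times_to_5944003_Itis.txt")
myyrmanni = parse_travel_time_file_name("travel_times_to_5902043_Myyrmanni.txt")
```

The exercise does not need the ID, so this is only useful if the shopping centre name should be derived from the file name instead of being typed by hand.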

The data sets contain computed travel times between different places in the metropolitan area. In order to produce such a travel time matrix, all connections from all origins to all destinations are calculated, and then recorded in a table.

Columns

The data sets we use have many columns, but only a few are interesting for this task:

from_id: the YKR_ID of the origin grid cell

to_id: the YKR_ID of the destination grid cell (here: the one containing the shopping centre, so every row holds the same value)

pt_r_t: how long does it take to travel from from_id to to_id, in minutes, using public transport?

car_r_t: how long does it take to drive a car from from_id to to_id, in minutes?

Note that from_id and to_id relate to the YKR grid data set’s YKR_ID column. Each input data set has only one unique to_id, as the data has been split up to relate to only one destination (a shopping centre), but many unique values for from_id, as it covers the travel times from anywhere in the metropolitan area.

(Supplement) Every data set file follows the same rule. In every row, to_id holds the same value: the YKR_ID of the shopping centre in question. from_id, on the other hand, differs from row to row and holds the YKR_ID of some place in the metropolitan area. This is how a file stores the travel time from "somewhere" to "that shopping centre".
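The rule described above can be checked directly on a data frame. The sketch below uses a toy table with invented from_id values and travel times (only the to_id 5944003 is taken from the Itis file name); on the real data, the same two checks would be run on the frame returned by pandas.read_csv().

```python
import pandas as pd

# Toy stand-in for one travel time file; from_id values and times are invented.
travel_times = pd.DataFrame(
    {
        "from_id": [5785640, 5785641, 5785642],
        "to_id": [5944003, 5944003, 5944003],
        "pt_r_t": [132, 135, -1],
        "car_r_t": [50, 51, -1],
    }
)

# One destination per file: to_id has exactly one distinct value.
single_destination = travel_times["to_id"].nunique() == 1

# Every origin appears once: from_id has no duplicates.
unique_origins = travel_times["from_id"].is_unique
```

If from_id were not unique, the index-based join used later would produce more than one row per grid cell, so this is a cheap check to run before joining.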

No-data values

The travel time data set contains some origin-destination pairs (O/D-pairs) for which it could not find a public transport connection, or which are not accessible by car. Such no-data values are saved as -1 minutes travel time. Because a literal -1 would get in the way later, use the pandas.Series.replace() method to replace -1 with numpy.nan (not a number) to indicate that these cells do not contain valid data.

The following describes a workaround for a pandas bug, but the bug appears to have been fixed since.

IMPORTANT: While we are having this course, a bug (https://github.com/pandas-dev/pandas/issues/45725) prevents pandas.Series.replace() from working as expected. The following line fails with a RecursionError:

travel_times["car_r_t"] = travel_times["car_r_t"].replace(-1, numpy.nan)

There is a workaround: an alternative syntax that takes a dict of before and after values, similar to how pandas.DataFrame.rename() works, does not trigger the issue. The following line works:

travel_times["car_r_t"] = travel_times["car_r_t"].replace({-1: numpy.nan})
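To see that the two spellings are meant to give the same result, here is a small sketch on an invented Series. It assumes a pandas version in which the bug mentioned above is fixed; on an affected version the scalar form would raise instead.

```python
import numpy as np
import pandas as pd

# Invented travel times in minutes; -1 marks "no connection found".
car_r_t = pd.Series([50, 51, -1], name="car_r_t")

replaced_scalar = car_r_t.replace(-1, np.nan)  # scalar form
replaced_dict = car_r_t.replace({-1: np.nan})  # dict form (the workaround)

# Series.equals() treats NaN values in the same position as equal.
same_result = replaced_scalar.equals(replaced_dict)
n_missing = int(replaced_dict.isna().sum())
```

Note that the integer column becomes a floating-point column after the replacement, because NaN is a floating-point value.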

Read the data sets for the shopping centres ‘Itis’ and ‘Myyrmanni’, discard irrelevant columns, rename the pt_r_t and car_r_t columns to include a reference to the shopping centre (e.g., into pt_r_t_Itis), and join the renamed columns to the grid data frame. Don’t forget to replace no-data values (-1) with None.

Now let's put this into practice.
First, import pandas and numpy. numpy is needed to set numpy.nan as described above.

import pandas as pd
import numpy as np

First, read the ‘Itis’ data.

The data is semicolon-separated, so sep=";" is required. While reading, we also drop the unneeded columns by selecting [["from_id", "pt_r_t", "car_r_t"]].

Itis_df = pd.read_csv(DATA_DIRECTORY / "travel_times_to_5944003_Itis.txt", sep=";")[["from_id", "pt_r_t", "car_r_t"]]
Itis_df

Like grid_base (the grid cell data set), it has 13231 rows.
Also, at index 13228 (the third row from the bottom), pt_r_t and car_r_t are -1.
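The post reads the whole file and then selects three columns. As an alternative sketch (not the approach used above), the usecols parameter of pandas.read_csv() does the selection while reading. The miniature file below is invented, including the extra walk_t column, and only serves to compare the two approaches.

```python
import io

import pandas as pd

# Invented miniature of a travel time file; real files have more columns and rows.
sample_text = (
    "from_id;to_id;walk_t;pt_r_t;car_r_t\n"
    "5785640;5944003;494;132;50\n"
    "5785641;5944003;457;135;51\n"
    "5785642;5944003;-1;-1;-1\n"
)

wanted_columns = ["from_id", "pt_r_t", "car_r_t"]

# Approach used in this post: read everything, then keep three columns.
subset_after = pd.read_csv(io.StringIO(sample_text), sep=";")[wanted_columns]

# Alternative: let read_csv skip the other columns while parsing.
subset_while_reading = pd.read_csv(
    io.StringIO(sample_text), sep=";", usecols=wanted_columns
)

same_frame = subset_after.equals(subset_while_reading)
```

With usecols the resulting columns follow the order in the file, which here happens to match the order of the selection list.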

Now, let's convert -1 to numpy.nan (not a number).

Itis_df["pt_r_t"] = Itis_df["pt_r_t"].replace(-1, np.nan)
Itis_df["car_r_t"] = Itis_df["car_r_t"].replace(-1, np.nan)

Itis_df.iloc[13228]

At index 13228, the pt_r_t and car_r_t columns have changed from -1 to NaN.

Let's look at the pt_r_t and car_r_t columns individually.

Itis_df.iloc[13228].pt_r_t

Itis_df.iloc[13228].car_r_t

Both columns show np.float64(nan).
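NaN behaves differently from an ordinary number, which is the reason it is a better no-data marker than -1. The sketch below uses an invented two-element Series to show that NaN does not compare equal to itself, that pandas.isna() detects it, and that aggregations such as mean() skip it by default.

```python
import numpy as np
import pandas as pd

# Invented values: one valid travel time and one missing value.
travel_time = pd.Series([50.0, np.nan], name="car_r_t")
value = travel_time.iloc[1]

equal_to_itself = bool(value == value)    # NaN never compares equal, even to itself
detected_by_isna = bool(pd.isna(value))   # the reliable way to test for NaN
valid_count = int(travel_time.notna().sum())
mean_minutes = float(travel_time.mean())  # NaN is skipped by default
```

Had the -1 been left in place, the mean of this toy Series would have been pulled down to 24.5 minutes instead of staying at 50.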

Do the same for ‘Myyrmanni’.

Myyrmanni_df = pd.read_csv(DATA_DIRECTORY / "travel_times_to_5902043_Myyrmanni.txt", sep=";")[["from_id", "pt_r_t", "car_r_t"]]

Myyrmanni_df

Convert -1 to numpy.nan (not a number).

Myyrmanni_df["pt_r_t"] = Myyrmanni_df["pt_r_t"].replace(-1, np.nan)
Myyrmanni_df["car_r_t"] = Myyrmanni_df["car_r_t"].replace(-1, np.nan)

Myyrmanni_df.iloc[13228]

At index 13228, the pt_r_t and car_r_t columns have changed from -1 to NaN.

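Now that the files have been read, we join them. There are two methods for this, join() and merge().
Here we use join(), because we want to join on the index.

Before joining, we prepare the data as follows.

grid_base: use the YKR_ID column as the index
Itis_df: use the from_id column (which refers to YKR_ID) as the index, and append "_Itis" to the names of the remaining columns
Myyrmanni_df: use the from_id column (which refers to YKR_ID) as the index, and append "_Myyrmanni" to the names of the remaining columns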

grid = (
    grid_base.set_index("YKR_ID")  # use the YKR_ID column as the index
    # use from_id as the index and append _Itis to the column names
    .join(Itis_df.set_index("from_id").add_suffix("_Itis"))
    # use from_id as the index and append _Myyrmanni to the column names
    .join(Myyrmanni_df.set_index("from_id").add_suffix("_Myyrmanni"))
)

grid
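For comparison, the same result can also be reached with merge(), which joins on columns rather than on the index. The following sketch uses small invented pandas data frames (the geometry column holds placeholder strings, not real polygons) and is not the approach used in this post; it only illustrates how the two methods relate.

```python
import pandas as pd

# Invented stand-ins: three grid cells and their travel times to one centre.
grid_demo = pd.DataFrame(
    {"YKR_ID": [1, 2, 3], "geometry": ["cell 1", "cell 2", "cell 3"]}
)
itis_demo = pd.DataFrame(
    {
        "from_id": [1, 2, 3],
        "pt_r_t": [30.0, 35.0, float("nan")],
        "car_r_t": [20.0, 22.0, float("nan")],
    }
)

# Index-based join, as in this post.
joined = grid_demo.set_index("YKR_ID").join(
    itis_demo.set_index("from_id").add_suffix("_Itis")
)

# Column-based merge: rename first, match YKR_ID to from_id, then tidy up.
merged = (
    grid_demo.merge(
        itis_demo.rename(
            columns={"pt_r_t": "pt_r_t_Itis", "car_r_t": "car_r_t_Itis"}
        ),
        left_on="YKR_ID",
        right_on="from_id",
        how="left",
    )
    .drop(columns="from_id")
    .set_index("YKR_ID")
)

same_values = joined.equals(merged)
```

With join() the key columns disappear into the index, so there is no leftover from_id column to drop, which is one reason the index-based version reads shorter.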

Now check the result with the test cells below. They pass without problems.

# NON-EDITABLE TEST CELL
import geopandas
assert type(grid) == geopandas.geodataframe.GeoDataFrame, "Output should be a geodataframe."

# NON-EDITABLE TEST CELL
# Check that the merged output have (at least) the necessary columns
required_columns = ['pt_r_t_Itis', 'car_r_t_Itis', 'pt_r_t_Myyrmanni', 'car_r_t_Myyrmanni', 'geometry']

assert all(column in grid.columns for column in required_columns), "Couldn’t find all required columns."

# NON-EDITABLE TEST CELL
# Check that -1 values are not present in the columns
for shopping_centre in ("Itis", "Myyrmanni"):
    for column in ("car_r_t", "pt_r_t"):
        assert -1 not in grid[f"{column}_{shopping_centre}"], "NoData values (-1) should be removed from the data!"
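One thing to be aware of when reading the last test cell: on a pandas Series, the `in` operator looks at the index labels, not at the values. The sketch below demonstrates this on an invented Series whose index plays the role of YKR_ID. The test cell itself is non-editable and is left as it is; the value-based checks shown here are only a suggestion for verifying the data by hand.

```python
import pandas as pd

# Invented values; the index plays the role of YKR_ID.
car_r_t_demo = pd.Series(
    [50.0, -1.0], index=[5785640, 5785641], name="car_r_t_Itis"
)

# `in` on a Series checks the index labels, so a -1 among the values is not found.
found_by_in_operator = -1 in car_r_t_demo

# Value-based checks do find it.
found_in_values = bool((car_r_t_demo == -1).any())
found_in_array = -1 in car_r_t_demo.to_numpy()
```

On the joined grid, an expression such as `(grid["car_r_t_Itis"] == -1).any()` would therefore be the stricter check that no -1 values remain.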

This has become long, so I will stop here for now.
