Automating GIS Processes 2024 study notes: Lesson 4 (Aggregating data)

Last updated at 2025-07-14. Posted at 2025-07-14.

Data aggregation refers to a process where we combine data into groups. When doing spatial data aggregation, we merge the geometries together into coarser units (based on some attribute), and can also calculate summary statistics for these combined geometries from the original, more detailed values. For example, suppose that we are interested in studying continents, but we only have country-level data like the country dataset. If we aggregate the data by continent, we would convert the country-level data into a continent-level dataset.

(Note on wording: "coarser" is the comparative of "coarse", meaning more generalized units, and "more detailed" appears to be used as its opposite.)

In this tutorial, we will aggregate our travel time data by car travel time (column car_r_t). In other words, the grid cells (rows) that have the same travel time to the Railway Station will be merged together.

Let’s start by loading intersection.gpkg, the output file of the previous section:

import pathlib
NOTEBOOK_PATH = pathlib.Path().resolve()
DATA_DIRECTORY = NOTEBOOK_PATH / "data"

! ls data/intersection.gpkg

import geopandas
intersection = geopandas.read_file(DATA_DIRECTORY / "intersection.gpkg")

To do the aggregation, we use a method called dissolve(), which takes as input the column to group by:

# Conduct the aggregation
dissolved = intersection.dissolve(by="car_r_t")

# What did we get
dissolved.head()
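To see what dissolve() does without the real data, here is a minimal sketch on a made-up grid. The four square cells and their car_r_t and from_id values are hypothetical and are not taken from intersection.gpkg.

```python
import geopandas
from shapely.geometry import box

# Hypothetical toy grid: four 1x1 cells in a row with made-up travel times
cells = geopandas.GeoDataFrame(
    {
        "car_r_t": [10, 10, 15, 15],
        "from_id": [1, 2, 3, 4],
    },
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1), box(3, 0, 4, 1)],
)

# Merge all cells that share the same travel time
toy_dissolved = cells.dissolve(by="car_r_t")

print(f"Rows before: {len(cells)}, rows after: {len(toy_dissolved)}")
# With the default aggfunc, other columns keep the first value of each group
print(toy_dissolved["from_id"])
# Each merged geometry covers the two cells of its group
print(toy_dissolved.area)
```

Each group of two cells collapses into a single row, which is the same effect we expect to see on the real travel time grid.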

Let’s compare the number of rows (cells) in the layers before and after the aggregation:

print(f"Rows in original intersection GeoDataFrame: {len(intersection)}")
print(f"Rows in dissolved layer: {len(dissolved)}")

Indeed, the number of rows in our data has decreased and the Polygons were merged together.
What actually happened here? Let’s take a closer look, starting with the columns we now have in our dissolved GeoDataFrame:

dissolved.columns

As we can see, the column that we used for the aggregation (car_r_t) is no longer in the list of columns. What happened to it?
Let’s take a look at the index of the dissolved GeoDataFrame:

dissolved.index

Aha! Now we understand where our column went. car_r_t is now used as the index of the GeoDataFrame created by dissolve().
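The following sketch checks this behaviour on a hypothetical two-group grid (the cells and values are made up). It also shows the as_index=False parameter of dissolve(), which keeps the grouping column as a regular column instead of moving it to the index.

```python
import geopandas
from shapely.geometry import box

# Hypothetical toy grid with made-up travel times
cells = geopandas.GeoDataFrame(
    {"car_r_t": [10, 10, 15, 15], "from_id": [1, 2, 3, 4]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1), box(3, 0, 4, 1)],
)

# Default: the "by" column becomes the index
indexed = cells.dissolve(by="car_r_t")
print(indexed.index.name)
print("car_r_t" in indexed.columns)

# as_index=False: the "by" column stays a regular column
flat = cells.dissolve(by="car_r_t", as_index=False)
print(flat.columns.tolist())
```

Whether to keep car_r_t in the index or as a column is mostly a matter of convenience: the index is handy for label-based selection, while a column is what plot(column=...) expects.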

Now we can, for example, select only the geometries that are exactly 15 minutes away from the Helsinki Railway Station.
(The .loc indexer selects the row whose index is 15, i.e. a travel time of 15 minutes.)

# Select only geometries that are exactly 15 minutes away
dissolved.loc[15]

# See the data type
type(dissolved.loc[15])

As we can see, the result is a pandas Series object that contains one row of our aggregated GeoDataFrame.

Let’s also visualize those 15 minute grid cells.
First, we need to convert the selected row from a Series back to a GeoDataFrame:

# Create a GeoDataFrame
selection = geopandas.GeoDataFrame([dissolved.loc[15]], crs=dissolved.crs)

While we are at it, let’s check the type of selection. It confirms that we are back to a GeoDataFrame.
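As a side note, one possible shortcut is to pass a list of labels to .loc, which returns a GeoDataFrame directly. The sketch below uses a hypothetical toy grid, and the CRS is an arbitrary choice for illustration only.

```python
import geopandas
import pandas
from shapely.geometry import box

# Hypothetical toy grid; the CRS is an arbitrary choice for illustration
cells = geopandas.GeoDataFrame(
    {"car_r_t": [10, 10, 15, 15], "from_id": [1, 2, 3, 4]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1), box(3, 0, 4, 1)],
    crs="EPSG:3067",
)
toy_dissolved = cells.dissolve(by="car_r_t")

# A single label returns one row as a pandas Series
row = toy_dissolved.loc[15]
print(type(row))

# A list of labels returns a GeoDataFrame, so no conversion is needed
selection = toy_dissolved.loc[[15]]
print(type(selection))
print(selection.crs == toy_dissolved.crs)
```

With this form the selection can be passed to plot() as it is.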

Next, plot the 15 minute selection on top of the entire dissolved grid:

# Plot all the grid cells, and the grid cells that are 15 minutes
# away from the Railway Station
ax = dissolved.plot(facecolor="gray")
selection.plot(ax=ax, facecolor="red")

Another way to visualize the travel times in the entire GeoDataFrame is to plot it by one specific column. To use car_r_t, which is now the index of the GeoDataFrame, as a column, we need to reset the index:

dissolved = dissolved.reset_index()
dissolved.head()

As we can see, car_r_t is a column again, so we can plot the GeoDataFrame by passing this column to the column parameter:

dissolved.plot(column="car_r_t")

Hmm, this is a little hard to read.
Let’s overlay the selection that we plotted earlier.

ax = dissolved.plot(column="car_r_t")
selection.plot(ax=ax, facecolor="red")

How Are Other Columns Aggregated During `dissolve`?

When using the dissolve method in GeoPandas (e.g., dissolved = intersection.dissolve(by="car_r_t")), the other columns are aggregated as follows:

Default Behavior:

Default Aggregation Function: aggfunc='first'

With aggfunc='first', columns that are not the by column keep the first value from each group.

When multiple rows are grouped together, only the first row’s values are retained for these columns. The values from the second row onward are discarded.

Custom Aggregation:

You can control how the columns other than the by column are aggregated using the aggfunc parameter:

dissolved_sum = intersection.dissolve(by="car_r_t", aggfunc="sum").reset_index()
dissolved_sum.head()

Supported aggregation functions include:

- "sum": Sum of the values in the group.
- "mean": Average of the values in the group.
- "min": Minimum value in the group.
- "max": Maximum value in the group.
- "first": First value in the group (default).
- "last": Last value in the group.
- Custom aggregation using a lambda function.
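The sketch below tries a few of these options on a hypothetical toy grid. The car_r_d values are made up for illustration, and the lambda (range of values in each group) is just one example of a custom function.

```python
import geopandas
from shapely.geometry import box

# Hypothetical toy grid with one made-up numeric attribute
cells = geopandas.GeoDataFrame(
    {
        "car_r_t": [10, 10, 15, 15],
        "car_r_d": [100.0, 300.0, 500.0, 900.0],
    },
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1), box(3, 0, 4, 1)],
)

# Built-in aggregation functions are selected by name
summed = cells.dissolve(by="car_r_t", aggfunc="sum")
averaged = cells.dissolve(by="car_r_t", aggfunc="mean")

# A custom function receives the values of one column for one group
spread = cells.dissolve(by="car_r_t", aggfunc=lambda s: s.max() - s.min())

print(summed["car_r_d"])
print(averaged["car_r_d"])
print(spread["car_r_d"])
```

Note that a single aggfunc is applied to every non-geometry column, so it only makes sense when all of those columns can be aggregated that way.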

Using Multiple Aggregations:

To apply different aggregations to different columns, you can aggregate manually with pandas groupby() and agg():

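# Aggregate each attribute column with its own function
dissolved_multiple = intersection.groupby("car_r_t").agg({
    "car_m_d": "sum",
    "car_m_t": "mean",
    "car_r_d": "min",
    "from_id": "max",
    "pt_m_d": "first",
    "pt_m_t": "last",
    # Note: this keeps only the first geometry of each group; it does not union them
    "geometry": lambda x: x.iloc[0],
})
dissolved_multiple.head()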
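The groupby() approach keeps only the first geometry of each group. A possible alternative, assuming a GeoPandas version whose dissolve() accepts a dict for aggfunc, is to pass the per-column functions to dissolve() itself so that the geometries are still unioned. The toy grid and its values below are hypothetical.

```python
import geopandas
from shapely.geometry import box

# Hypothetical toy grid with two made-up numeric attributes
cells = geopandas.GeoDataFrame(
    {
        "car_r_t": [10, 10, 15, 15],
        "car_r_d": [100.0, 300.0, 500.0, 900.0],
        "pt_r_t": [20, 30, 40, 60],
    },
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1), box(3, 0, 4, 1)],
)

# Per-column aggregation functions; geometries are unioned by dissolve()
combined = cells.dissolve(
    by="car_r_t",
    aggfunc={"car_r_d": "sum", "pt_r_t": "mean"},
)

print(type(combined))
print(combined[["car_r_d", "pt_r_t"]])
print(combined.area)
```

Only the columns named in the dict are aggregated, so any column left out of it does not appear in the result.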

Geometry Aggregation:

With dissolve(), the geometries of the rows in each group are merged (unioned) into a single geometry.
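A small sketch of what the union produces, using hypothetical cells: adjacent cells become one Polygon, while cells that do not touch become a MultiPolygon.

```python
import geopandas
from shapely.geometry import box

# Hypothetical cells: group 10 has two adjacent squares,
# group 15 has two squares that do not touch
cells = geopandas.GeoDataFrame(
    {"car_r_t": [10, 10, 15, 15]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(0, 5, 1, 6), box(3, 5, 4, 6)],
)

merged = cells.dissolve(by="car_r_t")

# Geometry type of each merged group
print(merged.geom_type)
# The total area of each group is preserved by the union
print(merged.area)
```

This is why the 15 minute selection in the real data can be a single row even though its cells are scattered around the map.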
