Automating GIS Processes 2024 code-along (shakyo): Exercise 2 (Problem 3, Lesson 2 review)

Posted at 2025-05-31

During this task, the aim is to calculate the (air-line) distance in meters that each social media user in the data set prepared in Problem 2 has travelled in-between the posts. We’re interested in the Euclidean distance between subsequent points generated by the same user.

For this, we will need to use the userid column of the data set kruger_points.shp that we created in Problem 2.

Answer the following questions:

What was the shortest distance a user travelled between all their posts (in meters)?

What was the mean distance travelled per user (in meters)?

What was the maximum distance a user travelled (in meters)?

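In this task we calculate the air-line (straight-line) distance, in meters.

Using the data set from Problem 2, we can work out how far each user moved between posts, that is, the per-user Euclidean distance (the straight-line distance between two points in space).
The questions this time are:

What is the shortest distance a user travelled (m)?
What is the mean distance travelled per user (m)?
What is the longest distance a user travelled (m)?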
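
To make "Euclidean distance between subsequent points" concrete, here is a minimal sketch in plain Python. The function name path_length and the sample coordinates are made up for illustration; they are not part of the exercise.

```python
import math


def path_length(points):
    """Sum the straight-line distances between consecutive (x, y) points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


# Hypothetical posts of one user, already in a metric CRS and sorted by time
posts = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
print(path_length(posts))  # 5 m + 6 m = 11.0
```

A user with a single post yields an empty sum, so the function returns 0 for them; the exercise instead skips such users entirely.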

a) Read the input file and re-project it

Read the input file kruger_points.shp into a geo-data frame kruger_points
Transform the data from WGS84 to an EPSG:32735 projection (UTM Zone 35S, suitable for South Africa). This CRS has metres as units.

import geopandas as gpd
import pathlib
DATA_DIRECTORY = pathlib.Path().resolve() / "data"

kruger_points=gpd.read_file(DATA_DIRECTORY / "kruger_points.shp")

# Check the data
kruger_points.head()

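Read kruger_points.shp into a GeoDataFrame and transform its CRS from WGS84 to EPSG:32735 (UTM Zone 35S), which is suitable for South Africa. Distances in EPSG:32735 (as in most projected coordinate systems) are measured in meters.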

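# Re-project: to_crs() transforms the coordinates.
# (Assigning kruger_points.crs = "EPSG:32735" would only relabel them.)
kruger_points = kruger_points.to_crs("EPSG:32735")

# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION

# Check that the crs is correct after re-projecting (should be epsg:32735)
import pyproj
assert kruger_points.crs == pyproj.CRS("EPSG:32735")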
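
Assigning a CRS and transforming to a CRS are different operations, and only the second one moves the coordinates. The sketch below shows the difference on a single made-up point (27°E, 25°S, chosen because 27°E is the central meridian of UTM zone 35); the variable names are illustrative and not part of the exercise.

```python
import geopandas as gpd
from shapely.geometry import Point

# One hypothetical point in WGS84 (longitude, latitude)
wgs84 = gpd.GeoDataFrame(geometry=[Point(27.0, -25.0)], crs="EPSG:4326")

# set_crs() only changes the label; the numbers stay in degrees
relabelled = wgs84.set_crs("EPSG:32735", allow_override=True)

# to_crs() transforms the coordinates into UTM 35S meters
reprojected = wgs84.to_crs("EPSG:32735")

print(relabelled.geometry.iloc[0])   # still x=27, y=-25
print(reprojected.geometry.iloc[0])  # easting close to 500000 m
```

Both frames pass a check like `crs == "EPSG:32735"`, so a CRS assertion alone cannot tell them apart; looking at the coordinate values can.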

South Africa is covered by four UTM zones to keep distortion to a minimum: UTM 33S, UTM 34S, UTM 35S and UTM 36S. The S after the zone number means that the zone lies south of the equator.
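
As a rough illustration of how zone numbers relate to EPSG codes: standard UTM zones are 6° wide, and the WGS84 UTM codes are 32600 + zone in the northern hemisphere and 32700 + zone in the southern. The helper utm_epsg below is a hypothetical name, and it ignores the special-case zones around Norway and Svalbard.

```python
import math


def utm_epsg(lon, lat):
    """Return the WGS84 / UTM EPSG code for a longitude/latitude in degrees."""
    # Zones are numbered 1..60 eastwards, starting at 180°W
    zone = int(math.floor((lon + 180) / 6)) % 60 + 1
    base = 32700 if lat < 0 else 32600
    return base + zone


print(utm_epsg(27.0, -25.0))  # 32735 (UTM 35S)
print(utm_epsg(17.0, -30.0))  # 32733 (UTM 33S)
```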

b) Group the data by user id

Group the data by userid and store the grouped data in a variable grouped_by_users

Group the data by user ID and store it in the variable grouped_by_users.
Then use assert to confirm that the number of groups matches the number of unique user IDs returned by the nunique() method.

grouped_by_users = kruger_points.groupby("userid")

# Check the number of groups:
assert len(grouped_by_users.groups) == kruger_points["userid"].nunique(), "Number of groups should match number of unique users!"

c) Create shapely.geometry.LineString objects for each user connecting the points from oldest to most recent

For each user, create a LineString object from that user's Points, sorted from the oldest record to the newest.

There are multiple ways to solve this problem (see the hints for this exercise). You can use, for instance, a dictionary or an empty GeoDataFrame to collect data that is generated using the steps below:

Use a for-loop to iterate over the grouped object. For each user’s data:

sort the rows by timestamp

create a shapely.geometry.LineString based on the user’s points

CAREFUL: Remember that every LineString needs at least two points. Skip users who have fewer than two posts.

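There are several ways to do this (see the hints).

For example, you can collect the results in a dictionary or an empty GeoDataFrame. Loop over the groups; inside the loop, sort by timestamp and build a LineString object from the Points.

One thing to watch out for: a LineString needs at least two Points. For a user with only one Point (a single post, so no movement to measure), LineString creation has to be skipped.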

Store the results in a geopandas.GeoDataFrame called movements, and remember to assign a CRS.

Now let's look at the result stored in the GeoDataFrame variable movements. Don't forget to set the CRS.

from shapely.geometry import LineString

movements_list = []

for userid, group in grouped_by_users:
    # Order each user's points from oldest to newest
    points = group.sort_values(by="timestamp").geometry
    # Skip users with a single Point: a LineString needs at least two
    if len(points) > 1:
        movements_list.append(LineString(points.tolist()))

# Build the GeoDataFrame and set its CRS in one step
movements = gpd.GeoDataFrame(geometry=movements_list, crs="EPSG:32735")

# Check the result
print(type(movements))
print(movements.crs)

movements
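
One limitation of collecting bare LineStrings in a list is that the user ID is lost. A possible variant, sketched here on made-up data, collects one record per user instead. The sample posts and the names posts, records and movements_demo are assumptions for illustration, not the exercise's data.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import LineString, Point

# Hypothetical posts: user 1 is deliberately out of time order, user 2 has one post
posts = gpd.GeoDataFrame(
    {
        "userid": [1, 1, 1, 2, 3, 3],
        "timestamp": pd.to_datetime(
            [
                "2024-01-01 10:00",
                "2024-01-01 12:00",
                "2024-01-01 11:00",
                "2024-01-01 10:00",
                "2024-01-01 10:00",
                "2024-01-01 11:00",
            ]
        ),
    },
    geometry=[Point(0, 0), Point(3, 8), Point(3, 4), Point(5, 5), Point(0, 0), Point(6, 8)],
    crs="EPSG:32735",
)

records = []
for userid, group in posts.groupby("userid"):
    ordered = group.sort_values("timestamp")
    if len(ordered) < 2:
        continue  # a LineString needs at least two points
    line = LineString(ordered.geometry.tolist())
    records.append({"userid": userid, "geometry": line})

movements_demo = gpd.GeoDataFrame(records, geometry="geometry", crs=posts.crs)
print(movements_demo)
```

Keeping userid as a column makes it possible to trace a suspicious distance back to the user who produced it.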

d) Calculate the distance between all posts of a user

Check once more that the CRS of the data frame is correct

Compute the lengths of the lines, and store it in a new column called distance

Check that the CRS is correct, then compute the length of each LineString object and store it in the distance column.

# GeoSeries.length is vectorised and returns lengths in the units of the coordinates
movements["distance"] = movements.geometry.length

# Check the result
movements.head()
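
Note that a length is always expressed in the units of the coordinates, whatever the CRS label says. A small sketch with a made-up line (one degree of longitude at 25°S) shows how different the numbers are before and after re-projection; the variable names are illustrative.

```python
import geopandas as gpd
from shapely.geometry import LineString

# Hypothetical line spanning one degree of longitude at 25°S
line_wgs84 = gpd.GeoSeries(
    [LineString([(27.0, -25.0), (28.0, -25.0)])], crs="EPSG:4326"
)

# Length of the raw geometry: the coordinates are degrees, so this is 1.0
length_degrees = line_wgs84.iloc[0].length

# Length after transforming to UTM 35S: meters, roughly 100 km
length_meters = line_wgs84.to_crs("EPSG:32735").length.iloc[0]

print(length_degrees, length_meters)
```

If a distance column holds values of a few units where kilometers are expected, it is worth checking whether the coordinates were ever transformed.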

e) Answer the original questions

You should now be able to quickly find answers to the following questions:

What was the shortest distance a user travelled between all their posts (in meters)? (store the value in a variable shortest_distance)

What was the mean distance travelled per user (in meters)? (store the value in a variable mean_distance)

What was the maximum distance a user travelled (in meters)? (store the value in a variable longest_distance)

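At this point the questions are easy to answer.

What is the shortest distance a user travelled (m)? → store in the variable shortest_distance
What is the mean distance travelled per user (m)? → store in the variable mean_distance
What is the longest distance a user travelled (m)? → store in the variable longest_distance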

shortest_distance = movements["distance"].min()
mean_distance = movements["distance"].mean()
longest_distance = movements["distance"].max()

print(f"shortest_distance: {shortest_distance}")
print(f"mean_distance: {mean_distance}")
print(f"longest_distance: {longest_distance}")

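The shortest distance came out as 0 m (no movement),
the mean as about 1 m,
and the longest as about 64 m.

Is that really right? Values this small are suspicious: they would be consistent with coordinates that are still in degrees, which can happen when a CRS is assigned to the data rather than applied with to_crs(). Treat these numbers as unverified.

First, let's look at the summary statistics with the describe() method.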

movements.describe()

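The 50% row is the median, and it is 0.14 m. In other words, most users hardly moved at all.
That is easy enough to picture, but let's plot a histogram. The maximum is about 64 m, so I roughly set the number of bins to 65, which makes each bin about 1 m wide.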

movements.hist(column="distance", bins=65)

Yes, most people did not move.
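
To illustrate how the 50% row and the bin count relate to the data, here is a small pandas sketch. The values in distances are invented for the example and are not the exercise's results.

```python
import pandas as pd

# Made-up distances, heavily skewed like the real column
distances = pd.Series([0.0, 0.1, 0.2, 0.5, 64.0], name="distance")

summary = distances.describe()
print(summary["50%"], distances.median())  # the 50% row is the median

# bins is the NUMBER of bars; their width follows from the data range
bins = 65
bin_width = (distances.max() - distances.min()) / bins
print(bin_width)  # a little under 1 unit per bin
```

With a distribution this skewed, the mean is pulled far above the median by a few large values, so the median is the better summary of a typical user.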

f) Save the movements in a file

Save the movements into a new Shapefile called movements.shp inside the data directory.

Save the variable movements as the file movements.shp.

movements.to_file(DATA_DIRECTORY / "movements.shp")

Check with a shell command that the file was written.

That's it.

Lesson 2 review

Learning goals
After this week, you should be able to:

read and write spatial data from and to common file formats,

filter and re-group data by spatial and non-spatial characteristics, and

manage and transform a data set’s coordinate reference system.

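Read and write spatial data in common file formats → done
Filter and re-group data → done
Transform a data set's coordinate reference system → done

Next up is Lesson 3.

I got through Lesson 1 and Lesson 2 in the month of May.
The material keeps getting harder, though, so Lesson 3 may take me all of June.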
