
Nearest-neighbor point search with ArangoDB

Last updated at 2021-06-02, posted at 2016-01-20

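Hello.
I imported a set of point data (about 5.3 million points) into ArangoDB 2.8.7 and tried nearest-neighbor search (k-NN, k=10). Searching for the 10 nearest points around randomly chosen points was extremely fast: the average execution time was about 0.1 ms per search (100,000 searches took about 10 s in total).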

The process was as follows. First, I downloaded MAPZEN's administrative boundary data:

$ wget https://s3.amazonaws.com/osm-polygons.mapzen.com/japan_geojson.tgz

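From this archive, I used jq to extract every vertex (about 5.3 million) of admin_level_7.geojson (about 160 MB of MultiPolygon data), imported them into ArangoDB (using about 1 GB of memory), added a Geo Index, and ran the searches. As shown below, these steps are simple¹, although the preprocessing took some time before searching was possible.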

$ gzcat admin_level_7.geojson.gz | jq -c '.features[].geometry.coordinates| flatten(2) | {"coordinates":.[]}' | gzip > admin_level_7_line_by_line.geojson.gz 
  real  1m11.029s
$ gzcat admin_level_7_line_by_line.geojson.gz | wc -l
  5313284
$ gzcat admin_level_7_line_by_line.geojson.gz | head -n 3
{"coordinates":[135.7183011,34.9437886]}
{"coordinates":[135.7181462,34.9437805]}
{"coordinates":[135.718176,34.9444206]}
$ gzcat admin_level_7_line_by_line.geojson.gz | arangoimp --file - --collection=admin_level_7 --create-collection=true --type=json
  real  0m57.144s
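To make the jq filter above concrete, here is a rough JavaScript equivalent (runs under Node.js, no ArangoDB needed). It assumes, as the filter does, that every feature's geometry is a MultiPolygon whose coordinates nest as polygon, then ring, then [longitude, latitude] position, so flattening two levels (jq's `flatten(2)`, JavaScript's `Array.prototype.flat(2)`) leaves a flat list of positions. The function name and the one-feature sample are hypothetical, for illustration only.

```javascript
// Rough equivalent of:
//   jq -c '.features[].geometry.coordinates | flatten(2) | {"coordinates": .[]}'
// Assumption: every geometry is a GeoJSON MultiPolygon,
// i.e. coordinates[polygon][ring][position] with position = [lon, lat].

// Hypothetical helper: turn a FeatureCollection into one JSON line per vertex.
function featureCollectionToVertexLines(featureCollection) {
  const lines = [];
  for (const feature of featureCollection.features) {
    // Flatten polygons -> rings -> positions by two levels,
    // leaving an array of [lon, lat] positions (like jq's flatten(2)).
    const positions = feature.geometry.coordinates.flat(2);
    for (const position of positions) {
      // One compact JSON document per line, ready for arangoimp --type=json.
      lines.push(JSON.stringify({ coordinates: position }));
    }
  }
  return lines;
}

// Hypothetical sample: one MultiPolygon with one polygon and one closed ring.
const sample = {
  type: "FeatureCollection",
  features: [
    {
      type: "Feature",
      properties: {},
      geometry: {
        type: "MultiPolygon",
        coordinates: [[[
          [135.0, 34.0], [135.1, 34.0], [135.1, 34.1], [135.0, 34.0],
        ]]],
      },
    },
  ],
};

const lines = featureCollectionToVertexLines(sample);
console.log(lines.join("\n"));
```

One consequence worth keeping in mind: GeoJSON rings are closed (the last position repeats the first), and boundaries shared by neighboring areas are most likely stored once per area, so the roughly 5.3 million lines are not all distinct points.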

Running the search:

$ arangosh
arangosh [_system]> db.admin_level_7.count()
  5313284
arangosh [_system]> db.admin_level_7.ensureIndex({type: "geo", fields:["coordinates"], geoJson: true})
  (took about 20 s)
arangosh [_system]> db._query(`FOR x IN admin_level_7 SORT RAND() LIMIT 100000 LET coord = x.coordinates RETURN (FOR d IN NEAR(admin_level_7, coord[1], coord[0], 10, "distance") RETURN d.distance)`).toArray().length
  100000
  (took about 10 s, i.e. about 0.1 ms per search on average)
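To sanity-check what the NEAR query returns, a brute-force k-NN over a handful of points can help. This is a minimal sketch, not ArangoDB's algorithm: it assumes a spherical Earth (mean radius 6,371 km) and haversine distances in meters, and the helper names are hypothetical. Note the coordinate order: the documents store [longitude, latitude] (GeoJSON order), which is why the query passes coord[1] (latitude) and then coord[0] (longitude) to NEAR. The first three sample points are the vertices shown by `head -n 3` above.

```javascript
// Brute-force k-NN over [lon, lat] points, ranked by great-circle distance.
// Assumptions (not taken from ArangoDB): spherical Earth, mean radius 6,371,000 m.
const EARTH_RADIUS_M = 6371000;

function toRadians(deg) {
  return (deg * Math.PI) / 180;
}

// Haversine distance in meters between two [lon, lat] positions.
function haversineMeters([lon1, lat1], [lon2, lat2]) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

// Hypothetical helper: the k points nearest to `center`, nearest first.
// Sorts all n points, so O(n log n) per query; fine only for spot checks.
function nearestK(points, center, k) {
  return points
    .map((coordinates) => ({ coordinates, distance: haversineMeters(center, coordinates) }))
    .sort((a, b) => a.distance - b.distance)
    .slice(0, k);
}

const points = [
  [135.7183011, 34.9437886],
  [135.7181462, 34.9437805],
  [135.718176, 34.9444206],
  [139.6917, 35.6895], // hypothetical far-away point (roughly Tokyo)
];

// As in the AQL query, the search center is itself one of the stored points.
const result = nearestK(points, points[0], 3);
for (const { coordinates, distance } of result) {
  console.log(JSON.stringify(coordinates), distance.toFixed(1));
}
```

Because each search center is itself a vertex in the collection, the first hit is the center at distance 0, so k=10 effectively returns the center plus its nine nearest neighbors (fewer distinct ones if the vertex is duplicated, e.g. a ring's closing vertex).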

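¹ Comparing the PDF 「何が違うのか?PostGISと最新版MySQLのGIS機能を徹底比較 (pdf)」 ("What's the difference? A thorough comparison of the GIS features of PostGIS and the latest MySQL") with this article (which uses ArangoDB), it seems that both PostGIS and MySQL require some specialized knowledge for the preprocessing needed before you can search.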
