こちらの記事で説明されているサンプルノートブックのウォークスルーです。
ノートブックの翻訳はこちらにあります。
クラスターの準備
こちらで説明されている様に、Databricksランタイム11.2以降かつPhotonのアクセラレーションを有効にします。Photonは機械学習ランタイムをサポートしてないため、機械学習ランタイムは使用できません。
データエンジニアリング
こちらのノートブックを使用してH3のインデクシングがされたデータを準備します。
データベースを作成します。データベース名は適宜変更してください。
%sql CREATE DATABASE IF NOT EXISTS h3_prod_api_demo;
USE h3_prod_api_demo;
トリップデータのセットアップ
タクシーのデータをセットアップします。
// Load NYC Taxi trip data as `tripsTable`
val tripsTable = spark.table("delta.`/databricks-datasets/nyctaxi/tables/nyctaxi_yellow`")
SQLを含む他の言語でデータフレームを使える様に一時ビュー trips
を作成します。
このビューはこのノートブックでのみ使用します。
tripsTable.createOrReplaceTempView("trips")
解像度12のH3セルにNYCタクシー移動の乗降位置をマッピングします。
h3_longlatash3(pickup_longitude, pickup_latitude, 12) as pickup_cell
h3_longlatash3(dropoff_longitude, dropoff_latitude, 12) as dropoff_cell
%sql -- drop table if exists trips_h3;
create table if not exists trips_h3 as (
select
h3_longlatash3(pickup_longitude, pickup_latitude, 12) as pickup_cell,
h3_longlatash3(dropoff_longitude, dropoff_latitude, 12) as dropoff_cell,
passenger_count,
fare_amount
from
trips
)
情報を得るために、H3セルにエンコードした移動の数をカウントします。
このデータには16億の移動が含まれています。
%sql
select
format_number(count(*),0) as count
from
trips_h3
H3解像度15で位置が計算されるtrips_h3_15
テーブルを新たに作成しましょう。
h3_longlatash3(pickup_longitude, pickup_latitude, 15) as pickup_cell
h3_longlatash3(dropoff_longitude, dropoff_latitude, 15) as dropoff_cell
%sql -- drop table if exists trips_h3_15;
create table if not exists trips_h3_15 as (
select
h3_longlatash3(pickup_longitude, pickup_latitude, 15) as pickup_cell,
h3_longlatash3(dropoff_longitude, dropoff_latitude, 15) as dropoff_cell,
passenger_count,
fare_amount
from
trips
)
タクシーゾーンのGeoJSONのセットアップ
Databricks LabsのプロジェクトMosaicの機能を用いることで、Sparkを用いて容易にGeoJSONをロードすることができます。
%python
user_name = dbutils.notebook.entry_point.getDbutils().notebook().getContext().userName().get()
raw_path = f"dbfs:/tmp/mosaic/{user_name}"
raw_taxi_zones_path = f"{raw_path}/taxi_zones"
print(f"The raw data will be stored in {raw_path}")
タクシーゾーンGeoJSONをダウンロードします。
%python
import requests
import pathlib
taxi_zones_url = 'https://data.cityofnewyork.us/api/geospatial/d3c5-ddgc?method=export&format=GeoJSON'
# The DBFS file system is mounted under /dbfs/ directory on Databricks cluster nodes
local_taxi_zones_path = pathlib.Path(raw_taxi_zones_path.replace('dbfs:/', '/dbfs/'))
local_taxi_zones_path.mkdir(parents=True, exist_ok=True)
req = requests.get(taxi_zones_url)
with open(local_taxi_zones_path / f'nyc_taxi_zones.geojson', 'wb') as f:
f.write(req.content)
display(dbutils.fs.ls(raw_taxi_zones_path))
テーブルtaxi_zones
を作成します。
Mosaicの関数
st_geomfromgeojson
とSpark関数to_json
を使うことに注意してください
%python
# Note: Here we are using python as a proxy to programmatically
# pass the location of our data source for taxi zones.
spark.sql(f"drop table if exists taxi_zones;")
spark.sql(
f"""
create table if not exists taxi_zones
using json
options (multiline = true)
location "{raw_taxi_zones_path}";
"""
)
テーブルtaxi_zone_explode
を作成します。
Mosaicで使用できる
st_astext
とst_geomfromgeojson
を使います
%sql -- drop table if exists taxi_zone_explode;
create table if not exists taxi_zone_explode as (
select
type,
feature.properties as properties,
st_astext(st_geomfromgeojson(to_json(feature.geometry))) as geometry
from
(
select
type,
explode(features) as feature
from
taxi_zones
)
);
select
*
from
taxi_zone_explode;
テーブルtaxi_zone_explode_h3
を作成します。
他のテーブルと同様H3解像度12を使用します
%sql -- drop table if exists taxi_zone_explode_h3
create table if not exists taxi_zone_explode_h3 as (
select
cast(properties.location_id as int) as locationid,
properties.borough,
properties.zone,
explode(h3_polyfillash3(geometry, 12)) as cell
from
taxi_zone_explode
);
select
*
from
taxi_zone_explode_h3;
空港データのセットアップ
このデモンストレーションではNewark (EWR)とLaGuardia (LGA)空港にフォーカスします。
注意
ポリゴンの定義はTax Zone GeoJSONにあります。簡単にするために取り出しています。
Newark空港のデータフレームewrDF
を作成します。
val ewrDF = Seq[(Int, String, String, Int, String)]((
1, "MULTIPOLYGON (((-74.18445299999996 40.694995999999904, -74.18448899999999 40.69509499999987, -74.18449799999996 40.69518499999987, -74.18438099999997 40.69587799999989, -74.18428199999994 40.6962109999999, -74.18402099999997 40.697074999999884, -74.18391299999996 40.69750699999986, -74.18375099999997 40.69779499999988, -74.18363399999998 40.6983259999999, -74.18356199999994 40.698451999999875, -74.18354399999998 40.69855999999988, -74.18350799999996 40.69870399999992, -74.18327399999998 40.70008999999988, -74.18315699999994 40.701214999999884, -74.18316599999997 40.702384999999886, -74.18313899999998 40.7026279999999, -74.18309399999998 40.7028529999999, -74.18299499999995 40.70315899999985, -74.18284199999994 40.70346499999989, -74.18264399999998 40.70373499999988, -74.18242799999996 40.70395099999992, -74.18220299999996 40.704139999999896, -74.18203199999994 40.70425699999987, -74.18180699999994 40.7043919999999, -74.18157299999996 40.70449999999988, -74.18132099999997 40.70460799999991, -74.18080799999996 40.7047879999999, -74.179467 40.70534599999992, -74.17887299999995 40.70554399999987, -74.17831499999994 40.70572399999987, -74.17776599999996 40.70589499999988, -74.17709099999996 40.706092999999896, -74.17699199999998 40.70613799999988, -74.17689299999995 40.70619199999988, -74.17664999999994 40.70641699999988, -74.17642499999994 40.706695999999916, -74.17628999999994 40.70689399999988, -74.17608299999995 40.70710999999989, -74.17599299999995 40.70719099999991, -74.17589399999997 40.707262999999905, -74.17565999999994 40.70737999999988, -74.17538099999996 40.707469999999915, -74.17515599999996 40.707514999999894, -74.17475999999994 40.707595999999924, -74.17417499999993 40.70766799999991, -74.17388699999998 40.70773099999992, -74.17347299999994 40.707748999999865, -74.17275299999994 40.707802999999906, -74.17188899999996 40.707910999999854, -74.17163699999998 40.70795599999986, -74.17133999999999 40.707964999999895, -74.17120499999999 40.70795599999986, -74.16994499999998 40.707973999999886, -74.16888299999994 40.7079379999999, -74.16681299999993 40.70785699999989, -74.16442799999999 40.70779399999987, -74.16401399999995 40.70777599999992, -74.16233999999997 40.707721999999876, -74.16081899999995 40.70764099999991, -74.16057599999993 40.70760499999988, -74.16033299999998 40.70756899999987, -74.160063 40.7074879999999, -74.15938799999998 40.707262999999905, -74.15904599999999 40.707145999999916, -74.15891999999997 40.70710999999989, -74.15827199999995 40.70687599999993, -74.15459099999998 40.705651999999894, -74.15409599999998 40.70544499999989, -74.15401499999997 40.70538199999988, -74.15387999999996 40.705327999999895, -74.15376299999997 40.705408999999875, -74.15323199999995 40.70524699999987, -74.15317799999997 40.70531899999989, -74.15306999999996 40.7052829999999, -74.15359199999995 40.70437399999987, -74.15386199999995 40.7038429999999, -74.15513999999996 40.70155699999987, -74.15544599999998 40.70108899999988, -74.15575199999995 40.7006659999999, -74.15600399999994 40.70026099999991, -74.15635499999996 40.69975699999986, -74.15745299999998 40.69809199999988, -74.15754299999998 40.6979389999999, -74.15758799999998 40.69781299999988, -74.15762399999994 40.69767799999991, -74.15829899999994 40.696705999999885, -74.15951399999994 40.69488799999988, -74.15958599999993 40.69476199999984, -74.16014399999995 40.69410499999988, -74.16057599999993 40.693222999999875, -74.16262799999998 40.69028899999989, -74.16279899999995 40.69002799999989, -74.16290699999996 40.68987499999987, -74.16292499999997 40.689874999999866, -74.16295199999996 40.689874999999866, -74.16306899999995 40.68989299999988, -74.16309599999994 40.689928999999886, -74.16322199999996 40.68998299999989, -74.16331199999996 40.68999199999993, -74.16341099999994 40.69000099999988, -74.16352799999999 40.69000999999986, -74.16380699999996 40.69004599999989, -74.16410399999995 40.690081999999904, -74.16417599999994 40.690081999999904, -74.16422999999998 40.69005499999988, -74.16436499999998 40.69003699999991, -74.16450899999995 40.68998299999986, -74.16467099999994 40.68988399999989, -74.16479699999996 40.689757999999884, -74.16491399999995 40.689586999999904, -74.16499499999998 40.689388999999885, -74.16528299999999 40.68891199999991, -74.16542699999997 40.6887589999999, -74.16548099999994 40.68863299999987, -74.16560699999997 40.68842599999988, -74.16576899999995 40.68802999999986, -74.16587699999997 40.68787699999991, -74.16583199999997 40.68757999999987, -74.16582299999999 40.68748999999987, -74.16580499999998 40.687156999999914, -74.16582299999999 40.68703999999986, -74.16589499999998 40.6868419999999, -74.16604799999999 40.68655399999988, -74.16639899999996 40.686022999999864, -74.16650699999997 40.68588799999986, -74.16674099999994 40.685491999999925, -74.16695699999997 40.68523099999988, -74.16738899999996 40.684546999999924, -74.16781199999997 40.6839439999999, -74.16791099999995 40.68379099999988, -74.16804599999995 40.68360199999991, -74.16816299999994 40.683475999999885, -74.16822599999995 40.68334999999991, -74.16848699999997 40.68299899999991, -74.16886499999998 40.68239599999987, -74.16916199999997 40.68199999999991, -74.16929699999997 40.68178399999989, -74.16947699999997 40.68155899999991, -74.16981899999996 40.681018999999885, -74.16995399999996 40.680874999999915, -74.17005299999994 40.68066799999987, -74.17041299999994 40.6801549999999, -74.17051199999997 40.67999299999987, -74.17067399999996 40.679650999999886, -74.17093499999999 40.679290999999864, -74.17144799999994 40.67847199999989, -74.17151999999999 40.678381999999885, -74.17160999999999 40.678255999999884, -74.17193399999996 40.67782399999988, -74.17200599999995 40.67773399999988, -74.17283399999997 40.67656399999988, -74.17314899999997 40.67619499999991, -74.17322999999999 40.6760779999999, -74.17329299999994 40.67601499999989, -74.17358999999993 40.67571799999991, -74.17423799999995 40.67493499999991, -74.17437299999995 40.674817999999895, -74.17484999999994 40.67432299999992, -74.17500299999995 40.6741699999999, -74.17538999999995 40.67375599999987, -74.17604699999998 40.673044999999895, -74.17630799999995 40.67276599999986, -74.17641599999996 40.672621999999876, -74.17663199999998 40.67239699999989, -74.17678499999994 40.67218099999991, -74.17697399999997 40.6719379999999, -74.17709099999996 40.671784999999886, -74.17734299999995 40.67155999999988, -74.17754999999994 40.67142499999989, -74.17778399999997 40.671316999999874, -74.17802699999999 40.671208999999884, -74.17862999999994 40.671037999999896, -74.17888199999999 40.671001999999895, -74.17912499999994 40.67099299999991, -74.17933199999999 40.67101099999992, -74.17979099999997 40.67115499999989, -74.17997999999994 40.671208999999884, -74.18010599999997 40.671262999999904, -74.18030399999998 40.67129899999986, -74.18133899999998 40.67170399999986, -74.18213999999996 40.67202799999989, -74.18384999999995 40.672648999999886, -74.18437199999994 40.67290999999989, -74.18458799999996 40.67302699999988, -74.18492099999997 40.673269999999896, -74.18503799999996 40.67335999999989, -74.18513699999994 40.673458999999866, -74.18547899999999 40.67390899999987, -74.18594699999994 40.674664999999905, -74.18670299999997 40.67578999999992, -74.18733299999997 40.67674399999987, -74.18767499999996 40.67729299999991, -74.18795399999995 40.67761699999989, -74.18819699999995 40.67792299999992, -74.18852099999998 40.67848099999987, -74.18877299999997 40.67885899999989, -74.18905199999995 40.67933599999985, -74.18935799999997 40.67975899999988, -74.18949299999997 40.680091999999895, -74.18969999999996 40.680793999999885, -74.18977199999995 40.68113599999987, -74.189781 40.681198999999886, -74.18983499999996 40.68131599999987, -74.18991599999998 40.68154099999988, -74.18996999999996 40.6818019999999, -74.18999699999995 40.6822519999999, -74.18999699999995 40.68262999999992, -74.18996999999996 40.68295399999989, -74.18998799999997 40.68317899999989, -74.18995199999995 40.683520999999885, -74.18993399999994 40.68370999999992, -74.189871 40.684078999999876, -74.189781 40.68481699999991, -74.18976299999997 40.68503299999986, -74.18962799999997 40.686103999999915, -74.18955599999998 40.68689599999987, -74.18951999999996 40.6872019999999, -74.18947499999996 40.68748999999985, -74.18939399999994 40.68773299999988, -74.18939399999994 40.68783199999991, -74.18941199999995 40.687939999999855, -74.18940299999997 40.68809299999987, -74.18934899999994 40.68826399999989, -74.18922299999997 40.68862399999989, -74.18898899999994 40.68904699999991, -74.18870099999998 40.689442999999876, -74.18779199999994 40.690189999999866, -74.18723399999999 40.69059499999986, -74.18636999999995 40.69118899999991, -74.18591099999998 40.69144999999988, -74.18563199999994 40.69164799999987, -74.18445299999996 40.694995999999904)))", "Newark Airport", 1, "EWR")
).toDF("OBJECTID", "the_geom", "zone", "LocationID", "borough")
LaGuardia空港のデータフレームlgaDF
を作成します。
val lgaDF = Seq[(Int, String, String, Int, String)]((
132, "MULTIPOLYGON (((-73.8712485624251 40.78603815125499, -73.86978081192638 40.78533793332583, -73.86923491782913 40.78596991495815, -73.86918113468879 40.785943489784934, -73.86969325713956 40.78529616325478, -73.86987007434347 40.78508590706814, -73.86943751657091 40.78363839981371, -73.86871414175994 40.783718086052126, -73.86871058864548 40.783733500520235, -73.86870358701877 40.78374821705438, -73.86869334724 40.78376179349807, -73.868680176961 40.78377382194827, -73.86866447188046 40.783783941011144, -73.86864670385509 40.78379184665982, -73.86862740672426 40.783797301369724, -73.8686071602691 40.78380014125412, -73.86858657279326 40.78380028098891, -73.86856626284758 40.78379771637564, -73.86854684064295 40.783792524468076, -73.86852888971897 40.783784861256905, -73.86851294940965 40.78377495698295, -73.86849949863941 40.783763109219564, -73.86848920723403 40.783750088384004, -73.86848193339068 40.78373594038316, -73.86847788300818 40.783721065704746, -73.86847717073857 40.78370588540606, -73.86847981674143 40.78369082919536, -73.8684857461148 40.78367632326808, -73.86849479101389 40.78366277824263, -73.8685066954039 40.78365057753674, -73.86852112230667 40.78364006651474, -73.86853766334073 40.78363154271083, -73.86855585028033 40.783625247407215, -73.86857516830997 40.78362135880435, -73.86859507059657 40.783619986976205, -73.86861499376909 40.783621170754984, -73.86863437386539 40.783624876631684, -73.86865135955154 40.78363046770972, -73.86866699619074 40.783637996786204, -73.86868090404359 40.783647281015995, -73.8686927453537 40.783658094929415, -73.86870223255126 40.78367017590812, -73.86870913523617 40.78368323056261, -73.86871328577288 40.783696941857244, -73.86942344795426 40.78359131491769, -73.87048029898958 40.78230611231524, -73.87146292358177 40.78212640074081, -73.87255732673727 40.78082112154257, -73.87238637036785 40.780740527476986, -73.87218014326744 40.78052591567086, -73.87118588705187 40.7800015988523, -73.87019485020025 40.779546803981, -73.87015751178103 40.779497945638965, -73.86825520270827 40.77851264407573, -73.86792643466 40.778394247798424, -73.86632902853108 40.77764199906134, -73.86615198692074 40.777590680987565, -73.86576064949364 40.777358120694394, -73.86513221186647 40.776966470014244, -73.86471066497087 40.77673318625358, -73.86215889693487 40.775514553767344, -73.85921146961682 40.77408846731156, -73.85911792602863 40.77413042538886, -73.85901031879031 40.77413418740526, -73.85892842217254 40.77409513659952, -73.85890298388608 40.77400941135215, -73.85893795943063 40.77395612487197, -73.85863356071194 40.7738088345996, -73.85840105714999 40.77372133605423, -73.8579615794531 40.773489205888644, -73.85773563893147 40.77338481475659, -73.85723387590262 40.77318362981195, -73.85690044014613 40.77305877286938, -73.85674398259154 40.772979072572504, -73.85656384625518 40.7728996687605, -73.8559380688681 40.772636741565236, -73.8555107354459 40.77243896817119, -73.85505106271819 40.7721953899891, -73.85504825390281 40.772114350714084, -73.85504803073704 40.77203707882778, -73.85505451774685 40.771924657916436, -73.85506798595154 40.77187023014758, -73.85508635910341 40.77183138827739, -73.85511549005498 40.77181323378766, -73.85518065336801 40.7717835813866, -73.8556587255464 40.77115074024036, -73.85642119168853 40.77034671356568, -73.85643866852624 40.77033323611143, -73.85645807349368 40.77032086119883, -73.85650122019709 40.770300293686326, -73.85652419256841 40.77029246081006, -73.85654757393833 40.77028645581004, -73.85657099964907 40.7702823695339, -73.85661890723391 40.77027930513913, -73.8566688368006 40.77028247319014, -73.85669390392543 40.77028636785116, -73.85671858798878 40.77029173877042, -73.85674255927898 40.77029851350239, -73.85676549671736 40.77030660168678, -73.8572320319643 40.77052901036683, -73.85728951464255 40.770550973015304, -73.85731372099039 40.77055836056099, -73.85733677055224 40.77056407124349, -73.85735910137694 40.770568213827595, -73.85738091328331 40.77057080715644, -73.8573967280712 40.77056998960427, -73.85740515395037 40.770567937621934, -73.85743064536503 40.77055579456295, -73.85744762320293 40.770542920269385, -73.85745955790304 40.77052876117753, -73.8574640529471 40.770520193771915, -73.85746766311485 40.77051046314486, -73.85747194415998 40.77049130633454, -73.85747816799682 40.77044489329805, -73.85748432439799 40.7704238102495, -73.85748920023862 40.770414505027546, -73.85749528182234 40.77040655222482, -73.85750267630988 40.77039981644549, -73.85750688715996 40.7703968777218, -73.85752974542834 40.770386307969765, -73.85755023687742 40.77038078501798, -73.8575730336926 40.77037761613721, -73.8575984180025 40.770376766494465, -73.85762452189127 40.770378022362266, -73.85765412555506 40.77038128230353, -73.85776107882876 40.77039997440542, -73.8577762649952 40.770405098333605, -73.85783557421978 40.7704316193319, -73.85784854459581 40.770436606663424, -73.8578600035371 40.770440060440414, -73.85787912685437 40.770442876015444, -73.85789051085527 40.77044210670596, -73.85790102304883 40.77043921025004, -73.85790399937207 40.770437863495005, -73.85810814036756 40.770333287296765, -73.85812062818592 40.77033733686403, -73.85813724838887 40.77034723565637, -73.85814676560885 40.77035036435933, -73.85815492427808 40.77035094147852, -73.85816356056772 40.770350636994046, -73.85817215042374 40.77034947605985, -73.85818022082987 40.77034752359621, -73.8581872858477 40.77034491101544, -73.85819573008204 40.77033990568516, -73.8582051105021 40.77032345647718, -73.85820236123325 40.77030737932604, -73.85818924879476 40.770290460502615, -73.85798709647926 40.770173613711364, -73.8577362357601 40.77000314507246, -73.85745009449602 40.76980870552686, -73.85692407357327 40.76951252654719, -73.85669054762312 40.769385508210135, -73.85621267610162 40.76919579775784, -73.85614888582383 40.76907879647392, -73.85614973578456 40.76898684694477, -73.85624565746173 40.76888571419896, -73.85631296048622 40.76887844058347, -73.85647771882165 40.76890515740585, -73.8566356595733 40.76894032975357, -73.85681833769972 40.76903884617975, -73.85704790262234 40.769139555948875, -73.85710288000817 40.76909250012824, -73.85716158430333 40.768991401708725, -73.85724041942667 40.76888157464139, -73.85731254450552 40.76876360817364, -73.8576780026624 40.76827626539154, -73.85797604163109 40.76811483047016, -73.85805261946524 40.76809065637136, -73.85819704236047 40.76794331410307, -73.85832813918591 40.76782418555582, -73.85843507175626 40.767705440958686, -73.85859539017551 40.76748501377069, -73.85877376553057 40.76734999560101, -73.85892075977725 40.76727356056681, -73.85937510429778 40.767075328942674, -73.85963779907536 40.76702796771879, -73.85983028214702 40.76703180469933, -73.85998650669511 40.766957936481205, -73.86016778943647 40.766942227274065, -73.86033645988697 40.76695943339514, -73.86055456613312 40.76700031925954, -73.86069743030745 40.767011334357925, -73.86077468876901 40.76699451566885, -73.8608553594459 40.76693211808921, -73.86092884231329 40.766941950620456, -73.86098883676625 40.76701427074403, -73.86100762175549 40.76705938132751, -73.86100038221741 40.76717391624676, -73.86103345678848 40.76725679418058, -73.8611027715228 40.767271942529824, -73.86118998431375 40.76720716686883, -73.86121194395173 40.7670310899356, -73.86131386166113 40.766940631743715, -73.86143250938784 40.76691478503675, -73.86169177207913 40.76691704203793, -73.86181545316371 40.76693736156741, -73.86202142385251 40.766913133555526, -73.86205271633084 40.76691261216039, -73.86212976522565 40.76691133536475, -73.86222539270146 40.766894601510174, -73.86220059780494 40.7668398937497, -73.86226565137707 40.76680316777269, -73.86243485599871 40.76688891752954, -73.8626022140311 40.7668939798399, -73.8626886417663 40.76687591050486, -73.86275490196954 40.76676452492579, -73.8627202147973 40.7666706422922, -73.86224047985023 40.766283086505766, -73.86215785661702 40.76619175792949, -73.86216751063289 40.76604799421443, -73.86201493096908 40.76589092640018, -73.86176874486004 40.765594004521994, -73.86153844239246 40.76534253221643, -73.86123668863942 40.764960616103124, -73.86043337809801 40.76386888243775, -73.86057039699999 40.76383121299988, -73.86093691499987 40.76372861499989, -73.8610972439999 40.763664476999935, -73.86180888799989 40.764544706999835, -73.86286118599996 40.76584623599989, -73.86321805300003 40.76625081599989, -73.86340214900001 40.76645551199989, -73.86353063400003 40.76657622399993, -73.86404975699985 40.76706394699995, -73.86458486999996 40.76756668999991, -73.86509294099989 40.76801531499993, -73.8655058869999 40.76834576799996, -73.86606875999985 40.768796181999946, -73.86728193799998 40.769650002999896, -73.868098735 40.77017200899994, -73.8688938269999 40.77064902199992, -73.869437198 40.77092448199991, -73.86962970499988 40.77100636599993, -73.87016128299994 40.77120898499992, -73.87071447499991 40.771390689999926, -73.87143357299982 40.77157646699987, -73.87173533999989 40.771622584999896, -73.87223577699984 40.77169565899994, -73.87272561699997 40.7717366429999, -73.8732794149999 40.77176740099993, -73.87380524599989 40.77176797799989, -73.87426330499994 40.7717533869999, -73.87475943399997 40.771719986999926, -73.87517573299998 40.77167671099994, -73.87601656500001 40.77155104399987, -73.87608949699984 40.77153690299987, -73.87615314099988 40.77152345499992, -73.87665135299991 40.771418945999926, -73.87704537199986 40.77131652899987, -73.87743339799992 40.771200254999926, -73.87781416299993 40.77107045599997, -73.87818646999993 40.77092753499992, -73.87854915699987 40.77077199599996, -73.87897365099998 40.77055642199985, -73.87961158799999 40.77023244799993, -73.88020161599997 40.769904189999856, -73.88056341499993 40.76970115299986, -73.88068583599984 40.76963245199993, -73.88159926399999 40.7691258519999, -73.88277241799986 40.76852944099994, -73.8835659359999 40.7681804449999, -73.88426804199986 40.767894764999916, -73.88439729399991 40.767842169999874, -73.88506266099994 40.767598942999946, -73.88564823400002 40.767407690999946, -73.88602554199989 40.76728295799989, -73.88667620399991 40.76710325899991, -73.88671945999997 40.767091312999874, -73.88713429399999 40.76698296599988, -73.88762452899991 40.76686519999991, -73.88770661799985 40.766838597999936, -73.88772988199987 40.76691443399996, -73.88781814599999 40.767202077999876, -73.88817759199992 40.7684282569999, -73.88819008199995 40.768509872999935, -73.88822109399986 40.768592289999944, -73.8882713109998 40.76867187099987, -73.88833975599985 40.7687448059999, -73.88842375699991 40.76880764399987, -73.8885192579999 40.76885784099994, -73.88862144999993 40.7688940489999, -73.8887255129998 40.76891623899995, -73.88873647199979 40.76898593099992, -73.889040003 40.77091623799992, -73.88942374399998 40.77304592899988, -73.88945486901865 40.77353295106357, -73.88779843805143 40.773711297424654, -73.88767940155543 40.77381879172325, -73.88580762286652 40.774022224475374, -73.88581012437984 40.77403601766625, -73.88581792367333 40.774078998740464, -73.88582793733663 40.77407794805797, -73.88583361561382 40.77410924043991, -73.88579151867035 40.774113655072405, -73.88579034397598 40.774107187250394, -73.88568182351487 40.77411856804315, -73.88568292636904 40.77412464546015, -73.88566947657924 40.774126056083816, -73.88569498085096 40.774266610445544, -73.88570207086045 40.77426586763382, -73.88571221642924 40.77432177112161, -73.88567398900969 40.774325779948875, -73.88567349258264 40.77432305063104, -73.88565054964837 40.774325457489574, -73.88567007665036 40.774433071216826, -73.88562650739728 40.77443764029466, -73.88560236103582 40.774304563939076, -73.88564627381488 40.774299959528406, -73.88564767572932 40.7743076912231, -73.88566773133905 40.77430558840732, -73.88566130980054 40.77427020514186, -73.88566791315597 40.77426951277721, -73.88564276584745 40.77413093288607, -73.8856385336236 40.77413137765436, -73.8856323103537 40.77409707478859, -73.8856758683633 40.77409250687015, -73.88567708364381 40.77409920206598, -73.88578804109311 40.774087565703645, -73.88578683431469 40.77408091775973, -73.88579477727255 40.774080084653825, -73.88578718974873 40.77403826926387, -73.88578473697669 40.77402471216453, -73.88431556689014 40.77418436398409, -73.88408262463024 40.7741044801433, -73.88391627963797 40.77413260659154, -73.8839316828436 40.77418464435257, -73.88410836235168 40.77416708339821, -73.88455150532151 40.77436062233214, -73.8846598266042 40.77448010121009, -73.88479222517094 40.774807063622696, -73.88483742391051 40.77495298074568, -73.88485536080363 40.77510221207482, -73.88483519529827 40.77522084211525, -73.8847736190733 40.77533228058653, -73.8846752373344 40.775396766475154, -73.88460175355154 40.77542051069447, -73.88454882637298 40.77542804960337, -73.88445519420895 40.775464803850525, -73.88441988513166 40.77545646620444, -73.88428737712128 40.77547205616595, -73.8842883055274 40.7754830970367, -73.88433217018762 40.77548297087071, -73.88436257156576 40.77549337373265, -73.88440957187971 40.775549360877285, -73.88475744404441 40.77725413868524, -73.88482322075201 40.77751239820959, -73.88493197897337 40.777905022104456, -73.88499070411882 40.77814521789072, -73.88503248656444 40.77829983172282, -73.88505522658441 40.77840531274414, -73.88511399222635 40.77847552903457, -73.8852000839678 40.778513571263524, -73.88523914391557 40.778559221175875, -73.88525026986592 40.778612901742385, -73.88521875942243 40.77866136280502, -73.88508345876635 40.77867131316193, -73.88505291740942 40.77877543567209, -73.88512416435393 40.779061209283, -73.88508410716267 40.77935257161612, -73.88497842475354 40.77974438290775, -73.88493466381183 40.77993691280691, -73.88488113731053 40.779987196741224, -73.8847822502109 40.780038480497915, -73.88445032609359 40.78015460132617, -73.88385203219555 40.780264319754494, -73.88325271441282 40.78026542290212, -73.88324038254821 40.78032572327379, -73.88290491186557 40.780348697239766, -73.88287851751367 40.78023632010934, -73.88281116030053 40.78021893669761, -73.88274449717143 40.780223424072155, -73.8826403051081 40.7802858579589, -73.88231777091364 40.780360549298436, -73.88215899724347 40.780367867563115, -73.88211863000274 40.78039586317721, -73.88208293364791 40.780450836167304, -73.88202629453421 40.78048504352782, -73.88187200517194 40.780498454869836, -73.88168407683699 40.780500062638886, -73.88155872181727 40.78054662573665, -73.88139075985553 40.78060160840806, -73.8812191473699 40.78061061723774, -73.88096087912395 40.78058557723644, -73.88087003883854 40.78057306017537, -73.88075025848272 40.78057432624618, -73.88038817410674 40.780619416402864, -73.88018929439086 40.78063496792158, -73.8800224049746 40.78064471983977, -73.87990587782392 40.780639145170944, -73.87973466843 40.78065481804985, -73.87961343937161 40.780694319457815, -73.87958646834618 40.780698110860044, -73.87959071037689 40.78077914871647, -73.87960408126767 40.780779162686564, -73.87960692432854 40.78085031306525, -73.8794746864389 40.78085695107505, -73.87947341303463 40.780702688914175, -73.87944805023491 40.7807004765119, -73.8793580603621 40.78065668012734, -73.87930409174346 40.7807594078809, -73.87920483851617 40.78076738933479, -73.87913370828485 40.78067822938222, -73.87892488684261 40.78067355215143, -73.87870562518874 40.78058590995002, -73.87905115692989 40.781730533363366, -73.87930868295156 40.781853658332345, -73.878858374856 40.78239817868118, -73.8788883123546 40.78241442661198, -73.87888384757818 40.7824189397462, -73.87914510421407 40.782544567795654, -73.87916445729643 40.78252313142975, -73.87918078802261 40.782529923616465, -73.87916440606021 40.78255136427313, -73.87974777853563 40.782836564207, -73.8797016367226 40.782882818072366, -73.87967788425708 40.782872629506564, -73.87970469079072 40.78283764835939, -73.87945679555362 40.78271542259297, -73.87942106533049 40.78275604079166, -73.87940622567464 40.782745861494845, -73.87943006869374 40.78270635948339, -73.87914654749781 40.78256828605519, -73.87908251177446 40.782650658660685, -73.8790602451593 40.78264047102351, -73.87912427592009 40.782560356165554, -73.87886747171292 40.782436991160054, -73.87885705699973 40.782444885571216, -73.87883146781734 40.78243071449443, -73.87833923298054 40.78302592074941, -73.8751575489337 40.781509449653285, -73.87484527331479 40.78168492407536, -73.8712485624251 40.78603815125499)), ((-73.8728719590193 40.78597502790236, -73.87282839783694 40.78595446173833, -73.8728226286446 40.78596397466698, -73.8727740594054 40.78594192145882, -73.87280920016116 40.78590310909999, -73.8728535000943 40.78592565061054, -73.8728404791595 40.78594312287908, -73.8728849699182 40.785962633107935, -73.8728719590193 40.78597502790236)), ((-73.89008921683674 40.773626435762374, -73.89034195999996 40.77323760499998, -73.89183244902246 40.77488019994431, -73.8917254778065 40.77505203213404, -73.89171751335807 40.77516363289303, -73.89168935121086 40.775254737843575, -73.89152759083345 40.775478135367386, -73.89144361875265 40.77562269470968, -73.89135801404697 40.77604940793021, -73.8913676627334 40.77634299857849, -73.89145185712488 40.776512769270866, -73.89126414584786 40.77685585537462, -73.89110422665516 40.77702320697468, -73.89112044339463 40.77712469973607, -73.89106681416045 40.77721292530485, -73.89094860635713 40.77728384517263, -73.89073132114231 40.77733204986777, -73.89056250595364 40.77740392096429, -73.89049834642208 40.77746263585198, -73.89045915776707 40.77755049725344, -73.89046537395133 40.777703219287076, -73.89035193082609 40.77780032027483, -73.89017369508576 40.77785081277785, -73.89008729416794 40.77778419360695, -73.89002369095195 40.77724712640633, -73.88951644380981 40.775936714445564, -73.88947151143196 40.775738056523394, -73.8894710539975 40.77553555096718, -73.88955512480328 40.77527870961293, -73.88956149685701 40.77509223860254, -73.88958184665768 40.77503504961379, -73.88958935733879 40.77495871069513, -73.88961448773941 40.77484568714223, -73.88978900756075 40.774236861169776, -73.88984246558316 40.77410037855676, -73.88988671629585 40.77401734907423, -73.8900676015257 40.77396248425602, -73.89019171520681 40.773646876224944, -73.89008921683674 40.773626435762374)))", "LaGuardia Airport", 132, "Queens")
).toDF("OBJECTID", "the_geom", "zone", "LocationID", "borough")
このサンプルで使用する二つの空港Newark (EWR) and LaGuardia (LAG)のデータフレームnycAirportsDF
を作成します。
val nycAirportsDF = ewrDF.unionAll(lgaDF).toDF("OBJECTID", "the_geom", "zone", "LocationID", "borough")
以降の分析で再計算しなくて済む様に、テーブルairports
を作成します。
nycAirportsDF
.write
.mode("overwrite")
.saveAsTable("airports")
解像度12のH3セルで空港をポリゴンで埋めたH3セルを格納するテーブルを作成します。
explode(h3_polyfillash3(the_geom, 12)) as cell
関数h3_polyfillash3
は、入力ポリゴンを完全にカバー しない 場合があるセルを生成することに注意してください。H3のライブラリがポリゴンで埋める動作においては、H3セルの中央がポリゴンの内面にある場合には、セルがポリゴンあるいはマルチポリゴンを(部分的に)カバーするとみなすというものです。これは、Mosaicのmosaicfill
やmosaic_explode
関数の動作とは異なります。
%sql -- drop table if exists airports_h3;
create table if not exists airports_h3 as (
select
locationid,
explode(h3_polyfillash3(the_geom, 12)) as cell
from
airports
)
それぞれの空港で使用するセルの数をカウントします。
%sql
select
locationid,
count(*)
from
airports_h3
group by
locationid
order by
locationid
2つの空港間の移動の数、移動した乗客の数、発生した料金(チップを除く)を見つけ出します。
15K の移動, 27K の乗客, 料金は $1.38M
%sql
select
count(*) as num_trips,
sum(passenger_count) as num_passengers,
('$' || cast(sum(fare_amount) as string)) as earnings
from
trips_h3 t,
airports_h3 src,
airports_h3 trg
where
t.pickup_cell = src.cell
and t.dropoff_cell = trg.cell
and src.locationid <> trg.locationid
airports_h3_c
を作成するためにairport
をコンパクトにします
explode(h3_compact(h3_polyfillash3(the_geom, 12))) as cell
%sql -- drop table if exists airports_h3_c;
create table if not exists airports_h3_c as (
select
locationid,
explode(h3_compact(h3_polyfillash3(the_geom, 12))) as cell
from
airports
)
それぞれの空港で使用するセルの数をカウントします。
%sql
select
locationid,
count(*)
from
airports_h3_c
group by
locationid
order by
locationid
最後に作成したテーブルを確認します。
%sql show tables
データ分析
バインディングをインポートします。
# Import Databricks functions to get H3
from pyspark.databricks.sql.functions import *
%sql -- DATABASE FROM NB-01
use h3_prod_api_demo;
show tables;
空港の参照
サニティチェックとして2つの空港(LaGuardiaとNewark)のジオメトリクスをレンダリングします。
as_kepler("airports", "the_geom", "geometry")
airports_h3
をレンダリングします。
このテーブルには3万3千のH3インデックスが存在しており、結果を参照するためにリミットを若干高めに設定します。
as_kepler("airports_h3", "cell", "h3", limit_cnt=100000)
airports_h3_c
をレンダリングします。
異なる解像度が使われているので、このコンパクト化されたテーブルには2K程度のH3インデックスのみが含まれています。
注意
ミックスされた解像度でのレンダリングの説明に役立つ h3 indexing compaction も参照ください。
as_kepler("airports_h3_c", "cell", "h3", limit_cnt=5000)
trips_h3と
trips_h3_15` にはそれぞれ16億行が含まれています。
レンダリングする際には何かしらの方法でフィルタリングしたいと考えており、スタート地点として2つの空港の移動のみを表示します!
空港の最小の解像度を見つけます。
min(h3_resolution(cell))
= 8 の結果をh3_toparent
呼び出しで使用します。
%sql
select
min(h3_resolution(cell))
from
airports_h3_c
代わりにコンパクト化されたセルを使用し、フィルターとして親IDとh3_ischildof
エクスプレッションを使用します。
h3_toparent(t.pickup_cell, 8) = h3_toparent(src.cell, 8)
and h3_toparent(t.dropoff_cell, 8) = h3_toparent(trg.cell, 8)
and h3_ischildof(t.pickup_cell, src.cell)
and h3_ischildof(t.dropoff_cell, trg.cell)
%sql create
or replace view airport_trips_h3_c as (
select
t.passenger_count,
t.fare_amount,
src.locationid as src_locationid,
trg.locationid as trg_locationid,
src.cell as src_cell,
trg.cell as trg_cell
from
trips_h3_15 t,
airports_h3_c src,
airports_h3_c trg
where
h3_toparent(t.pickup_cell, 8) = h3_toparent(src.cell, 8)
and h3_toparent(t.dropoff_cell, 8) = h3_toparent(trg.cell, 8)
and h3_ischildof(t.pickup_cell, src.cell)
and h3_ischildof(t.dropoff_cell, trg.cell)
and src.locationid <> trg.locationid
);
-- HOW MANY TRIPS?
select
format_number(count(*), 0) as count
from
airport_trips_h3_c
移動データに関する一連の質問に答えるために、H3のセルIDを用いてairportとtripsテーブルを容易に組み合わせることができます。例えば、料金による収益はどれだけだったのでしょうか?
%sql
select
format_number(count(*),0) as num_trips,
format_number(sum(passenger_count),0) as num_passengers,
(
'$' || cast(format_number(sum(fare_amount), 2) as string)
) as earnings
from
airport_trips_h3_c
%sql create
or replace view src_airport_trips_h3_c as (
select
src_cell as cell,
src_locationid as locationid,
format_number(count(*), 0) as trip_cnt_disp,
format_number(sum(passenger_count), 0) as passenger_sum_disp,
(
'$' || cast(format_number(sum(fare_amount), 2) as string)
) as fare_sum_disp,
count(*) as trip_cnt,
sum(passenger_count) as passenger_sum,
sum(fare_amount) as fare_sum
from
airport_trips_h3_c
group by
cell,
src_locationid
);
LaGuardiaとNewarkのピックアップの割合は?
%sql create
or replace temp view src_full_airport_trips_h3_c as (
select
cell,
zone,
borough,
trip_cnt_disp,
passenger_sum_disp,
fare_sum_disp,
airports.locationid,
trip_cnt,
passenger_sum,
fare_sum
from
src_airport_trips_h3_c
inner join airports on src_airport_trips_h3_c.locationid = airports.locationid
);
%sql select locationid, sum(trip_cnt) as num_trips, zone, borough from src_full_airport_trips_h3_c group by locationid, zone, borough
コンパクト化された乗車位置の活用
空港における乗車です (大部分はNewarkではなくLaGuardiaからの乗車であることがわかります)
注意
このレンダリングはカスタマイズすることもできます。例えば、以下の様に、セルごとのtrip_cnt
に色を関連づけ、高さにpassenger_sum
を関連づけた3D表示が可能です。
as_kepler("src_airport_trips_h3_c", "cell", "h3")
お客様がLaGuardiaで乗車した際、他に降車した場所はどこでしょうか?
%sql create
or replace temp view lga_dropoffs as (
select
pickup_cell,
dropoff_cell,
passenger_count,
fare_amount,
air.locationid as pickup_locationid
from
trips_h3
INNER JOIN (
select
*
from
airports_h3
where
locationid = 132
) air ON trips_h3.pickup_cell = air.cell
);
-- HOW MANY TOTAL DROP-OFFS from LGA?
select
format_number(count(*), 0) as count
from
lga_dropoffs;
降車したH3セルの位置ごとにカウントを集計しましょう。
%sql create
or replace temp view lga_agg_dropoffs as (
select
count(*) as dropoff_cnt,
dropoff_cell
from
lga_dropoffs
group by
dropoff_cell
);
select
format_number(count(*), 0) as count
from
lga_agg_dropoffs;
(LGAからの)降車ゾーンに対するビューを作成
以前作成したNYCタクシーゾーンテーブル(geojsonから作り出した)
taxi_zone_explode_h3
を使用します
%sql create
or replace temp view lga_agg_zone_dropoffs_h3 as (
select
dropoff_cnt,
dropoff_cell,
locationid,
borough,
zone
from
lga_agg_dropoffs
INNER JOIN taxi_zone_explode_h3 ON lga_agg_dropoffs.dropoff_cell = taxi_zone_explode_h3.cell
);
%sql select *
from
lga_agg_zone_dropoffs_h3;
%sql create
or replace temp view lga_agg_zone_dropoffs as (
select
sum(dropoff_cnt) as dropoff_sum,
locationid,
borough,
zone
from
lga_agg_zone_dropoffs_h3
group by
zone, borough, locationid
);
%sql select * from lga_agg_zone_dropoffs
%sql create
or replace temp view lga_agg_zone_dropoffs_geom as (
select
dropoff_sum,
locationid,
borough,
zone,
taxi_zone_explode.geometry
from lga_agg_zone_dropoffs
inner join taxi_zone_explode on lga_agg_zone_dropoffs.zone = taxi_zone_explode.properties.zone);
%sql select * from lga_agg_zone_dropoffs_geom
タクシー降車ゾーンをレンダリングします。
注意 このレンダリングもカスタマイズすることができます。例えば、以下に示す様にタクシーゾーンごとのdropoff_sum
に色を関連づけることができます。
as_kepler("lga_agg_zone_dropoffs_geom", "geometry", "geometry")
このように、SQLやScala/Pythonを介してH3を活用することで、Databricks上で様々な地理空間情報の解析が行える様になります。