【snowpark/python】snowparkにおけるJSONファイルに対するLATERAL JOINについて

Posted at 2025-06-11

JSONファイルをsnowpark上から取り出して、テーブル形式にする方法についてまとめました。
join_table_functionによるLATERAL JOIN(元テーブルに対して一行ずつ適用しながらカラムを追加する)
方法やFlattenによるJSONの中身を行に展開する流れをめもしておきます

python.py

# 動作確認
# JSONデータ
json_data = [
    {"customer": [{"name": "Joyce Ridgely", "phone": "16504378889"}]},
    {"customer": [{"name": "Bradley Green", "phone": "12127593751"}]}
]
values = [(json.dumps(row),) for row in json_data]
df = session.create_dataframe(values, schema=["src"])
df = df.with_column("src", parse_json(col("src")))
df.show()

----------------------------------
|"SRC"                           |
----------------------------------
|{                               |
|  "customer": [                 |
|    {                           |
|      "name": "Joyce Ridgely",  |
|      "phone": "16504378889"    |
|    }                           |
|  ]                             |
|}                               |
|{                               |
|  "customer": [                 |
|    {                           |
|      "name": "Bradley Green",  |
|      "phone": "12127593751"    |
|    }                           |
|  ]                             |
|}                               |
----------------------------------

下記のようにjoin_table_functionを使用してflattenを使用すると、selectすると
JSON形式のファイルを取得することができる。

python.py

flattened_df2 = df.join_table_function("flatten", col("src")["customer"]).select(col("value")["name"].alias("name"), col("value")["phone"].alias("phone"))
flattened_df2.show()

-----------------------------------
|"NAME"           |"PHONE"        |
-----------------------------------
|"Joyce Ridgely"  |"16504378889"  |
|"Bradley Green"  |"12127593751"  |
-----------------------------------

順を追ってみていきたいと思います。下記のようにjoin_table_functionを使ってLATERAL JOINを実施するとsrc列のcustomerのキーの部分を展開して取り出すことができる。

python.py

flattened_df2 = df.join_table_function("flatten", col("src")["customer"]) 
flattened_df2.show()

---------------------------------------------------------------------------------------------------------------------------------
|"SRC"                           |"SEQ"  |"KEY"  |"PATH"  |"INDEX"  |"VALUE"                     |"THIS"                        |
---------------------------------------------------------------------------------------------------------------------------------
|{                               |1      |NULL   |[0]     |0        |{                           |[                             |
|  "customer": [                 |       |       |        |         |  "name": "Joyce Ridgely",  |  {                           |
|    {                           |       |       |        |         |  "phone": "16504378889"    |    "name": "Joyce Ridgely",  |
|      "name": "Joyce Ridgely",  |       |       |        |         |}                           |    "phone": "16504378889"    |
|      "phone": "16504378889"    |       |       |        |         |                            |  }                           |
|    }                           |       |       |        |         |                            |]                             |
|  ]                             |       |       |        |         |                            |                              |
|}                               |       |       |        |         |                            |                              |
|{                               |2      |NULL   |[0]     |0        |{                           |[                             |
|  "customer": [                 |       |       |        |         |  "name": "Bradley Green",  |  {                           |
|    {                           |       |       |        |         |  "phone": "12127593751"    |    "name": "Bradley Green",  |
|      "name": "Bradley Green",  |       |       |        |         |}                           |    "phone": "12127593751"    |
|      "phone": "12127593751"    |       |       |        |         |                            |  }                           |
|    }                           |       |       |        |         |                            |]                             |
|  ]                             |       |       |        |         |                            |                              |
|}                               |       |       |        |         |                            |                              |
---------------------------------------------------------------------------------------------------------------------------------

ここでselectメソッドを使用してみると、一般的なselectメソッドのように"VALUE"列を取り出すことができる。

python.py

flattened_df2 = df.join_table_function("flatten", col("src")["customer"]).select(col("value"))
flattened_df2.show()

------------------------------
|"VALUE"                     |
------------------------------
|{                           |
|  "name": "Joyce Ridgely",  |
|  "phone": "16504378889"    |
|}                           |
|{                           |
|  "name": "Bradley Green",  |
|  "phone": "12127593751"    |
|}                           |
------------------------------

ここで、nameのキーのみを取り出したいなと思う時は、col("value")["name"]の形式で"name"のキーを指定することができる。

flattened_df2 = df.join_table_function("flatten", col("src")["customer"]).select(col("value")["name"].alias("name"))
flattened_df2.show()

-------------------
|"NAME"           |
-------------------
|"Joyce Ridgely"  |
|"Bradley Green"  |
-------------------

上記のように取り出すことができました。
ここでphone列も取り出したいなという時は、下記のように取り出すことができる。

python.py

flattened_df2 = df.join_table_function("flatten", col("src")["customer"]).select(col("value")["name"].alias("name"),col("value")["phone"].alias("phone"))
flattened_df2.show()

-----------------------------------
|"NAME"           |"PHONE"        |
-----------------------------------
|"Joyce Ridgely"  |"16504378889"  |
|"Bradley Green"  |"12127593751"  |
-----------------------------------

なお、下記のようにcol("src")までの指定する場合だと

python.py

flattened_df2 = df.join_table_function("flatten", col("src")]) 
flattened_df2.show()

------------------------------------------------------------------------------------------------------------------------------------------
|"SRC"                           |"SEQ"  |"KEY"     |"PATH"    |"INDEX"  |"VALUE"                       |"THIS"                          |
------------------------------------------------------------------------------------------------------------------------------------------
|{                               |1      |customer  |customer  |NULL     |[                             |{                               |
|  "customer": [                 |       |          |          |         |  {                           |  "customer": [                 |
|    {                           |       |          |          |         |    "name": "Joyce Ridgely",  |    {                           |
|      "name": "Joyce Ridgely",  |       |          |          |         |    "phone": "16504378889"    |      "name": "Joyce Ridgely",  |
|      "phone": "16504378889"    |       |          |          |         |  }                           |      "phone": "16504378889"    |
|    }                           |       |          |          |         |]                             |    }                           |
|  ]                             |       |          |          |         |                              |  ]                             |
|}                               |       |          |          |         |                              |}                               |
|{                               |2      |customer  |customer  |NULL     |[                             |{                               |
|  "customer": [                 |       |          |          |         |  {                           |  "customer": [                 |
|    {                           |       |          |          |         |    "name": "Bradley Green",  |    {                           |
|      "name": "Bradley Green",  |       |          |          |         |    "phone": "12127593751"    |      "name": "Bradley Green",  |
|      "phone": "12127593751"    |       |          |          |         |  }                           |      "phone": "12127593751"    |
|    }                           |       |          |          |         |]                             |    }                           |
|  ]                             |       |          |          |         |                              |  ]                             |
|}                               |       |          |          |         |                              |}                               |
------------------------------------------------------------------------------------------------------------------------------------------

下記のように[0]を指定して、リストから取り出す必要がありそう。

python.py

flattened_df2 = df.join_table_function("flatten", col("src")).select(col("value")[0]["name"])
flattened_df2.show()

--------------------------
|"""VALUE""[0]['NAME']"  |
--------------------------
|"Joyce Ridgely"         |
|"Bradley Green"         |
--------------------------

おまけ flatten

下記でも同じ結果が得られそうでした。

df = df.with_column("src", parse_json(col("src")))
flattened_df = df.flatten(col("src")["customer"])
flattened_df.show()

---------------------------------------------------------------------------------------------------------------------------------
|"SRC"                           |"SEQ"  |"KEY"  |"PATH"  |"INDEX"  |"VALUE"                     |"THIS"                        |
---------------------------------------------------------------------------------------------------------------------------------
|{                               |1      |NULL   |[0]     |0        |{                           |[                             |
|  "customer": [                 |       |       |        |         |  "name": "Joyce Ridgely",  |  {                           |
|    {                           |       |       |        |         |  "phone": "16504378889"    |    "name": "Joyce Ridgely",  |
|      "name": "Joyce Ridgely",  |       |       |        |         |}                           |    "phone": "16504378889"    |
|      "phone": "16504378889"    |       |       |        |         |                            |  }                           |
|    }                           |       |       |        |         |                            |]                             |
|  ]                             |       |       |        |         |                            |                              |
|}                               |       |       |        |         |                            |                              |
|{                               |2      |NULL   |[0]     |0        |{                           |[                             |
|  "customer": [                 |       |       |        |         |  "name": "Bradley Green",  |  {                           |
|    {                           |       |       |        |         |  "phone": "12127593751"    |    "name": "Bradley Green",  |
|      "name": "Bradley Green",  |       |       |        |         |}                           |    "phone": "12127593751"    |
|      "phone": "12127593751"    |       |       |        |         |                            |  }                           |
|    }                           |       |       |        |         |                            |]                             |
|  ]                             |       |       |        |         |                            |                              |
|}                               |       |       |        |         |                            |                              |
---------------------------------------------------------------------------------------------------------------------------------

flattened_df2 = flattened_df.select(col("value")["name"], col("value")["phone"])
flattened_df2.show()

----------------------------------------------
|"""VALUE""['NAME']"  |"""VALUE""['PHONE']"  |
----------------------------------------------
|"Joyce Ridgely"      |"16504378889"         |
|"Bradley Green"      |"12127593751"         |
----------------------------------------------

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up