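Written this way, you can force a broadcast join from Spark SQL: wrap the DataFrame in broadcast() and register the result as a temp view.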

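# Assumes tableA and tableB have already been created
# (tableA must also be visible to Spark SQL under the name "tableA")
from pyspark.sql.functions import broadcast
tableB_b = broadcast(tableB)
tableB_b.createOrReplaceTempView("tableB_b")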
joinedtable = spark.sql("""
SELECT
a.*,
b.*
FROM
tableA as a
JOIN 
tableB_b as b
ON
a.key = b.key
""")
# Show the execution plan
spark.sql("""
SELECT
a.*,
b.*
FROM
tableA as a
JOIN
tableB_b as b
ON
a.key = b.key
""").explain()