I'll also line up what each of these would look like if written in Power Query.
range
- Data type: pyspark.sql.types.LongType / LONG, BIGINT
- Column name: id
- end is exclusive
- Example: start 0, end 10, step 2
PySpark : Runtime 1.2
spark.range(0,10,2).show()
Spark SQL : Runtime 1.2
SELECT
*
FROM
range(0, 10, 2)
Power Query : range equivalent
Table.FromColumns(
{
List.Generate(
()=> 0,
each _ < 10,
each _ + 2
)
},
type table [id = Int64.Type]
)
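As a quick local check of the exclusive end: spark.range shares the half-open semantics of Python's built-in range, so the expected ids can be sketched without a Spark session (a minimal sketch; the comparison to Python's range is my own shorthand, not taken from the Spark docs):

```python
# spark.range(0, 10, 2) yields ids 0, 2, 4, 6, 8 — end 10 never appears.
# Python's built-in range has the same half-open [start, end) semantics,
# so it serves as a quick local check of the expected ids.
ids = list(range(0, 10, 2))
print(ids)  # [0, 2, 4, 6, 8]
```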
sequence
Generates an array of elements from start to stop (inclusive), incrementing by step. The type of the returned elements is the same as the type of argument expressions.
Supported types are: byte, short, integer, long, date, timestamp.
- stop is inclusive
number
PySpark : Runtime 1.2
from pyspark.sql.functions import lit, sequence, explode
(spark.range(1)
.select(
explode(
sequence(lit(1), lit(5))
)
)
).show()
Spark SQL : Runtime 1.2
SELECT
explode(
sequence(1, 5)
)
Power Query
Table.ExpandListColumn(
#table(
type table [col = {number}],
{
{
{1..5}
}
}
),
"col"
)
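Since sequence includes stop while Python's range does not, emulating sequence(1, 5) locally needs a +1 on the stop. A hedged sketch (the helper name sequence_like is my own, not a Spark API), for positive integer steps only:

```python
def sequence_like(start, stop, step=1):
    """Emulate Spark's sequence() for integers with a positive step:
    unlike range(), the stop value is inclusive."""
    return list(range(start, stop + 1, step))

print(sequence_like(1, 5))  # [1, 2, 3, 4, 5] — stop 5 is included
```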
date
PySpark : Runtime 1.2
from pyspark.sql.functions import lit, sequence, explode, to_date, expr
(spark.range(1)
.select(
explode(
sequence(
to_date(lit("2024-04-01"))
, to_date(lit("2025-03-31"))
, expr("interval 1 day")
)
).alias("column_date")
)
).show()
Spark SQL : Runtime 1.2
SELECT
explode(
sequence(
to_date("2024-04-01")
, to_date("2025-03-31")
, interval 1 day
)
) AS column_date
Power Query
Table.FromColumns(
{
List.Generate(
()=> Date.From("2024-04-01"),
each _ <= Date.From("2025-03-31"),
each _ + #duration(1, 0, 0, 0)
// each Date.AddDays(_, 1)
)
},
type table [column_date = date]
)
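The same inclusive date series can be verified locally with plain Python datetime, which is handy for checking the expected row count (a sketch, not Spark or Power Query code):

```python
from datetime import date, timedelta

start, stop = date(2024, 4, 1), date(2025, 3, 31)

days = []
d = start
while d <= stop:              # stop is inclusive, matching sequence()
    days.append(d)
    d += timedelta(days=1)    # interval 1 day

print(len(days))              # 365 rows: 2024-04-01 through 2025-03-31
print(days[0], days[-1])
```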
datetime
PySpark : Runtime 1.2
from pyspark.sql.functions import lit, sequence, explode, to_timestamp, expr
(spark.range(1)
.select(
explode(
sequence(
to_timestamp(lit("2024-04-01 00:00:00"))
, to_timestamp(lit("2024-04-01 23:59:59"))
, expr("INTERVAL '1:30' HOUR TO MINUTE")
)
).alias("column_datetime")
)
).show()
Spark SQL : Runtime 1.2
SELECT
explode(
sequence(
to_timestamp("2024-04-01 00:00:00")
, to_timestamp("2024-04-01 23:59:59")
, INTERVAL '1:30' HOUR TO MINUTE
)
) AS column_datetime
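The 1-hour-30-minute interval can likewise be sanity-checked with datetime, to see how many timestamps land at or before the 23:59:59 stop (a local sketch of the expected rows):

```python
from datetime import datetime, timedelta

start = datetime(2024, 4, 1, 0, 0, 0)
stop = datetime(2024, 4, 1, 23, 59, 59)
step = timedelta(hours=1, minutes=30)   # INTERVAL '1:30' HOUR TO MINUTE

ts = []
t = start
while t <= stop:            # stop is inclusive, matching sequence()
    ts.append(t)
    t += step

print(len(ts))              # 16 rows: 00:00, 01:30, ..., 22:30
```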
Thoughts 🙄
In TIMESTAMP columns, time zone information is discarded and values are treated as UTC.
TIMESTAMP_NTZ columns are not supported by the Lakehouse SQL analytics endpoint.
Other notes