扱うデータ型は重要だ。まずは多くつかうであろうデータ型とは?って考えた。
Dataflow Gen2 → Lakehouse
Destination type は Power BI セマンティックのモデルのデータ型と一致するようだ。
Power Query | Destination type | SQL name | pyspark.sql.types |
---|---|---|---|
type text, Text.Type | Text | STRING | StringType() |
type number, Number.Type | Decimal number | DOUBLE | DoubleType() |
type date, Date.Type | Date | DATE | DateType() |
type datetime, DateTime.Type | Date/Time | TIMESTAMP | TimestampType() |
type datetimezone, DateTimeZone.Type | - | ||
type time, Time.Type | - | ||
type duration, Duration.Type | - | ||
type logical, Logical.Type | True/False | BOOLEAN | BooleanType() |
type binary, Binary.Type | - | ||
Decimal.Type | Decimal number | ||
Int64.Type | Whole number | BIGINT, LONG | LongType() |
Int32.Type | Whole number | INT, INTEGER | IntegerType() |
Int16.Type | Whole number | SMALLINT | ShortType() |
Int8.Type | Whole number | SMALLINT | ShortType() |
Currency.Type | Currency | DECIMAL(19, 4) | DecimalType(19, 4) |
Percentage.Type | Percentage | DECIMAL(34, 6)? | DecimalType(34, 6)? |
type table, Table.Type | - | ||
type record, Record.Type | - | ||
type list, List.Type | - | ||
type any, Any.Type | - | ||
type anynonnull | - |
Power Query
let
Source = Table.FromRows(
{{
"0", 0, #date(2024, 4, 4), #datetime(2024,4,4,10,50,30),
0, 0, 0, 0, 0, 0, 0, 0
}},
{
"TextType", "NumberType", "DateType", "DateTimeType",
"LogicalType", "DecimalType", "Int64Type", "Int32Type",
"Int16Type", "Int8Type", "CurrencyType", "PercentageType"
}
),
ChangedType = Table.TransformColumnTypes(
Source,
{
{"TextType", Text.Type},
{"NumberType", Number.Type},
{"DateType", Date.Type},
{"DateTimeType", DateTime.Type},
{"LogicalType", Logical.Type},
{"DecimalType", Decimal.Type},
{"Int64Type", Int64.Type},
{"Int32Type", Int32.Type},
{"Int16Type", Int16.Type},
{"Int8Type", Int8.Type},
{"CurrencyType", Currency.Type},
{"PercentageType", Percentage.Type}
}
)
in
ChangedType
PySpark : Runtime 1.2
spark.sql("desc DataTypes").show(truncate=False)
+--------------+-------------+-------+
|col_name |data_type |comment|
+--------------+-------------+-------+
|TextType |string |null |
|NumberType |double |null |
|DateType |date |null |
|DateTimeType |timestamp |null |
|LogicalType |boolean |null |
|DecimalType |decimal(34,6)|null |
|Int64Type |bigint |null |
|Int32Type |int |null |
|Int16Type |bigint |null |
|Int8Type |smallint |null |
|CurrencyType |smallint |null |
|PercentageType|decimal(34,6)|null |
+--------------+-------------+-------+
思ったこと🙄
Dataflow Gen2 / Direct Lake mode のことを考慮すると、データ型使い分けの詳細を知ろうとするのは後回しで充分。
その他