Snowflake Open Catalog 経由で Snowflake-managed Apache Iceberg™ table にクエリする際の Cannot support vectorized reads エラーへの対応方法

Posted at 2025-03-04

概要

Snowflake Open Catalog 経由で Snowflake-managed Apache Iceberg™ table にクエリする際の Cannot support vectorized reads エラーへの対応方法を共有します。対応方法は、Spark Dataframe の読み込み時にvectorization-enabledオプションをfalseに設定することです。

java.lang.UnsupportedOperationException: Cannot support vectorized reads for column [C_CUSTKEY] optional int32 C_CUSTKEY (INTEGER(32,true)) = 1 with encoding DELTA_BINARY_PACKED. Disable vectorized reads to read this table/file

Apache Iceberg の issue にて下記のように言及されていました。

出所：Support Parquet v2 Spark vectorized read · Issue #7162 · apache/iceberg

事前準備

Snowflake のチュートリアルであるApache Iceberg™ テーブルを作成するをデータをロードしてテーブルをクエリするまで実施します。

出所：チュートリアル: Apache Iceberg™ テーブルを作成する | Snowflake Documentation

下記のドキュメントを参考に Snowflake-managed Apache Iceberg table を Snowflake Open Catalog に同期します。

出所：Snowflakeで管理されたテーブルを Snowflake Open Catalog と同期する | Snowflake Documentation

エラーの詳細

df = spark.read \
    .format("iceberg") \
    .table("ICEBERG_TUTORIAL_DB.PUBLIC.CUSTOMER_ICEBERG2")
df.show()

Py4JJavaError: An error occurred while calling o57.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (4eec7094b130 executor driver): java.lang.UnsupportedOperationException: Cannot support vectorized reads for column [C_CUSTKEY] optional int32 C_CUSTKEY (INTEGER(32,true)) = 1 with encoding DELTA_BINARY_PACKED. Disable vectorized reads to read this table/file

エラーへの対応方法

# Snowflake Open Catalog に関する情報をセット
open_catalog_account_identifier = "lampvrn-test123"
client_id = "8bq6hVXHOxV5ZKJQfb="
client_secret = "52/mD7sDxUQWjgZgxGCx37lO2x02fV="
catalog_name = "manabian_10"
principal_role_name = "spark_all"

# Azure Storage に関る情報をセット
azure_storage_account_name = "snowflakeiceberg0123456"
azure_storage_account_key = "kKRn6nruDHEhUF0AQzM1PFlXtXs9V0BJQSrn6Z8GvJOsb0JHflV9Fn=="

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('iceberg_lab') \
    .config('spark.jars.packages',
            'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,'
            'org.apache.iceberg:iceberg-azure-bundle:1.5.2,'
            'org.apache.hadoop:hadoop-azure:3.2.0') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'opencatalog') \
    .config('spark.sql.catalog.opencatalog', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.opencatalog.type', 'rest') \
    .config('spark.sql.catalog.opencatalog.header.X-Iceberg-Access-Delegation', 'vended-credentials') \
    .config('spark.sql.catalog.opencatalog.uri', f'https://{open_catalog_account_identifier}.snowflakecomputing.com/polaris/api/catalog') \
    .config('spark.sql.catalog.opencatalog.credential', f'{client_id}:{client_secret}') \
    .config('spark.sql.catalog.opencatalog.warehouse', catalog_name) \
    .config('spark.sql.catalog.opencatalog.scope', f'PRINCIPAL_ROLE:{principal_role_name}') \
    .config('spark.sql.catalog.opencatalog.enable.credential.vending', 'true') \
    .config(f'spark.hadoop.fs.azure.account.key.{azure_storage_account_name}.blob.core.windows.net', azure_storage_account_key) \
    .getOrCreate()
spark

df = spark.read \
    .format("iceberg") \
    .option("vectorization-enabled", "false") \
    .table("ICEBERG_TUTORIAL_DB.PUBLIC.CUSTOMER_ICEBERG2")
df.show()

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up