
Checking the time zone when creating a DataFrame from Oracle Autonomous Database in Databricks


Overview

This article shares the results of checking how time zones behave when creating a Spark DataFrame from Oracle Autonomous Database in Databricks.

I ran the following query to check the differences.

SELECT
    SESSIONTIMEZONE AS SESSIONTIMEZONE
    ,DBTIMEZONE AS DBTIMEZONE
    ,LOCALTIMESTAMP AS LOCALTIMESTAMP
    ,CURRENT_DATE AS CURRENT_DATE
    ,SYSTIMESTAMP AS SYSTIMESTAMP
    ,CURRENT_TIMESTAMP AS CURRENT_TIMESTAMP
    ,CURRENT_TIMESTAMP AT LOCAL AS CURRENT_TIMESTAMP_AT_LOCAL
    ,CURRENT_TIMESTAMP AT TIME ZONE 'Asia/Tokyo' AS CURRENT_TIMESTAMP_AT_TIME_ZONE
    FROM DUAL;

The following table summarizes the patterns tested and their results.

#  Pattern                                                                SESSIONTIMEZONE  DBTIMEZONE  LOCALTIMESTAMP  CURRENT_DATE
1  Default behavior                                                       UTC              UTC         UTC             UTC
2  Time zone changed via sessionInitStatement                             JST              UTC         JST             JST
3  Time zone changed via JDBC options                                     UTC              UTC         UTC             UTC
4  Time zone changed via Spark config                                     UTC              UTC         JST             JST
5  Time zone changed via sessionInitStatement after changing DBTIMEZONE   JST              JST         JST             JST

Code and results

Preparation

username = ""  ## ユーザー名
password = ""  ## パスワード

hostname = "adb.ap-tokyo-1.oraclecloud.com"  # ホスト名
service_name = "g55ca460616b750_kpwq304szjg90by8_high.adb.oraclecloud.com"  # サービス名
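
In practice the credentials would not be hard-coded. A minimal sketch using a Databricks secret scope (the scope and key names below are hypothetical) could look like this:

# Hypothetical secret scope and key names; replace with your own
username = dbutils.secrets.get(scope="oracle-adb", key="username")
password = dbutils.secrets.get(scope="oracle-adb", key="password")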

1. Default behavior

Run the following code in Databricks.

spark.conf.unset("spark.sql.session.timeZone")
print(spark.conf.get("spark.sql.session.timeZone"))

# connection strings
conn_str = "("
conn_str += f"description=(retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1521)(host={hostname})"
conn_str += ")"
conn_str += "("
conn_str += f"connect_data=(service_name={service_name})"
conn_str += ")"
conn_str += "(security=(ssl_server_dn_match=yes))"
conn_str += ")"

# JDBC connection URL
jdbc_url = f"jdbc:oracle:thin:@{conn_str}"

# JDBC connection properties
properties = {
    "url": jdbc_url,
    "user": username,
    "password": password,
    "driver": "oracle.jdbc.driver.OracleDriver",
    "query": """
    SELECT
        SESSIONTIMEZONE AS SESSIONTIMEZONE
        ,DBTIMEZONE AS DBTIMEZONE
        ,LOCALTIMESTAMP AS LOCALTIMESTAMP
        ,CURRENT_DATE AS CURRENT_DATE
        ,SYSTIMESTAMP AS SYSTIMESTAMP
        ,CURRENT_TIMESTAMP AS CURRENT_TIMESTAMP
        ,CURRENT_TIMESTAMP AT LOCAL AS CURRENT_TIMESTAMP_AT_LOCAL
        ,CURRENT_TIMESTAMP AT TIME ZONE 'Asia/Tokyo' AS CURRENT_TIMESTAMP_AT_TIME_ZONE
        FROM DUAL
    """,
}

# Create spark dataframe
df = spark.read.format("jdbc").options(**properties).load()

df.display()
SESSIONTIMEZONE: Etc/UTC
DBTIMEZONE: UTC
LOCALTIMESTAMP: 2024-02-15T03:49:07.523+00:00
CURRENT_DATE: 2024-02-15T03:49:07.000+00:00
SYSTIMESTAMP: 2024-02-15T03:49:07.523+00:00
CURRENT_TIMESTAMP: 2024-02-15T03:49:07.523+00:00
CURRENT_TIMESTAMP_AT_LOCAL: 2024-02-15T03:49:07.523+00:00
CURRENT_TIMESTAMP_AT_TIME_ZONE: 2024-02-15T03:49:07.523+00:00
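
If only the displayed values need to be in JST, the conversion could also be done on the Spark side instead of changing the Oracle session. A minimal sketch using from_utc_timestamp, assuming the DataFrame created above:

from pyspark.sql import functions as F

# Shift the UTC instant so that it renders as Tokyo wall-clock time
df_jst = df.withColumn(
    "CURRENT_TIMESTAMP_JST",
    F.from_utc_timestamp(F.col("CURRENT_TIMESTAMP"), "Asia/Tokyo"),
)
df_jst.display()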

2. Change the time zone via sessionInitStatement

Run the following code in Databricks.

spark.conf.unset("spark.sql.session.timeZone")
print(spark.conf.get("spark.sql.session.timeZone"))

# connection strings
conn_str = "("
conn_str += f"description=(retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1521)(host={hostname})"
conn_str += ")"
conn_str += "("
conn_str += f"connect_data=(service_name={service_name})"
conn_str += ")"
conn_str += "(security=(ssl_server_dn_match=yes))"
conn_str += ")"

# JDBC connection URL
jdbc_url = f"jdbc:oracle:thin:@{conn_str}"

# JDBC connection properties
properties = {
    "url": jdbc_url,
    "user": username,
    "password": password,
    "driver": "oracle.jdbc.driver.OracleDriver",
    "sessionInitStatement": "ALTER SESSION SET TIME_ZONE = 'Asia/Tokyo'",
    "query": """
    SELECT
        SESSIONTIMEZONE AS SESSIONTIMEZONE
        ,DBTIMEZONE AS DBTIMEZONE
        ,LOCALTIMESTAMP AS LOCALTIMESTAMP
        ,CURRENT_DATE AS CURRENT_DATE
        ,SYSTIMESTAMP AS SYSTIMESTAMP
        ,CURRENT_TIMESTAMP AS CURRENT_TIMESTAMP
        ,CURRENT_TIMESTAMP AT LOCAL AS CURRENT_TIMESTAMP_AT_LOCAL
        ,CURRENT_TIMESTAMP AT TIME ZONE 'Asia/Tokyo' AS CURRENT_TIMESTAMP_AT_TIME_ZONE
        FROM DUAL
    """,
}

# Create spark dataframe
df = spark.read.format("jdbc").options(**properties).load()

df.display()
SESSIONTIMEZONE: Asia/Tokyo
DBTIMEZONE: UTC
LOCALTIMESTAMP: 2024-02-15T12:49:09.991+00:00
CURRENT_DATE: 2024-02-15T12:49:09.000+00:00
SYSTIMESTAMP: 2024-02-15T03:49:09.991+00:00
CURRENT_TIMESTAMP: 2024-02-15T03:49:09.991+00:00
CURRENT_TIMESTAMP_AT_LOCAL: 2024-02-15T03:49:09.991+00:00
CURRENT_TIMESTAMP_AT_TIME_ZONE: 2024-02-15T03:49:09.991+00:00

LOCALTIMESTAMP and CURRENT_DATE now return JST wall-clock values, although Spark still renders them with a +00:00 offset because the Spark session time zone remains UTC.

3. Change the time zone via JDBC options

Run the following code in Databricks, with the oracle.jdbc.timezoneAsRegion and oracle.jdbc.timezone options added to the JDBC properties.

spark.conf.unset("spark.sql.session.timeZone")
print(spark.conf.get("spark.sql.session.timeZone"))

# connection strings
conn_str = "("
conn_str += f"description=(retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1521)(host={hostname})"
conn_str += ")"
conn_str += "("
conn_str += f"connect_data=(service_name={service_name})"
conn_str += ")"
conn_str += "(security=(ssl_server_dn_match=yes))"
conn_str += ")"

# JDBC connection URL
jdbc_url = f"jdbc:oracle:thin:@{conn_str}"

# JDBC connection properties
properties = {
    "url": jdbc_url,
    "user": username,
    "password": password,
    "driver": "oracle.jdbc.driver.OracleDriver",
    "query": """
    SELECT
        SESSIONTIMEZONE AS SESSIONTIMEZONE
        ,DBTIMEZONE AS DBTIMEZONE
        ,LOCALTIMESTAMP AS LOCALTIMESTAMP
        ,CURRENT_DATE AS CURRENT_DATE
        ,SYSTIMESTAMP AS SYSTIMESTAMP
        ,CURRENT_TIMESTAMP AS CURRENT_TIMESTAMP
        ,CURRENT_TIMESTAMP AT LOCAL AS CURRENT_TIMESTAMP_AT_LOCAL
        ,CURRENT_TIMESTAMP AT TIME ZONE 'Asia/Tokyo' AS CURRENT_TIMESTAMP_AT_TIME_ZONE
        FROM DUAL
    """,
    "oracle.jdbc.timezoneAsRegion": "false",
    "oracle.jdbc.timezone": "Asia/Tokyo",
}

# Create spark dataframe
df = spark.read.format("jdbc").options(**properties).load()

df.display()
SESSIONTIMEZONE: +00:00
DBTIMEZONE: UTC
LOCALTIMESTAMP: 2024-02-15T03:49:12.482+00:00
CURRENT_DATE: 2024-02-15T03:49:12.000+00:00
SYSTIMESTAMP: 2024-02-15T03:49:12.482+00:00
CURRENT_TIMESTAMP: 2024-02-15T03:49:12.482+00:00
CURRENT_TIMESTAMP_AT_LOCAL: 2024-02-15T03:49:12.482+00:00
CURRENT_TIMESTAMP_AT_TIME_ZONE: 2024-02-15T03:49:12.482+00:00

SESSIONTIMEZONE is now reported as the offset +00:00 rather than a region name, which appears to be the effect of oracle.jdbc.timezoneAsRegion=false; the session time zone itself was not changed to Asia/Tokyo.

4. Change the time zone via Spark config

Run the following code in Databricks.

# Set the Spark session time zone to JST (Japan Standard Time)
spark.conf.set("spark.sql.session.timeZone", "JST")
print(spark.conf.get("spark.sql.session.timeZone"))

# connection strings
conn_str = "("
conn_str += f"description=(retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1521)(host={hostname})"
conn_str += ")"
conn_str += "("
conn_str += f"connect_data=(service_name={service_name})"
conn_str += ")"
conn_str += "(security=(ssl_server_dn_match=yes))"
conn_str += ")"

# JDBC connection URL
jdbc_url = f"jdbc:oracle:thin:@{conn_str}"

# JDBC connection properties
properties = {
    "url": jdbc_url,
    "user": username,
    "password": password,
    "driver": "oracle.jdbc.driver.OracleDriver",
    "sessionInitStatement": "ALTER SESSION SET TIME_ZONE = 'Asia/Tokyo'",
    "query": """
    SELECT
        SESSIONTIMEZONE AS SESSIONTIMEZONE
        ,DBTIMEZONE AS DBTIMEZONE
        ,LOCALTIMESTAMP AS LOCALTIMESTAMP
        ,CURRENT_DATE AS CURRENT_DATE
        -- The columns below raise an error when the Spark session time zone is set
        --,SYSTIMESTAMP AS SYSTIMESTAMP
        --,CURRENT_TIMESTAMP AS CURRENT_TIMESTAMP
        --,CURRENT_TIMESTAMP AT LOCAL AS CURRENT_TIMESTAMP_AT_LOCAL
        --,CURRENT_TIMESTAMP AT TIME ZONE 'Asia/Tokyo' AS CURRENT_TIMESTAMP_AT_TIME_ZONE
        FROM DUAL
    """,
}

# Create spark dataframe
df = spark.read.format("jdbc").options(**properties).load()

df.display()
SESSIONTIMEZONE: Etc/UTC
DBTIMEZONE: UTC
LOCALTIMESTAMP: 2024-02-15T12:49:14.895+09:00
CURRENT_DATE: 2024-02-15T12:49:14.000+09:00

Setting spark.sql.session.timeZone changes how Spark renders the returned timestamps (note the +09:00 offset); the Oracle session time zone itself remains UTC.

5. Change the time zone via sessionInitStatement after changing DBTIMEZONE

On Oracle Autonomous Database, change the DBTIMEZONE setting and then restart the database.

ALTER DATABASE SET TIME_ZONE='Asia/Tokyo';
SELECT DBTIMEZONE FROM dual;

After the restart, re-run the check to confirm that DBTIMEZONE has changed.

SELECT DBTIMEZONE FROM dual;

Run the following code in Databricks.

# Unset the `spark.sql.session.timeZone` setting
spark.conf.unset("spark.sql.session.timeZone")
print(spark.conf.get("spark.sql.session.timeZone"))

# connection strings
conn_str = "("
conn_str += f"description=(retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1521)(host={hostname})"
conn_str += ")"
conn_str += "("
conn_str += f"connect_data=(service_name={service_name})"
conn_str += ")"
conn_str += "(security=(ssl_server_dn_match=yes))"
conn_str += ")"

# JDBC connection URL
jdbc_url = f"jdbc:oracle:thin:@{conn_str}"

# JDBC connection properties
properties = {
    "url": jdbc_url,
    "user": username,
    "password": password,
    "driver": "oracle.jdbc.driver.OracleDriver",
    "sessionInitStatement": "ALTER SESSION SET TIME_ZONE = 'Asia/Tokyo'",
    "query": """
    SELECT
        SESSIONTIMEZONE AS SESSIONTIMEZONE
        ,DBTIMEZONE AS DBTIMEZONE
        ,LOCALTIMESTAMP AS LOCALTIMESTAMP
        ,CURRENT_DATE AS CURRENT_DATE
        ,SYSTIMESTAMP AS SYSTIMESTAMP
        ,CURRENT_TIMESTAMP AS CURRENT_TIMESTAMP
        ,CURRENT_TIMESTAMP AT LOCAL AS CURRENT_TIMESTAMP_AT_LOCAL
        ,CURRENT_TIMESTAMP AT TIME ZONE 'Asia/Tokyo' AS CURRENT_TIMESTAMP_AT_TIME_ZONE
        FROM DUAL
    """,
}

# Create spark dataframe
df = spark.read.format("jdbc").options(**properties).load()

df.display()
SESSIONTIMEZONE: Asia/Tokyo
DBTIMEZONE: Asia/Tokyo
LOCALTIMESTAMP: 2024-02-15T12:51:24.063+00:00
CURRENT_DATE: 2024-02-15T12:51:24.000+00:00
SYSTIMESTAMP: 2024-02-15T03:51:24.063+00:00
CURRENT_TIMESTAMP: 2024-02-15T03:51:24.063+00:00
CURRENT_TIMESTAMP_AT_LOCAL: 2024-02-15T03:51:24.063+00:00
CURRENT_TIMESTAMP_AT_TIME_ZONE: 2024-02-15T03:51:24.063+00:00

Current issues

1. SYSTIMESTAMP and CURRENT_TIMESTAMP no longer work after the time zone is changed via the Spark config

The following error occurs. In this verification, the affected columns were commented out when testing the Spark config change.

# connection strings
conn_str = "("
conn_str += f"description=(retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1521)(host={hostname})"
conn_str += ")"
conn_str += "("
conn_str += f"connect_data=(service_name={service_name})"
conn_str += ")"
conn_str += "(security=(ssl_server_dn_match=yes))"
conn_str += ")"

# JDBC connection URL
jdbc_url = f"jdbc:oracle:thin:@{conn_str}"

# JDBC connection properties
properties = {
    "url": jdbc_url,
    "user": username,
    "password": password,
    "driver": "oracle.jdbc.driver.OracleDriver",
    "query": """
    SELECT
        SYSTIMESTAMP AS SYSTIMESTAMP
        -- CURRENT_TIMESTAMP AS CURRENT_TIMESTAMP
        -- CURRENT_TIMESTAMP AT LOCAL AS CURRENT_TIMESTAMP_AT_LOCAL
        -- CURRENT_TIMESTAMP AT TIME ZONE 'Asia/Tokyo' AS CURRENT_TIMESTAMP_AT_TIME_ZONE
        FROM DUAL
    """,
}

# Create spark dataframe
df = spark.read.format("jdbc").options(**properties).load()

df.display()
org.apache.spark.SparkSQLException: [UNRECOGNIZED_SQL_TYPE] Unrecognized SQL type - name: TIMESTAMP WITH TIME ZONE, id: -101.
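
The error indicates that the Spark JDBC reader does not recognize Oracle's TIMESTAMP WITH TIME ZONE type (type id -101). One possible workaround, not verified in this article, is to cast those expressions to a plain TIMESTAMP on the Oracle side so the unsupported type never reaches Spark. A minimal sketch, reusing the properties dictionary from the code above:

# Assumption: casting to TIMESTAMP avoids the unsupported TIMESTAMP WITH TIME ZONE type
properties["query"] = """
SELECT
    CAST(SYSTIMESTAMP AS TIMESTAMP) AS SYSTIMESTAMP
    ,CAST(CURRENT_TIMESTAMP AS TIMESTAMP) AS CURRENT_TIMESTAMP
    FROM DUAL
"""

df = spark.read.format("jdbc").options(**properties).load()
df.display()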