Overview
This article shares the results of verifying how time zones behave when creating a Spark DataFrame in Databricks from Oracle Autonomous Database.
The differences were checked by running the following query, which returns the session time zone (SESSIONTIMEZONE), the database time zone (DBTIMEZONE), and several date/time functions side by side.
SELECT
SESSIONTIMEZONE AS SESSIONTIMEZONE
,DBTIMEZONE AS DBTIMEZONE
,LOCALTIMESTAMP AS LOCALTIMESTAMP
,CURRENT_DATE AS CURRENT_DATE
,SYSTIMESTAMP AS SYSTIMESTAMP
,CURRENT_TIMESTAMP AS CURRENT_TIMESTAMP
,CURRENT_TIMESTAMP AT LOCAL AS CURRENT_TIMESTAMP_AT_LOCAL
,CURRENT_TIMESTAMP AT TIME ZONE 'Asia/Tokyo' AS CURRENT_TIMESTAMP_AT_TIME_ZONE
FROM DUAL;
The following table summarizes the patterns that were verified and their results (UTC/JST indicates the time zone of the wall-clock values returned in each column).

# | Pattern | SESSIONTIMEZONE | DBTIMEZONE | LOCALTIMESTAMP | CURRENT_DATE |
---|---|---|---|---|---|
1 | Check the default behavior | UTC | UTC | UTC | UTC |
2 | Change the time zone via sessionInitStatement | JST | UTC | JST | JST |
3 | Change the time zone via JDBC options | UTC | UTC | UTC | UTC |
4 | Change the time zone via Spark config | UTC | UTC | JST | JST |
5 | Change the time zone via sessionInitStatement after changing DBTIMEZONE | JST | JST | JST | JST |
Code and results
Preparation
username = ""  # user name
password = ""  # password
hostname = "adb.ap-tokyo-1.oraclecloud.com"  # host name
service_name = "g55ca460616b750_kpwq304szjg90by8_high.adb.oraclecloud.com"  # service name
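Rather than hard-coding credentials in the notebook, they can also be read from a Databricks secret scope. A minimal sketch, assuming a secret scope named `oracle` with keys `username` and `password` already exists (the scope and key names are placeholders):
# Minimal sketch: read the Oracle credentials from a Databricks secret scope.
# The scope name "oracle" and the key names are assumptions for illustration.
username = dbutils.secrets.get(scope="oracle", key="username")
password = dbutils.secrets.get(scope="oracle", key="password")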
1. Check the default behavior
Run the following code in Databricks.
spark.conf.unset("spark.sql.session.timeZone")
print(spark.conf.get("spark.sql.session.timeZone"))
# connection strings
conn_str = "("
conn_str += f"description=(retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1521)(host={hostname})"
conn_str += ")"
conn_str += "("
conn_str += f"connect_data=(service_name={service_name})"
conn_str += ")"
conn_str += "(security=(ssl_server_dn_match=yes))"
conn_str += ")"
# JDBC connection URL
jdbc_url = f"jdbc:oracle:thin:@{conn_str}"
# JDBC connection properties
properties = {
"url": jdbc_url,
"user": username,
"password": password,
"driver": "oracle.jdbc.driver.OracleDriver",
"query": """
SELECT
SESSIONTIMEZONE AS SESSIONTIMEZONE
,DBTIMEZONE AS DBTIMEZONE
,LOCALTIMESTAMP AS LOCALTIMESTAMP
,CURRENT_DATE AS CURRENT_DATE
,SYSTIMESTAMP AS SYSTIMESTAMP
,CURRENT_TIMESTAMP AS CURRENT_TIMESTAMP
,CURRENT_TIMESTAMP AT LOCAL AS CURRENT_TIMESTAMP_AT_LOCAL
,CURRENT_TIMESTAMP AT TIME ZONE 'Asia/Tokyo' AS CURRENT_TIMESTAMP_AT_TIME_ZONE
FROM DUAL
""",
}
# Create spark dataframe
df = spark.read.format("jdbc").options(**properties).load()
df.display()
SESSIONTIMEZONE | DBTIMEZONE | LOCALTIMESTAMP | CURRENT_DATE | SYSTIMESTAMP | CURRENT_TIMESTAMP | CURRENT_TIMESTAMP_AT_LOCAL | CURRENT_TIMESTAMP_AT_TIME_ZONE |
---|---|---|---|---|---|---|---|
Etc/UTC | UTC | 2024-02-15T03:49:07.523+00:00 | 2024-02-15T03:49:07.000+00:00 | 2024-02-15T03:49:07.523+00:00 | 2024-02-15T03:49:07.523+00:00 | 2024-02-15T03:49:07.523+00:00 | 2024-02-15T03:49:07.523+00:00 |
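To see which Spark data types the Oracle columns were mapped to, the schema of the resulting DataFrame can be inspected (a minimal sketch; the exact mapping depends on the Oracle JDBC driver and the Spark JDBC dialect):
# Minimal sketch: inspect how the Oracle columns were mapped to Spark types.
df.printSchema()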
2. Change the time zone via sessionInitStatement
Run the following code in Databricks.
spark.conf.unset("spark.sql.session.timeZone")
print(spark.conf.get("spark.sql.session.timeZone"))
# connection strings
conn_str = "("
conn_str += f"description=(retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1521)(host={hostname})"
conn_str += ")"
conn_str += "("
conn_str += f"connect_data=(service_name={service_name})"
conn_str += ")"
conn_str += "(security=(ssl_server_dn_match=yes))"
conn_str += ")"
# JDBC connection URL
jdbc_url = f"jdbc:oracle:thin:@{conn_str}"
# JDBC connection properties
properties = {
"url": jdbc_url,
"user": username,
"password": password,
"driver": "oracle.jdbc.driver.OracleDriver",
"sessionInitStatement": "ALTER SESSION SET TIME_ZONE = 'Asia/Tokyo'",
"query": """
SELECT
SESSIONTIMEZONE AS SESSIONTIMEZONE
,DBTIMEZONE AS DBTIMEZONE
,LOCALTIMESTAMP AS LOCALTIMESTAMP
,CURRENT_DATE AS CURRENT_DATE
,SYSTIMESTAMP AS SYSTIMESTAMP
,CURRENT_TIMESTAMP AS CURRENT_TIMESTAMP
,CURRENT_TIMESTAMP AT LOCAL AS CURRENT_TIMESTAMP_AT_LOCAL
,CURRENT_TIMESTAMP AT TIME ZONE 'Asia/Tokyo' AS CURRENT_TIMESTAMP_AT_TIME_ZONE
FROM DUAL
""",
}
# Create spark dataframe
df = spark.read.format("jdbc").options(**properties).load()
df.display()
SESSIONTIMEZONE | DBTIMEZONE | LOCALTIMESTAMP | CURRENT_DATE | SYSTIMESTAMP | CURRENT_TIMESTAMP | CURRENT_TIMESTAMP_AT_LOCAL | CURRENT_TIMESTAMP_AT_TIME_ZONE |
---|---|---|---|---|---|---|---|
Asia/Tokyo | UTC | 2024-02-15T12:49:09.991+00:00 | 2024-02-15T12:49:09.000+00:00 | 2024-02-15T03:49:09.991+00:00 | 2024-02-15T03:49:09.991+00:00 | 2024-02-15T03:49:09.991+00:00 | 2024-02-15T03:49:09.991+00:00 |
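If the UTC-based timestamp columns such as `CURRENT_TIMESTAMP` only need to be displayed as Tokyo wall-clock time on the Spark side, `from_utc_timestamp` can be applied after loading. A minimal sketch (this only shifts the rendered wall clock; it does not change the instant returned by Oracle):
from pyspark.sql import functions as F

# Minimal sketch: render the UTC-based CURRENT_TIMESTAMP column as Asia/Tokyo wall-clock time.
df_jst = df.withColumn(
    "CURRENT_TIMESTAMP_JST",
    F.from_utc_timestamp(F.col("CURRENT_TIMESTAMP"), "Asia/Tokyo"),
)
df_jst.display()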
3. Change the time zone via JDBC options
Run the following code in Databricks, with the `oracle.jdbc.timezoneAsRegion` and `oracle.jdbc.timezone` options added to the connection properties.
spark.conf.unset("spark.sql.session.timeZone")
print(spark.conf.get("spark.sql.session.timeZone"))
# connection strings
conn_str = "("
conn_str += f"description=(retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1521)(host={hostname})"
conn_str += ")"
conn_str += "("
conn_str += f"connect_data=(service_name={service_name})"
conn_str += ")"
conn_str += "(security=(ssl_server_dn_match=yes))"
conn_str += ")"
# JDBC connection URL
jdbc_url = f"jdbc:oracle:thin:@{conn_str}"
# JDBC connection properties
properties = {
"url": jdbc_url,
"user": username,
"password": password,
"driver": "oracle.jdbc.driver.OracleDriver",
"query": """
SELECT
SESSIONTIMEZONE AS SESSIONTIMEZONE
,DBTIMEZONE AS DBTIMEZONE
,LOCALTIMESTAMP AS LOCALTIMESTAMP
,CURRENT_DATE AS CURRENT_DATE
,SYSTIMESTAMP AS SYSTIMESTAMP
,CURRENT_TIMESTAMP AS CURRENT_TIMESTAMP
,CURRENT_TIMESTAMP AT LOCAL AS CURRENT_TIMESTAMP_AT_LOCAL
,CURRENT_TIMESTAMP AT TIME ZONE 'Asia/Tokyo' AS CURRENT_TIMESTAMP_AT_TIME_ZONE
FROM DUAL
""",
"oracle.jdbc.timezoneAsRegion": "false",
"oracle.jdbc.timezone": "Asia/Tokyo",
}
# Create spark dataframe
df = spark.read.format("jdbc").options(**properties).load()
df.display()
SESSIONTIMEZONE | DBTIMEZONE | LOCALTIMESTAMP | CURRENT_DATE | SYSTIMESTAMP | CURRENT_TIMESTAMP | CURRENT_TIMESTAMP_AT_LOCAL | CURRENT_TIMESTAMP_AT_TIME_ZONE |
---|---|---|---|---|---|---|---|
+00:00 | UTC | 2024-02-15T03:49:12.482+00:00 | 2024-02-15T03:49:12.000+00:00 | 2024-02-15T03:49:12.482+00:00 | 2024-02-15T03:49:12.482+00:00 | 2024-02-15T03:49:12.482+00:00 | 2024-02-15T03:49:12.482+00:00 |
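For reference, the same read can also be expressed with the `spark.read.jdbc()` helper, passing the driver options through the `properties` dictionary. This is only an alternative call form and is not expected to change the result above. A minimal sketch reusing `jdbc_url`, `username`, and `password` from the preparation step:
# Minimal sketch: the same kind of query issued via spark.read.jdbc(), with the
# driver options passed as connection properties (alternative call form only).
query = "(SELECT SESSIONTIMEZONE AS SESSIONTIMEZONE, DBTIMEZONE AS DBTIMEZONE FROM DUAL) q"
df_alt = spark.read.jdbc(
    url=jdbc_url,
    table=query,
    properties={
        "user": username,
        "password": password,
        "driver": "oracle.jdbc.driver.OracleDriver",
        "oracle.jdbc.timezoneAsRegion": "false",
        "oracle.jdbc.timezone": "Asia/Tokyo",
    },
)
df_alt.display()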
4. Change the time zone via Spark config
Run the following code in Databricks.
# Set the Spark session time zone to JST (Asia/Tokyo)
spark.conf.set("spark.sql.session.timeZone","JST")
print(spark.conf.get("spark.sql.session.timeZone"))
# connection strings
conn_str = "("
conn_str += f"description=(retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1521)(host={hostname})"
conn_str += ")"
conn_str += "("
conn_str += f"connect_data=(service_name={service_name})"
conn_str += ")"
conn_str += "(security=(ssl_server_dn_match=yes))"
conn_str += ")"
# JDBC connection URL
jdbc_url = f"jdbc:oracle:thin:@{conn_str}"
# JDBC connection properties
properties = {
"url": jdbc_url,
"user": username,
"password": password,
"driver": "oracle.jdbc.driver.OracleDriver",
"sessionInitStatement": "ALTER SESSION SET TIME_ZONE = 'Asia/Tokyo'",
"query": """
SELECT
SESSIONTIMEZONE AS SESSIONTIMEZONE
,DBTIMEZONE AS DBTIMEZONE
,LOCALTIMESTAMP AS LOCALTIMESTAMP
,CURRENT_DATE AS CURRENT_DATE
-- The following columns raise an error when the Spark session time zone is set (see "Current issues" below)
--,SYSTIMESTAMP AS SYSTIMESTAMP
--,CURRENT_TIMESTAMP AS CURRENT_TIMESTAMP
--,CURRENT_TIMESTAMP AT LOCAL AS CURRENT_TIMESTAMP_AT_LOCAL
--,CURRENT_TIMESTAMP AT TIME ZONE 'Asia/Tokyo' AS CURRENT_TIMESTAMP_AT_TIME_ZONE
FROM DUAL
""",
}
# Create spark dataframe
df = spark.read.format("jdbc").options(**properties).load()
df.display()
SESSIONTIMEZONE | DBTIMEZONE | LOCALTIMESTAMP | CURRENT_DATE |
---|---|---|---|
Etc/UTC | UTC | 2024-02-15T12:49:14.895+09:00 | 2024-02-15T12:49:14.000+09:00 |
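Note that `spark.sql.session.timeZone` changes how Spark renders timestamp values in the notebook; it does not by itself change the Oracle session time zone. This can be confirmed without Oracle at all (a minimal sketch; the wall-clock values in the comments are illustrative):
# Minimal sketch: the same point in time is rendered differently depending on
# spark.sql.session.timeZone; the underlying value does not change.
spark.conf.set("spark.sql.session.timeZone", "UTC")
spark.sql("SELECT current_timestamp() AS now").show(truncate=False)  # e.g. 2024-02-15 03:49:14

spark.conf.set("spark.sql.session.timeZone", "JST")
spark.sql("SELECT current_timestamp() AS now").show(truncate=False)  # e.g. 2024-02-15 12:49:14 (+9 hours wall clock)

# Restore the unset state used in the other patterns
spark.conf.unset("spark.sql.session.timeZone")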
5. Change the time zone via sessionInitStatement after changing DBTIMEZONE
On Oracle Autonomous Database, change the DBTIMEZONE setting and then restart the database.
ALTER DATABASE SET TIME_ZONE='Asia/Tokyo';
SELECT DBTIMEZONE FROM dual;
Run the following code in Databricks.
# Unset the `spark.sql.session.timeZone` setting
spark.conf.unset("spark.sql.session.timeZone")
print(spark.conf.get("spark.sql.session.timeZone"))
# connection strings
conn_str = "("
conn_str += f"description=(retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1521)(host={hostname})"
conn_str += ")"
conn_str += "("
conn_str += f"connect_data=(service_name={service_name})"
conn_str += ")"
conn_str += "(security=(ssl_server_dn_match=yes))"
conn_str += ")"
# JDBC connection URL
jdbc_url = f"jdbc:oracle:thin:@{conn_str}"
# JDBC connection properties
properties = {
"url": jdbc_url,
"user": username,
"password": password,
"driver": "oracle.jdbc.driver.OracleDriver",
"sessionInitStatement": "ALTER SESSION SET TIME_ZONE = 'Asia/Tokyo'",
"query": """
SELECT
SESSIONTIMEZONE AS SESSIONTIMEZONE
,DBTIMEZONE AS DBTIMEZONE
,LOCALTIMESTAMP AS LOCALTIMESTAMP
,CURRENT_DATE AS CURRENT_DATE
,SYSTIMESTAMP AS SYSTIMESTAMP
,CURRENT_TIMESTAMP AS CURRENT_TIMESTAMP
,CURRENT_TIMESTAMP AT LOCAL AS CURRENT_TIMESTAMP_AT_LOCAL
,CURRENT_TIMESTAMP AT TIME ZONE 'Asia/Tokyo' AS CURRENT_TIMESTAMP_AT_TIME_ZONE
FROM DUAL
""",
}
# Create spark dataframe
df = spark.read.format("jdbc").options(**properties).load()
df.display()
SESSIONTIMEZONE | DBTIMEZONE | LOCALTIMESTAMP | CURRENT_DATE | SYSTIMESTAMP | CURRENT_TIMESTAMP | CURRENT_TIMESTAMP_AT_LOCAL | CURRENT_TIMESTAMP_AT_TIME_ZONE |
---|---|---|---|---|---|---|---|
Asia/Tokyo | Asia/Tokyo | 2024-02-15T12:51:24.063+00:00 | 2024-02-15T12:51:24.000+00:00 | 2024-02-15T03:51:24.063+00:00 | 2024-02-15T03:51:24.063+00:00 | 2024-02-15T03:51:24.063+00:00 | 2024-02-15T03:51:24.063+00:00 |
Current issues
1. SYSTIMESTAMP and CURRENT_TIMESTAMP no longer work when the time zone is changed via Spark config
With `spark.sql.session.timeZone` set as in pattern 4, the following error occurred. In this verification, these columns were therefore commented out when testing the Spark config change.
# connection strings
conn_str = "("
conn_str += f"description=(retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1521)(host={hostname})"
conn_str += ")"
conn_str += "("
conn_str += f"connect_data=(service_name={service_name})"
conn_str += ")"
conn_str += "(security=(ssl_server_dn_match=yes))"
conn_str += ")"
# JDBC connection URL
jdbc_url = f"jdbc:oracle:thin:@{conn_str}"
# JDBC connection properties
properties = {
"url": jdbc_url,
"user": username,
"password": password,
"driver": "oracle.jdbc.driver.OracleDriver",
"query": """
SELECT
SYSTIMESTAMP AS SYSTIMESTAMP
-- CURRENT_TIMESTAMP AS CURRENT_TIMESTAMP
-- CURRENT_TIMESTAMP AT LOCAL AS CURRENT_TIMESTAMP_AT_LOCAL
-- CURRENT_TIMESTAMP AT TIME ZONE 'Asia/Tokyo' AS CURRENT_TIMESTAMP_AT_TIME_ZONE
FROM DUAL
""",
}
# Create spark dataframe
df = spark.read.format("jdbc").options(**properties).load()
df.display()
org.apache.spark.SparkSQLException: [UNRECOGNIZED_SQL_TYPE] Unrecognized SQL type - name: TIMESTAMP WITH TIME ZONE, id: -101.
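One possible workaround, not verified in this article, is to have Oracle return plain TIMESTAMP values normalized to UTC instead of TIMESTAMP WITH TIME ZONE, for example with `SYS_EXTRACT_UTC`, so that the JDBC reader only sees SQL types it recognizes. A minimal sketch reusing the `properties` dictionary from the code above:
# Sketch of a possible workaround (not verified here): have Oracle return plain
# TIMESTAMP values in UTC via SYS_EXTRACT_UTC instead of TIMESTAMP WITH TIME ZONE.
properties["query"] = """
SELECT
SYS_EXTRACT_UTC(SYSTIMESTAMP) AS SYSTIMESTAMP_UTC
,SYS_EXTRACT_UTC(CURRENT_TIMESTAMP) AS CURRENT_TIMESTAMP_UTC
FROM DUAL
"""
df = spark.read.format("jdbc").options(**properties).load()
df.display()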