- IBM Cloud上のSaaS版についての記事です。
watsonx.dataは「単一の統合データ・プラットフォームですべての企業データを収集、保存、照会、分析するためのデータ管理ソリューションです。オープン・データ・フォーマットに最適化された、柔軟で信頼性の高いプラットフォームを提供します」とのことで、いわゆるデータレイクハウスです。SW版とSaaS版があります。
このページでは、Presto CLIのオプションによる出力の違いを、watsonx.data SaaS版のPrestoエンジンに接続して確認したを述べています。
参照情報
構成方法については次をご参照ください。
watsonx.data SaaS版のPresto エンジンに Presto CLI からアクセスする
環境
使用環境は次の通りです。
- クライアント
$ presto --version
Presto CLI 0.296-8661b73
$ cat /etc/redhat-release
Red Hat Enterprise Linux release 9.4 (Plow)
$ java -version
openjdk version "1.8.0_472"
OpenJDK Runtime Environment (build 1.8.0_472-b08)
OpenJDK 64-Bit Server VM (build 25.472-b08, mixed mode)
$
- watsonx.data SaaS
watsonx.data
Version: 2.3.0.0.0
Console build: 2.3.0-22934-20260128030655-wxd-v2.3.0-nf2
1. Output format for batch mode に関する覚書
--output-formatを使用した場合の出力形式を覚書として残します。
$ presto --help
NAME
presto - Presto interactive console
(skip)
--output-format <output-format>
Output format for batch mode [ALIGNED, VERTICAL, JSON, CSV, TSV,
CSV_HEADER, TSV_HEADER, NULL] (default: CSV)
(skip)
'-f'オプションで入力ファイルを指定して実行すると batch modeで出力されます。
ここでは
入力ファイルとして select_region.sqlを使用しています。 select_region.sqlには次の1行が含まれています。
select * from tpch.tiny.region;
default: CSV
$ presto -f select_region.sql --server https://xxxx.xxxx.lakehouse.ibmappdomain.cloud:33333 --user ibmlhapikey_userid --access-token $TOKEN
"0","AFRICA","lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to "
"1","AMERICA","hs use ironic, even requests. s"
"2","ASIA","ges. thinly even pinto beans ca"
"3","EUROPE","ly final courts cajole furiously final excuse"
"4","MIDDLE EAST","uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl"
$
--output-format CSV_HEADER
presto -f select_region.sql --output-format CSV_HEADER --server https://xxxx.xxxx.lakehouse.ibmappdomain.cloud:33333 --user ibmlhapikey_userid --access-token $TOKEN
$ presto -f select_region.sql --output-format CSV_HEADER --server https://xxxx.xxxx.lakehouse.ibmappdomain.cloud:33333 --user ibmlhapikey_userid --access-token $TOKEN
"r_regionkey","r_name","r_comment"
"0","AFRICA","lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to "
"1","AMERICA","hs use ironic, even requests. s"
"2","ASIA","ges. thinly even pinto beans ca"
"3","EUROPE","ly final courts cajole furiously final excuse"
"4","MIDDLE EAST","uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl"
$
--output-format ALIGNED
presto -f select_region.sql --output-format ALIGNED --server https://xxxx.xxxx.lakehouse.ibmappdomain.cloud:33333 --user ibmlhapikey_userid --access-token $TOKEN
$ presto -f select_region.sql --output-format ALIGNED --server https://xxxx.xxxx.lakehouse.ibmappdomain.cloud:33333 --user ibmlhapikey_userid --access-token $TOKEN
r_regionkey | r_name | r_comment
-------------+-------------+---------------------------------------------------------------------------------------------------------------------
0 | AFRICA | lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to
1 | AMERICA | hs use ironic, even requests. s
2 | ASIA | ges. thinly even pinto beans ca
3 | EUROPE | ly final courts cajole furiously final excuse
4 | MIDDLE EAST | uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl
(5 rows)
$
--output-format TSV
presto -f select_region.sql --output-format TSV --server https://xxxx.xxxx.lakehouse.ibmappdomain.cloud:33333 --user ibmlhapikey_userid --access-token $TOKEN
$ presto -f select_region.sql --output-format TSV --server https://xxxx.xxxx.lakehouse.ibmappdomain.cloud:33333 --user ibmlhapikey_userid --access-token $TOKEN
0 AFRICA lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to
1 AMERICA hs use ironic, even requests. s
2 ASIA ges. thinly even pinto beans ca
3 EUROPE ly final courts cajole furiously final excuse
4 MIDDLE EAST uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl
$
--output-format TSV_HEADER
presto -f select_region.sql --output-format TSV_HEADER --server https://xxxx.xxxx.lakehouse.ibmappdomain.cloud:33333 --user ibmlhapikey_userid --access-token $TOKEN
$ presto -f select_region.sql --output-format TSV_HEADER --server https://xxxx.xxxx.lakehouse.ibmappdomain.cloud:33333 --user ibmlhapikey_userid --access-token $TOKEN
r_regionkey r_name r_comment
0 AFRICA lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to
1 AMERICA hs use ironic, even requests. s
2 ASIA ges. thinly even pinto beans ca
3 EUROPE ly final courts cajole furiously final excuse
4 MIDDLE EAST uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl
$
--output-format VERTICAL
presto -f select_region.sql --output-format VERTICAL --server https://xxxx.xxxx.lakehouse.ibmappdomain.cloud:33333 --user ibmlhapikey_userid --access-token $TOKEN
$ presto -f select_region.sql --output-format VERTICAL --server https://xxxx.xxxx.lakehouse.ibmappdomain.cloud:33333 --user ibmlhapikey_userid --access-token $TOKEN
-[ RECORD 1 ]--------------------------------------------------------------------------------------------------------------------
r_regionkey | 0
r_name | AFRICA
r_comment | lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to
-[ RECORD 2 ]--------------------------------------------------------------------------------------------------------------------
r_regionkey | 1
r_name | AMERICA
r_comment | hs use ironic, even requests. s
-[ RECORD 3 ]--------------------------------------------------------------------------------------------------------------------
r_regionkey | 2
r_name | ASIA
r_comment | ges. thinly even pinto beans ca
-[ RECORD 4 ]--------------------------------------------------------------------------------------------------------------------
r_regionkey | 3
r_name | EUROPE
r_comment | ly final courts cajole furiously final excuse
-[ RECORD 5 ]--------------------------------------------------------------------------------------------------------------------
r_regionkey | 4
r_name | MIDDLE EAST
r_comment | uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl
$
--output-format JSON
presto -f select_region.sql --output-format JSON --server https://xxxx.xxxx.lakehouse.ibmappdomain.cloud:33333 --user ibmlhapikey_userid --access-token $TOKEN
$ presto -f select_region.sql --output-format JSON --server https://xxxx.xxxx.lakehouse.ibmappdomain.cloud:33333 --user ibmlhapikey_userid --access-token $TOKEN
{"r_regionkey":0,"r_name":"AFRICA","r_comment":"lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to "}
{"r_regionkey":1,"r_name":"AMERICA","r_comment":"hs use ironic, even requests. s"}
{"r_regionkey":2,"r_name":"ASIA","r_comment":"ges. thinly even pinto beans ca"}
{"r_regionkey":3,"r_name":"EUROPE","r_comment":"ly final courts cajole furiously final excuse"}
{"r_regionkey":4,"r_name":"MIDDLE EAST","r_comment":"uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl"}
$
--output-format NULL
presto -f select_region.sql --output-format NULL --server https://xxxx.xxxx.lakehouse.ibmappdomain.cloud:33333 --user ibmlhapikey_userid --access-token $TOKEN
$ presto -f select_region.sql --output-format NULL --server https://xxxx.xxxx.lakehouse.ibmappdomain.cloud:33333 --user ibmlhapikey_userid --access-token $TOKEN
$
2. --debugと --runtime-stats
debug , runtime-stats に関するHelpは次の通りです。得られた出力を覚書として残します。
$ presto --help
(skip)
--debug
Enable debug information
(skip)
--runtime-stats
Enable runtime stats information. Flag must be used in conjunction
with the --debug flag
(skip)
--debug
--execute でエラーになるQueryを実行するとStackTraceが表示されます。
presto --debug --execute "select * from region;" --server https://xxxx.xxxx.lakehouse.ibmappdomain.cloud:33333 --user ibmlhapikey_userid --access-token $TOKEN
$ presto --debug --execute "select * from region;" --server https://xxxx.xxxx.lakehouse.ibmappdomain.cloud:33333 --user ibmlhapikey_userid --access-token $TOKEN
Query 20260203_061550_02016_cuz7g failed: line 1:15: Schema must be specified when session schema is not set
com.facebook.presto.sql.analyzer.SemanticException: line 1:15: Schema must be specified when session schema is not set
at com.facebook.presto.sql.analyzer.utils.MetadataUtils.lambda$createQualifiedObjectName$0(MetadataUtils.java:53)
at java.base/java.util.Optional.orElseThrow(Optional.java:403)
(skip)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
select * from region
$
--runtime-stats は --debugと一緒に 対話モードで用いる。
--runtime-stats は --debugと一緒に 対話モードで用います。
詳細な実行統計情報が表示されます。
$ presto --debug --runtime-stats --server https://xxxx.xxxx.lakehouse.ibmappdomain.cloud:33333 --user ibmlhapikey_userid --access-token $TOKEN
presto> select * from tpch.tiny.region;
r_regionkey | r_name | r_comment
-------------+-------------+---------------------------------------------------------------------------------------------------------------------
0 | AFRICA | lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to
1 | AMERICA | hs use ironic, even requests. s
2 | ASIA | ges. thinly even pinto beans ca
3 | EUROPE | ly final courts cajole furiously final excuse
4 | MIDDLE EAST | uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl
(5 rows)
Query 20260203_062635_02081_cuz7g, FINISHED, 2 nodes
https://xxxx.xxxx.lakehouse.ibmappdomain.cloud:33333/ui/query.html?20260203_062635_02081_cuz7g
Splits: 20 total, 20 done (100.00%)
CPU Time: 0.0s total, 2.5K rows/s, 0B/s, 100% active
Per Node: 0.0 parallelism, 3 rows/s, 0B/s
Parallelism: 0.0
Peak User Memory: 0B
Peak Total Memory: 625B
Peak Task Total Memory: 625B
S0-createTime: sum=1.77T count=1 min=1.77T max=1.77T
S0-driverCountPerTask: sum=16 count=1 min=16 max=16
S0-endTime: sum=1.77T count=1 min=1.77T max=1.77T
(50行省略)
validateFinalPlanTimeNanos: sum=0ms count=1 min=0ms max=0ms
validateFinalPlanTimeNanosOnCpu: sum=0ms count=1 min=0ms max=0ms
validateIntermediatePlanTimeNanos: sum=0ms count=1 min=0ms max=0ms
validateIntermediatePlanTimeNanosOnCpu: sum=0ms count=1 min=0ms max=0ms
[Latency: client-side: 0:02, server-side: 0:01] [5 rows, 0B] [6 rows/s, 0B/s]
presto>
(注意) ファイル読み込み('-f')でのバッチモードでは 実行統計情報は出力されません。
$ presto --debug --runtime-stats -f select_region.sql --server https://xxxx.xxxx.lakehouse.ibmappdomain.cloud:33333 --user ibmlhapikey_userid --access-token $TOKEN
"0","AFRICA","lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to "
"1","AMERICA","hs use ironic, even requests. s"
"2","ASIA","ges. thinly even pinto beans ca"
"3","EUROPE","ly final courts cajole furiously final excuse"
"4","MIDDLE EAST","uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl"
$
3. 対話モードで実行してファイルに出力する。
私がログに保管するときに使用しているのが、対話モードで実行させてファイルに出力させることです。
(問題点) バッチモードで不便なのは実行Queryが出力に残らないこと。
バッチモードで不便なのは実行Queryが出力に残らないことです。例えば、複数のQueryをファイルに保管しておきます。
$ cat select_region_nation.sql
select * from tpch.tiny.region limit 3;
select * from tpch.tiny.nation limit 3;
$
実行結果が次の通りで実行Queryが出力されていません。(方法があればぜひ教えてください)
$ presto -f select_region_nation.sql --server https://xxxx.xxxx.lakehouse.ibmappdomain.cloud:33333 --user ibmlhapikey_userid --access-token $TOKEN
"0","AFRICA","lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to "
"1","AMERICA","hs use ironic, even requests. s"
"2","ASIA","ges. thinly even pinto beans ca"
"0","ALGERIA","0"," haggle. carefully final deposits detect slyly agai"
"1","ARGENTINA","1","al foxes promise slyly according to the regular accounts. bold requests alon"
"2","BRAZIL","1","y alongside of the pending deposits. carefully special packages are about the ironic forges. slyly special "
$
(改善案) 対話モードで実行してファイルに出力する。
ちょっと面倒なのですが 次のようにして実行すると
presto --server https://xxxx.xxxx.lakehouse.ibmappdomain.cloud:33333 --user ibmlhapikey_userid --access-token $TOKEN <<< "
select * from tpch.tiny.region limit 3;
select * from tpch.tiny.nation limit 3;
" > /tmp/a
出力に実行Queryがのこります。
$ cat /tmp/a
presto>
presto> select * from tpch.tiny.region limit 3;
r_regionkey | r_name | r_comment
-------------+---------+---------------------------------------------------------------------------------------------------------------------
0 | AFRICA | lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to
1 | AMERICA | hs use ironic, even requests. s
2 | ASIA | ges. thinly even pinto beans ca
(3 rows)
Query 20260203_065811_02254_cuz7g, FINISHED, 2 nodes
Splits: 21 total, 21 done (100.00%)
[Latency: client-side: 0:02, server-side: 0:01] [5 rows, 0B] [7 rows/s, 0B/s]
presto> select * from tpch.tiny.nation limit 3;
n_nationkey | n_name | n_regionkey | n_comment
-------------+-----------+-------------+-------------------------------------------------------------------------------------------------------------
0 | ALGERIA | 0 | haggle. carefully final deposits detect slyly agai
1 | ARGENTINA | 1 | al foxes promise slyly according to the regular accounts. bold requests alon
2 | BRAZIL | 1 | y alongside of the pending deposits. carefully special packages are about the ironic forges. slyly special
(3 rows)
Query 20260203_065815_02256_cuz7g, FINISHED, 2 nodes
Splits: 21 total, 21 done (100.00%)
[Latency: client-side: 0:02, server-side: 0:01] [25 rows, 0B] [41 rows/s, 0B/s]
presto>
presto>
$