Redshiftのsuper型配列を反復処理する方法

Posted at 2022-06-01

はじめに

Redshiftのsuperデータ型は、配列などを簡単に扱うことができ、とても便利である。
今回は、superデータ型の配列を反復処理する方法を簡潔にまとめてみた。

参考ドキュメント

テスト用テーブル準備

今回の検証のために、以下のようなカラムを持つテンポラリテーブルを作成した。

create temporary table tmp_test_super(
username varchar(8),
hobby super
);

次に以下の値を持つ、jsonファイルを作成し、作成したテーブルにインポートした。

{"username":"Mac","hobby":["baseball","soccer","tennis"]}
{"username":"John","hobby":["baseball","cooking"]}
{"username":"Miki","hobby":["cooking","reading"]}
{"username":"Andy","hobby":["baseball","soccer"]}

インポートした結果は次の通りである。

=> select * from tmp_test_super;

 username |             hobby
----------+--------------------------------
 Mac      | ["baseball","soccer","tennis"]
 John     | ["baseball","cooking"]
 Miki     | ["cooking","reading"]
 Andy     | ["baseball","soccer"]
(4 rows)

このテーブルを用いて検証する。

検証

この項目では実際にクエリ例を紹介しながら、反復処理を行う。

単純に要素を1つずつ取り出す

=> select a.*, b from tmp_test_super a, a.hobby b;

 username |             hobby              |     b
----------+--------------------------------+------------
 Mac      | ["baseball","soccer","tennis"] | "baseball"
 Mac      | ["baseball","soccer","tennis"] | "soccer"
 Mac      | ["baseball","soccer","tennis"] | "tennis"
 John     | ["baseball","cooking"]         | "baseball"
 John     | ["baseball","cooking"]         | "cooking"
 Miki     | ["cooking","reading"]          | "cooking"
 Miki     | ["cooking","reading"]          | "reading"
 Andy     | ["baseball","soccer"]          | "baseball"
 Andy     | ["baseball","soccer"]          | "soccer"
(9 rows)

このように、FROM句内に反復したい属性値を記述することで、反復処理が可能となる。

次のように、asを記述しても良い。

=> select a.*, b from tmp_test_super as a, a.hobby as b;

配列の要素ごとにレコード数を数える

=> select b as a_hobby, count(1) from tmp_test_super a, a.hobby b group by a_hobby order by count desc;

  a_hobby   | count
------------+-------
 "baseball" |     3
 "cooking"  |     2
 "soccer"   |     2
 "reading"  |     1
 "tennis"   |     1
(5 rows)

配列の要素をWHERE句で条件指定する

=> select a.*, b as a_hobby from tmp_test_super a, a.hobby b where b = 'baseball';

 username |             hobby              |  a_hobby
----------+--------------------------------+------------
 Mac      | ["baseball","soccer","tennis"] | "baseball"
 John     | ["baseball","cooking"]         | "baseball"
 Andy     | ["baseball","soccer"]          | "baseball"
(3 rows)

条件指定する際は、'"baseball"'とするのではなく、'baseball'とすればよい

まとめ

super型配列を反復処理するためには、FROM句に記述する必要があることがわかった。
JSONなどをそのままRedshiftで扱いたいときは、配列のまま簡単に扱うことができるsuper型は非常に便利である。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up