LoginSignup
10
9

More than 1 year has passed since last update.

BigQuery で無からリレーションを出現させる

Last updated at Posted at 2015-07-29

まず

select * from 
(select 1 as n1, 2 as n2, 3 as n3, 4 as n4, 5 as n5, 6 as n6, 7 as n7, 8 as n8, 9 as n9, 10 as n10, 11 as n11, 12 as n12, 13 as n13, 14 as n14, 15 as n15, 16 as n16, 17 as n17, 18 as n18, 19 as n19, 20 as n20, 21 as n21, 22 as n22, 23 as n23, 24 as n24, 25 as n25, 26 as n26, 27 as n27, 28 as n28, 29 as n29, 30 as n30, 31 as n31, 32 as n32, 33 as n33, 34 as n34, 35 as n35, 36 as n36, 37 as n37, 38 as n38, 39 as n39, 40 as n40, 41 as n41, 42 as n42, 43 as n43, 44 as n44, 45 as n45, 46 as n46, 47 as n47, 48 as n48, 49 as n49, 50 as n50, 51 as n51, 52 as n52, 53 as n53, 54 as n54, 55 as n55, 56 as n56, 57 as n57, 58 as n58, 59 as n59, 60 as n60, 61 as n61, 62 as n62, 63 as n63, 64 as n64, 65 as n65, 66 as n66, 67 as n67, 68 as n68, 69 as n69, 70 as n70, 71 as n71, 72 as n72, 73 as n73, 74 as n74, 75 as n75, 76 as n76, 77 as n77, 78 as n78, 79 as n79, 80 as n80, 81 as n81, 82 as n82, 83 as n83, 84 as n84, 85 as n85, 86 as n86, 87 as n87, 88 as n88, 89 as n89, 90 as n90, 91 as n91, 92 as n92, 93 as n93, 94 as n94, 95 as n95, 96 as n96, 97 as n97, 98 as n98, 99 as n99, 100 as n100),
-- 343回繰り返し
(select 1 as n1, 2 as n2, 3 as n3, 4 as n4, 5 as n5, 6 as n6, 7 as n7, 8 as n8, 9 as n9, 10 as n10, 11 as n11, 12 as n12, 13 as n13, 14 as n14, 15 as n15, 16 as n16, 17 as n17, 18 as n18, 19 as n19, 20 as n20, 21 as n21, 22 as n22, 23 as n23, 24 as n24, 25 as n25, 26 as n26, 27 as n27, 28 as n28, 29 as n29, 30 as n30, 31 as n31, 32 as n32, 33 as n33, 34 as n34, 35 as n35, 36 as n36, 37 as n37, 38 as n38, 39 as n39, 40 as n40, 41 as n41, 42 as n42, 43 as n43, 44 as n44, 45 as n45, 46 as n46, 47 as n47, 48 as n48, 49 as n49, 50 as n50, 51 as n51, 52 as n52, 53 as n53, 54 as n54, 55 as n55, 56 as n56, 57 as n57, 58 as n58, 59 as n59, 60 as n60, 61 as n61, 62 as n62, 63 as n63, 64 as n64, 65 as n65, 66 as n66, 67 as n67, 68 as n68, 69 as n69, 70 as n70, 71 as n71, 72 as n72, 73 as n73, 74 as n74, 75 as n75, 76 as n76, 77 as n77, 78 as n78, 79 as n79, 80 as n80, 81 as n81, 82 as n82, 83 as n83, 84 as n84, 85 as n85, 86 as n86, 87 as n87, 88 as n88, 89 as n89, 90 as n90, 91 as n91, 92 as n92, 93 as n93, 94 as n94, 95 as n95, 96 as n96, 97 as n97, 98 as n98, 99 as n99, 100 as n100)

上記のような SQL を実行すると、実テーブルからの読み込み量が 0B で、下記のような結果が出てきます。

Google_BigQuery.png

テーブルへの保存

Query の結果が出るのであれば、Destination Table 設定によって新しいテーブルに保存(Write, OverWrite)したり、既存のテーブルに追記(Append)することが出来ると分かると思います。
select * from a, b, c のような書き方は標準 SQL では cross join ですが、BigQuery では UNION ALL として動きます。
BigQuery では、1つの FROM 節につき、1,000 個まで UNION ALL することが出来ます。ただし、1,000 件 UNION ALL する SQL をサブクエリーにして、さらに UNION ALL でネストすることは可能なので、理論上は SQL サイズ制限(後述)に当たらない限り増やせます。

使いどころ

JOIN のタネとして

例えば PostgreSQL で分析するときに

select *
  from hoges as h
  join (values(1, 'taro'), (2, 'jiro'), (3, 'saburo')) as target_users(id, name) on h.id = target_users.id

とか

    select year::text || '-' || lpad(month::text, 2, '0')
      from (select year from generate_series(2014, 2015) as years(year)) as years
cross join (select month from generate_series(1, 12) as months(month)) as months;

のような、ジェネレーターによるリレーションを使ったクエリーを書く場合があると思います。
BigQuery には values も generate_series も無いですが、今回の方法で頑張れば、なんとか代用になるはずです

streaming inserts 課金回避チートとして

アプリ側で 1,000 件ずつバッファして、溜まるか一定時間経ったら select 経由で既存テーブルに Append するというような方法で、疑似 streaming inserts が実現できそうです(やってない)。
ただ、streaming inserts と比べるとスループットが段違いに遅い(今回の例で40秒くらいかかる)ので、速度が重要じゃなく、料金を節約したいなどの場合のみ有効という事になると思います。
また、更に遅くてもいいなら、素直に GCS 経由で load をしていけばいいだけなので、単に「できなくない」レベルの小技と言えるでしょう。
後述する「注意点」にあるような制限に当たる可能性も高いので、特にオススメはしません。
※ 一応ドキュメントを通して読んで、禁止はされていないという認識ですが「これ、マズイだろ」という情報をお持ちの方がいらっしゃいましたら、お教え頂けると助かります。

注意点

BigQuery の制限に当たらないように運用する必要があります。

1日 20,000 クエリー制限

Daily limit (shared between interactive and batch queries): 20,000 queries. Queries that return cached results do count against this limit.

あんまりバカスカ使ってると、この上限に当たってしまって、運用が止まってしまうので気をつけましょう

1テーブルにつき、1日 1,000 更新まで

Daily update limit: 1,000 updates per table per day; applies only to the destination table in a query.

streaming inserts の代わりに Append しまくってると、この上限にあたる恐れがあります
幸いテーブル毎の制限なので、適宜シャーディングしつつ、定期的にバッチでまとめてやるなどの対応が有効と思われます

1クエリーにつき SQL は 256KB くらいまで

Maximum query length: 256 KB

今回の例では 345 行を出力していますが、346 行以上になると下記のようなエラーが出ます

Exceeded memory limit during query parsing. Aborting the parse at: 347:1052

SQL のサイズは 367KB くらいだったので、(圧縮後)256KB と言っている制限とも合致しますね

10
9
1

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
10
9