More than 5 years have passed since last update.

BigQuery入門2

BigQuery

Posted at 2019-11-21

概要

BigQuery入門の続き

今回はコマンドラインから動かしてみます。
前回同様スタートガイドをなぞります。

Cloud SDKをインストールして初期化

まずはドキュメントを見ながらClouud SDKのインストール

pythonのバージョン確認

~$ python -V
Python 2.7.16

パッケージをダウンロードして展開

Downloads$ ./google-cloud-sdk/install.sh
Welcome to the Google Cloud SDK!

To help improve the quality of this product, we collect anonymized usage data
and anonymized stacktraces when crashes are encountered; additional information
is available at <https://cloud.google.com/sdk/usage-statistics>. You may choose
to opt out of this collection now (by choosing 'N' at the below prompt), or at
any time in the future by running the following command:

    gcloud config set disable_usage_reporting true

Do you want to help improve the Google Cloud SDK (Y/n)?  Y


Your current Cloud SDK version is: 245.0.0
The latest available version is: 272.0.0

┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                   Components                                                  │
├──────────────────┬──────────────────────────────────────────────────────┬──────────────────────────┬──────────┤
│      Status      │                         Name                         │            ID            │   Size   │
├──────────────────┼──────────────────────────────────────────────────────┼──────────────────────────┼──────────┤
│ Update Available │ BigQuery Command Line Tool                           │ bq                       │  < 1 MiB │
│ Update Available │ Cloud SDK Core Libraries                             │ core                     │ 12.5 MiB │
│ Update Available │ Cloud Storage Command Line Tool                      │ gsutil                   │  3.6 MiB │
│ Not Installed    │ App Engine Go Extensions                             │ app-engine-go            │  4.8 MiB │
│ Not Installed    │ Appctl                                               │ appctl                   │ 18.6 MiB │
│ Not Installed    │ Cloud Bigtable Command Line Tool                     │ cbt                      │  7.3 MiB │
│ Not Installed    │ Cloud Bigtable Emulator                              │ bigtable                 │  6.6 MiB │
│ Not Installed    │ Cloud Datalab Command Line Tool                      │ datalab                  │  < 1 MiB │
│ Not Installed    │ Cloud Datastore Emulator                             │ cloud-datastore-emulator │ 18.4 MiB │
│ Not Installed    │ Cloud Firestore Emulator                             │ cloud-firestore-emulator │ 40.0 MiB │
│ Not Installed    │ Cloud Pub/Sub Emulator                               │ pubsub-emulator          │ 34.9 MiB │
│ Not Installed    │ Cloud SQL Proxy                                      │ cloud_sql_proxy          │  3.7 MiB │
│ Not Installed    │ Emulator Reverse Proxy                               │ emulator-reverse-proxy   │ 14.5 MiB │
│ Not Installed    │ Google Cloud Build Local Builder                     │ cloud-build-local        │  5.9 MiB │
│ Not Installed    │ Google Container Registry's Docker credential helper │ docker-credential-gcr    │  1.8 MiB │
│ Not Installed    │ Skaffold                                             │ skaffold                 │ 44.0 MiB │
│ Not Installed    │ gcloud Alpha Commands                                │ alpha                    │  < 1 MiB │
│ Not Installed    │ gcloud Beta Commands                                 │ beta                     │  < 1 MiB │
│ Not Installed    │ gcloud app Java Extensions                           │ app-engine-java          │ 62.0 MiB │
│ Not Installed    │ gcloud app PHP Extensions                            │ app-engine-php           │ 21.9 MiB │
│ Not Installed    │ gcloud app Python Extensions                         │ app-engine-python        │  6.0 MiB │
│ Not Installed    │ gcloud app Python Extensions (Extra Libraries)       │ app-engine-python-extras │ 27.1 MiB │
│ Not Installed    │ kubectl                                              │ kubectl                  │  < 1 MiB │
└──────────────────┴──────────────────────────────────────────────────────┴──────────────────────────┴──────────┘
To install or remove components at your current SDK version [245.0.0], run:
  $ gcloud components install COMPONENT_ID
  $ gcloud components remove COMPONENT_ID

To update your SDK installation to the latest version [272.0.0], run:
  $ gcloud components update



To take a quick anonymous survey, run:
  $ gcloud alpha survey


Modify profile to update your $PATH and enable shell command
completion?

Do you want to continue (Y/n)?  Y

The Google Cloud SDK installer will now prompt you to update an rc
file to bring the Google Cloud CLIs into your environment.

Enter a path to an rc file to update, or leave blank to use
[/Users/xxxxx/.bash_profile]:
Backing up [/Users/xxxxx/.bash_profile] to [/Users/xxxxx/.bash_profile.backup].
[/Users/xxxxx/.bash_profile] has been updated.

==> Start a new shell for the changes to take effect.


For more information on how to get started, please visit:
  https://cloud.google.com/sdk/docs/quickstarts

gcloud initでSDKを初期化

使用するプロジェクトや基本的な設定が対話的に行われます。

Downloads$ ./google-cloud-sdk/bin/gcloud init
Welcome! This command will take you through the configuration of gcloud.

Your current configuration has been set to: [default]

You can skip diagnostics next time by using the following flag:
  gcloud init --skip-diagnostics

Network diagnostic detects and fixes local network connection issues.
Checking network connection...done.
Reachability Check passed.
Network diagnostic passed (1/1 checks passed).

You must log in to continue. Would you like to log in (Y/n)?  Y

Your browser has been opened to visit:

    oauthのlink




Updates are available for some Cloud SDK components.  To install them,
please run:
  $ gcloud components update

You are logged in as: [xxxxxxxx@gmail.com].

Pick cloud project to use:
 [1] scenic-rampart-xxxxxx
 [2] sinuous-cat-xxxxxx
 [3] Create a new project
Please enter numeric choice or text value (must exactly match list
item):
Please enter a value between 1 and 3, or a value present in the list:
Please enter a value between 1 and 3, or a value present in the list:  1

Your current project has been set to: [scenic-rampart-xxxxxx].

Not setting default zone/region (this feature makes it easier to use
[gcloud compute] by setting an appropriate default value for the
--zone and --region flag).
See https://cloud.google.com/compute/docs/gcloud-compute section on how to set
default compute region and zone manually. If you would like [gcloud init] to be
able to do this for you the next time you run it, make sure the
Compute Engine API is enabled for your project on the
https://console.developers.google.com/apis page.

Created a default .boto configuration file at [/Users/xxxxx/.boto]. See this file and
[https://cloud.google.com/storage/docs/gsutil/commands/config] for more
information about configuring Google Cloud Storage.
Your Google Cloud SDK is configured and ready to use!

* Commands that require authentication will use xxxxxxxx@gmail.com by default
* Commands will reference project `scenic-rampart-xxxxxx` by default
Run `gcloud help config` to learn how to change individual settings

This gcloud configuration is called [default]. You can create additional configurations if you work with multiple accounts and/or projects.
Run `gcloud topic configurations` to learn more.

Some things to try next:

* Run `gcloud --help` to see the Cloud Platform services you can interact with. And run `gcloud help COMMAND` to get help on any gcloud command.
* Run `gcloud topic --help` to learn about advanced features of the SDK like arg files and output formatting

テーブルを確認

簡単にコマンドツールの設定ができたようなので、テーブルをみてみましょう。

Downloads$ bq show bigquery-public-data:samples.shakespeare
-bash: bq: command not found

おや…?bash_profileの読み直しができてませんね。

あらためて

~$ bq show bigquery-public-data:samples.shakespeare

Welcome to BigQuery! This script will walk you through the
process of initializing your .bigqueryrc configuration file.

First, we need to set up your credentials if they do not
already exist.

Credential creation complete. Now we will select a default project.

List of projects:
  #         projectId           friendlyName
 --- ----------------------- ------------------
  1   scenic-rampart-xxxxxx   My First Project
Found only one project, setting scenic-rampart-xxxxxx as the default.

BigQuery configuration complete! Type "bq" to get started.

Table bigquery-public-data:samples.shakespeare

   Last modified                  Schema                 Total Rows   Total Bytes   Expiration   Time Partitioning   Labels
 ----------------- ------------------------------------ ------------ ------------- ------------ ------------------- --------
  15 Mar 02:16:45   |- word: string (required)           164656       6432064
                    |- word_count: integer (required)
                    |- corpus: string (required)
                    |- corpus_date: integer (required)

クエリの実行

ガイドに従ってクエリを実行してみます。

サンプルクエリを実行

シェイクスピアの指定文字列による単語の出現回数抽出ですね。

~$ bq query --use_legacy_sql=false \
> 'SELECT
>    word,
>    SUM(word_count) AS count
>  FROM
>    `bigquery-public-data`.samples.shakespeare
>  WHERE
>    word LIKE "%raisin%"
>  GROUP BY
>    word'
Waiting on bqjob_r7585f29df8d53688_0000016e879b2984_1 ... (0s) Current status: DONE
+---------------+-------+
|     word      | count |
+---------------+-------+
| praising      |     8 |
| Praising      |     4 |
| raising       |     5 |
| dispraising   |     2 |
| dispraisingly |     1 |
| raisins       |     1 |
+---------------+-------+

該当がないケースのパターン

こちらは該当がない指定をしたケース

~$ bq query --use_legacy_sql=false \
> 'SELECT
>   word
> FROM
>   `bigquery-public-data`.samples.shakespeare
> WHERE
>   word = "huzzah"'
Waiting on bqjob_r5cb548f76e6eb3d1_0000016e879bb6a6_1 ... (0s) Current status: DONE

新しいテーブルを作成

GUIと同様の流れで新しいテーブルを作成します。

~$ bq mk babynames
Dataset 'scenic-rampart-xxxxxx:babynames' successfully created.

リストでも確認できます。

~$ bq ls
  datasetId
 -----------
  babynames

前回ダウンロードしておいた赤ちゃん名前に関するデータを再利用してロードします。

names$ bq load babynames.names2010 yob2010.txt name:string,gender:string,count:integer
Upload complete.
Waiting on bqjob_r30af566ed4babcba_0000016e87edc77d_1 ... (0s) Current status: DONE

テーブルができていることが確認できます。

names$ bq ls babynames
   tableId    Type    Labels   Time Partitioning
 ----------- ------- -------- -------------------
  names2010   TABLE

テーブルの詳細を見てみます。

names$ bq show babynames.names2010
Table scenic-rampart-xxxxxx:babynames.names2010

   Last modified         Schema         Total Rows   Total Bytes   Expiration   Time Partitioning   Labels
 ----------------- ------------------- ------------ ------------- ------------ ------------------- --------
  20 Nov 17:29:02   |- name: string     34073        654482
                    |- gender: string
                    |- count: integer

作成したテーブルに対してクエリ実行

簡単なクエリで確認

names$ bq query "SELECT name,count FROM babynames.names2010 WHERE gender = 'F' ORDER BY count DESC LIMIT 5"
Waiting on bqjob_r68bdc0a63b2a3001_0000016e87fadae6_1 ... (1s) Current status: DONE
+----------+-------+
|   name   | count |
+----------+-------+
| Isabella | 22913 |
| Sophia   | 20643 |
| Emma     | 17345 |
| Olivia   | 17028 |
| Ava      | 15433 |
+----------+———+

names$ bq query "SELECT name,count FROM babynames.names2010 WHERE gender = 'M' ORDER BY count ASC LIMIT 5"
Waiting on bqjob_r54f108e005e2f5a3_0000016e87fb4684_1 ... (0s) Current status: DONE
+----------+-------+
|   name   | count |
+----------+-------+
| Aaqib    |     5 |
| Aaidan   |     5 |
| Aadhavan |     5 |
| Aarian   |     5 |
| Aamarion |     5 |
+----------+———+

簡単に実行できますね。

最後にお片付け

コマンドからの削除も簡単です。

names$ bq rm -r babynames
rm: remove dataset 'scenic-rampart-xxxxxx:babynames'? (y/N) y
names$ bq ls
names$

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up