These are the commands as run from Cloud Shell or from a Mac terminal with the Cloud SDK installed.
1. General
List credentialed accounts
$ gcloud auth list
Credentialed Accounts
ACTIVE ACCOUNT
* xxxxxxx@gmail.com
To set the active account, run:
$ gcloud config set account `ACCOUNT`
Show the currently configured project ID
$ gcloud config list project
[core]
project = xxxproject-xxxxx
Your active configuration is: [default]
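To see every project the account can access, or to switch the active project, the following standard gcloud commands can be used (not captured in the session above; [PROJECT_ID] is a placeholder):
# List all projects accessible to the active account
$ gcloud projects list
# Switch the active project
$ gcloud config set project [PROJECT_ID]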
Connect to Cloud Shell from a local machine
# The first run generates an SSH key and prompts you to set a passphrase
$ gcloud beta cloud-shell ssh
Automatic authentication with GCP CLI tools in Cloud Shell is disabled. To enable, please rerun command with `--authorize-session` flag.
WARNING: The private SSH key file for gcloud does not exist.
WARNING: The public SSH key file for gcloud does not exist.
WARNING: You do not have an SSH key for gcloud.
WARNING: SSH keygen will be executed to generate a key.
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /Users/xxx/.ssh/google_compute_engine.
Your public key has been saved in /Users/xxx/.ssh/google_compute_engine.pub.
# From the second run onward, you connect by entering the passphrase
$ gcloud beta cloud-shell ssh
Automatic authentication with GCP CLI tools in Cloud Shell is disabled. To enable, please rerun command with `--authorize-session` flag.
Enter passphrase for key '/Users/xxx/.ssh/google_compute_engine':
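As the message above notes, automatic authentication for CLI tools inside the Cloud Shell session is disabled by default; rerunning with the suggested flag enables it:
# Enable automatic authentication with GCP CLI tools in the session
$ gcloud beta cloud-shell ssh --authorize-session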
2. BigQuery
Show help for the query command
# bq help lists all available bq commands
$ bq help query
Output:
Python script for interacting with BigQuery.
USAGE: bq.py [--global_flags] <command> [--command_flags] [args]
query Execute a query.
Query should be specified on command line, or passed on stdin.
Examples:
bq query 'select count(*) from publicdata:samples.shakespeare'
echo 'select count(*) from publicdata:samples.shakespeare' | bq query
Usage:
query [<sql_query>]
Flags for query:
/Users/xxxx/gcp/cloud-sdk/google-cloud-sdk/platform/bq/bq.py:
--[no]allow_large_results: Enables larger destination table sizes for legacy SQL queries.
--[no]append_table: When a destination table is specified, whether or not to append.
(default: 'false')
--[no]batch: Whether to run the query in batch mode.
(default: 'false')
--clustering_fields: Comma separated field names. Can only be specified with time based
partitioning. Data will be first partitioned and subsequently "clustered on these
fields.
--destination_kms_key: Cloud KMS key for encryption of the destination table data.
--destination_schema: Schema for the destination table. Either a filename or a comma-
separated list of fields in the form name[:type].
(default: '')
--destination_table: Name of destination table for query results.
(default: '')
--display_name: Display name for the created scheduled query configuration.
(default: '')
--[no]dry_run: Whether the query should be validated without executing.
--external_table_definition: Specifies a table name and either an inline table definition
or a path to a file containing a JSON table definition to use in the query. The format
is "table_name::path_to_file_with_json_def" or
"table_name::schema@format=uri@connection". Note using connection is an experimental
feature and is still under development.For example, "--
external_table_definition=Example::/tmp/example_table_def.txt" will define a table
named "Example" using the URIs and schema encoded in example_table_def.txt.;
repeat this option to specify a list of values
--[no]flatten_results: Whether to flatten nested and repeated fields in the result schema
for legacy SQL queries. If not set, the default behavior is to flatten.
--job_timeout_ms: Maximum time to run the entire script.
--label: A label to set on a query job. The format is "key:value";
repeat this option to specify a list of values
--max_child_jobs: Maximum number of child jobs to fetch results from after executing a
script. If the number of child jobs exceeds this limit, only the final result will be
displayed.
(default: '1000')
(an integer)
-n,--max_rows: How many rows to return in the result.
(default: '100')
(an integer)
--max_statement_results: Maximum number of script statements to display the results for.
(default: '100')
(an integer)
--maximum_billing_tier: The upper limit of billing tier for the query.
(an integer)
--maximum_bytes_billed: The upper limit of bytes billed for the query.
(an integer)
--min_completion_ratio: [Experimental] The minimum fraction of data that must be scanned
before a query returns. If not set, the default server value (1.0) will be used.
(a number in the range [0, 1.0])
--parameter: Either a file containing a JSON list of query parameters, or a query
parameter in the form "name:type:value". An empty name produces a positional parameter.
The type may be omitted to assume STRING: name::value or ::value. The value "NULL"
produces a null value.;
repeat this option to specify a list of values
--range_partitioning: Enables range partitioning on the table. The format should be
"field,start,end,interval". The table will be partitioned based on the value of the
field. Field must be a top-level, non-repeated INT64 field. Start, end, and interval
are INT64 values defining the ranges.
--[no]replace: If true, erase existing contents before loading new data.
(default: 'false')
--request_id: The request_id to use for the jobs.query request. Only valid when used in
combination with --rpc.
--[no]require_cache: Whether to only run the query if it is already cached.
--[no]require_partition_filter: Whether to require partition filter for queries over this
table. Only apply to partitioned table.
--[no]rpc: If true, use rpc-style query API instead of jobs.insert().
(default: 'false')
--schedule: Scheduled query schedule. If non-empty, this query requests could create a
scheduled query understand the customer project. See
https://cloud.google.com/appengine/docs/flexible/python/scheduling-jobs-with-cron-
yaml#the_schedule_format for the schedule format
--schema_update_option: Can be specified when append to a table, or replace a table
partition. When specified, the schema of the destination table will be updated with the
schema of the new data. One or more of the following options can be specified:
ALLOW_FIELD_ADDITION: allow new fields to be added
ALLOW_FIELD_RELAXATION: allow relaxing required fields to nullable;
repeat this option to specify a list of values
--script_statement_byte_budget: Maximum bytes that can be billed for any statement in a
script.
--script_statement_timeout_ms: Maximum time to complete each statement in a script.
-s,--start_row: First row to return in the result.
(default: '0')
(an integer)
--target_dataset: Target dataset used to create scheduled query.
--time_partitioning_expiration: Enables time based partitioning on the table and sets the
number of seconds for which to keep the storage for the partitions in the table. The
storage in a partition will have an expiration time of its partition time plus this
value.
(an integer)
--time_partitioning_field: Enables time based partitioning on the table and the table
will be partitioned based on the value of this field. If time based partitioning is
enabled without this value, the table will be partitioned based on the loading time.
--time_partitioning_type: Enables time based partitioning on the table and set the type.
The default value is DAY, which will generate one partition per day. Other supported
values are HOUR, MONTH, and YEAR.
--udf_resource: The URI or local filesystem path of a code file to load and evaluate
immediately as a User-Defined Function resource.;
repeat this option to specify a list of values
--[no]use_cache: Whether to use the query cache to avoid rerunning cached queries.
--[no]use_legacy_sql: Whether to use Legacy SQL for the query. If not set, the default
value is true.
absl.flags:
--flagfile: Insert flag definitions from the given file into the command line.
(default: '')
--undefok: comma-separated list of flag names that it is okay to specify on the command
line even if the program does not define a flag with that name. IMPORTANT: flags in
this list that have arguments MUST use the --flag=value format.
(default: '')
Run 'bq.py --help' to get help for global flags.
Run 'bq.py help' to see the list of available commands.
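Before running a potentially expensive query, the flags listed above can be used to validate it or cap its cost first; a minimal sketch using --dry_run and --maximum_bytes_billed, with placeholder names:
# Validate the query without executing it; reports the estimated bytes processed
$ bq query --use_legacy_sql=false --dry_run \
> 'select col1 from [DATASET_NAME].[TABLE_NAME]'
# Refuse to run the query if it would bill more than ~100 MB
$ bq query --use_legacy_sql=false --maximum_bytes_billed=100000000 \
> 'select col1 from [DATASET_NAME].[TABLE_NAME]'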
Create a dataset
$ bq mk [DATASET_NAME]
Dataset 'xxxproject-xxxx:[DATASET_NAME]' successfully created.
# Specify a location (without this, the dataset is created in the default US region)
$ bq mk -d --data_location=asia-northeast1 [DATASET_NAME]
Dataset 'xxxproject-xxxx:[DATASET_NAME]' successfully created.
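The dataset's location can be confirmed afterwards; bq show also accepts a dataset (a quick check, not part of the session above):
$ bq show --format=prettyjson [DATASET_NAME]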
List datasets
$ bq ls
datasetId
------------
[DATASET_NAME]
# To list datasets in a specific project:
$ bq ls xxxproject-xxxx:
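Passing a dataset name to bq ls lists the tables inside it instead (not captured above):
$ bq ls [DATASET_NAME]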
Create or update a table and load data
# Command arguments:
# datasetID: [DATASET_NAME]
# tableID: [TABLE_NAME]
# source: [data file, e.g. xxx.txt]
# schema: col1:string,col2:string,col3:string
# Create the table
$ bq load [DATASET_NAME].[TABLE_NAME] xxx.txt \
> col1:string,col2:string,col3:string
Upload complete.
Waiting on bqjob_xxxx ... (0s) Current status: DONE
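bq load also accepts format flags; a sketch for reloading a CSV file with a header row (these are standard bq load flags, and xxx.csv is a placeholder, not part of the session above):
# Overwrite existing contents, parse the file as CSV, and skip the header row
$ bq load --replace --source_format=CSV --skip_leading_rows=1 \
> [DATASET_NAME].[TABLE_NAME] xxx.csv \
> col1:string,col2:string,col3:string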
Inspect a table's schema
$ bq show [DATASET_NAME].[TABLE_NAME]
Table [PROJECT_NAME]:[DATASET_NAME].[TABLE_NAME]
Last modified Schema Type Total URIs Expiration Labels
----------------- --------------------- ---------- ------------ ------------ --------
28 Dec 11:39:03 |- col1: string EXTERNAL 1
|- col2: string
|- col3: string
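The schema alone can also be dumped as JSON (a variant of bq show, not run above):
$ bq show --schema --format=prettyjson [DATASET_NAME].[TABLE_NAME]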
Run a SQL query
$ bq query --use_legacy_sql=false \
> 'select
> col1,
> col2,
> col3
> from
> [DATASET_NAME].[TABLE_NAME]
> '
Waiting on bqjob_xxxx ... (0s) Current status: DONE
+--------+--------+-----------+
| col1 | col2 | col3 |
+--------+--------+-----------+
| 001 | 000001 | 123456789 |
| 001 | 000002 | 234567890 |
| 003 | 000001 | 145678908 |
| 004 | 000002 | 123456789 |
| 004 | 000010 | 098765432 |
| 005 | 000010 | 000000001 |
| 007 | 000100 | 000000010 |
| 010 | 123456 | 00000009 |
| 011 | 123456 | 111111111 |
| 100 | 234577 | 234567899 |
+--------+--------+-----------+
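Query results can be written to a table instead of stdout using --destination_table from the flag list above; a minimal sketch ([RESULT_TABLE] is a placeholder):
$ bq query --use_legacy_sql=false \
> --destination_table=[DATASET_NAME].[RESULT_TABLE] \
> 'select col1, col2, col3 from [DATASET_NAME].[TABLE_NAME]'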