Handling GCP credentials explicitly

Last updated at 2019-09-06, posted at 2019-01-28

This post shows how to specify the credential path somewhere other than an environment variable when working with GCP from Python.

Introduction

Lately I use GCP far more often than AWS or Azure, and I've become thoroughly used to it.

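There are many reasons to choose GCP, but the ones that come to mind for me are:

k8s
Analytics environments
A wide range of multi-region services

As for the underlying compute servers, I honestly think any provider would do. If you want to use TPUs, it has to be GCP, and if TensorFlow is your main framework you might well choose GCP too.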

This time I'm recording what I looked up while implementing the code for an introductory BigQuery entry I'm writing on note.

What I wanted to achieve

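When you end up with several projects on GCP, it would be nice to have a clean way to set the authentication key for the APIs that an external script calls.

That was the motivation this time.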

What I looked into

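As anyone who has used GCP knows, GCP comes with the gcloud command, a command-line interface (CLI) for tasks such as creating instances on Google Cloud Platform.
In short, it's a handy tool for quickly spinning up cloud instances from your local machine. Of course, you can also run it from Cloud Shell on GCP.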

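My first thought was that the gcloud command must surely offer some way to handle this, so I started searching.
After all, gcloud accesses the cloud, so it can't possibly lack settings for authentication.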

I searched for things like "gcloud project 複数" (multiple gcloud projects) and "how to manage multi projects on gcp".

That turned up a few articles that looked useful:
https://blog.engineer.adways.net/entry/2018/06/08/150000
https://qiita.com/sky0621/items/597d4de7ed9ba7e31f6d

But hmm, that's a hassle...
Come to think of it, I only use the gcloud command when I'm spinning up instances...
So I decided to look for a way to do it with the Python SDK instead.

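At times like this, reading the official documentation thoroughly usually solves the problem and gives you accurate information, so it's the fastest route.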
https://googleapis.github.io/google-cloud-python/latest/index.html

I was working with the BigQuery Client at the time, so I checked its arguments:
https://googleapis.github.io/google-cloud-python/latest/bigquery/generated/google.cloud.bigquery.client.Client.html#google.cloud.bigquery.client.Client
and it's documented right there.

The following is all it takes.

from google.oauth2.service_account import Credentials

credentials = Credentials.from_service_account_file(filename='credential_path')
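To connect this back to the multi-project goal, here is a minimal sketch of choosing a key file per project instead of switching an environment variable. It assumes all service-account key files sit in one directory (a hypothetical layout) and relies on the `type` and `project_id` fields that such JSON key files contain. `index_key_files` and `client_for` are hypothetical helper names, not part of the SDK.

```python
import json
from pathlib import Path
from typing import Dict


def index_key_files(key_dir: str) -> Dict[str, Path]:
    """Map project_id -> key file path for each service-account key in key_dir."""
    index: Dict[str, Path] = {}
    for path in sorted(Path(key_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            info = json.load(f)
        # Skip JSON files that are not service-account keys.
        if not isinstance(info, dict) or info.get("type") != "service_account":
            continue
        project_id = info.get("project_id")
        if project_id:
            index[project_id] = path
    return index


def client_for(project: str, key_dir: str):
    """Build a BigQuery client for `project` from the matching key file."""
    # Imported lazily so that indexing key files works without the SDK installed.
    from google.cloud import bigquery
    from google.oauth2.service_account import Credentials

    key_path = index_key_files(key_dir)[project]  # KeyError if no key matches
    credentials = Credentials.from_service_account_file(filename=str(key_path))
    return bigquery.Client(project=project, credentials=credentials)
```

With this in place, a script that talks to several projects only needs the project ID and the key directory, and the credentials stay explicit at every call site.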

This is what I finally ended up with.
It's a job that bulk-loads CSV data into BigQuery from a Cloud Storage URI.

import os
from typing import Union

from google.cloud import bigquery
from google.oauth2.service_account import Credentials


def load_from_cloud_storage_uri(source_uri: str,
                                project: str,
                                dataset: str,
                                table: str,
                                credential: Union[str, os.PathLike],
                                auto_detect: bool = True,
                                skip_leading_rows: int = 1) -> bigquery.LoadJob:
    """
    This function is only focused on load job execution via Google Cloud Storage.
    We're supporting only csv import from script.

    Args:
        source_uri (str) : cloud storage uri
        project (str) : project_id on google cloud platform
        dataset (str) : target dataset_id
        table (str) : table name created by job
        credential (str, os.PathLike) : access key location
        auto_detect (bool) : whether you use schema auto detection or not
        skip_leading_rows (int) : how many rows are skipped.

    Returns:
        bigquery.LoadJob : job result
    """

    # setup credential and client.
    credentials = Credentials.from_service_account_file(filename=credential)
    client = bigquery.Client(project=project, credentials=credentials)
    # Build the dataset reference directly; Client.dataset() is deprecated in newer releases.
    dataset_ref = bigquery.DatasetReference(project, dataset)

    # edit configuration
    # TODO: make more editable.
    job_config = bigquery.LoadJobConfig()
    job_config.autodetect = auto_detect
    job_config.skip_leading_rows = skip_leading_rows

    # request to BQ
    load_job = client.load_table_from_uri(
        source_uris=source_uri,
        destination=dataset_ref.table(table),
        job_config=job_config)

    return load_job.result()
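Because the load job only runs after a round trip to BigQuery, it can help to fail fast on a malformed source URI before calling the function above. The following is a minimal sketch, assuming a single URI of the form `gs://<bucket>/<object>`. `split_gcs_uri` is a hypothetical helper, not part of the SDK, and every bucket, project, dataset, table and key name in the usage comment is a placeholder.

```python
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/object' into (bucket, object), or raise ValueError."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    bucket, _, blob = uri[len(prefix):].partition("/")
    if not bucket or not blob:
        raise ValueError(f"expected gs://<bucket>/<object>, got {uri!r}")
    return bucket, blob


# Hypothetical usage together with load_from_cloud_storage_uri
# (all names below are placeholders):
#
# uri = "gs://my-bucket/exports/users.csv"
# split_gcs_uri(uri)  # raises ValueError early if the URI is malformed
# job = load_from_cloud_storage_uri(
#     source_uri=uri,
#     project="my-project",
#     dataset="my_dataset",
#     table="users",
#     credential="keys/my-project.json")
```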

Summary

BQ is so much fun that it's been taking up most of my headspace lately.
