
Reading a CSV file from Cloud Storage into a DataFrame with Python

Last updated at 2022-11-21, posted at 2022-11-20

What this does

A Python program reads a CSV file stored in GCP Cloud Storage and loads it into a pandas DataFrame.

Prerequisites

The CSV data is already stored in Cloud Storage
Authentication (a service account) is already configured
The required Python libraries are installed
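
The authentication prerequisite can be checked before running anything against Cloud Storage. The following is a minimal sketch that assumes the service account key is supplied through the GOOGLE_APPLICATION_CREDENTIALS environment variable; the article does not say how its credentials were set up, so this is only one possible arrangement. The function name credentials_status is hypothetical.

```python
import os
from pathlib import Path


def credentials_status(env=None):
    """Report whether a service account key file appears to be configured.

    Assumption: credentials are provided via GOOGLE_APPLICATION_CREDENTIALS.
    Other setups (for example gcloud user credentials) are not detected here.
    """
    env = os.environ if env is None else env
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        return "unset"          # variable is missing or empty
    if not Path(key_path).is_file():
        return "missing-file"   # variable is set but points at nothing
    return "ok"


if __name__ == "__main__":
    print(credentials_status())
```

A result of "ok" only means the key file exists; it does not confirm that the service account has permission to read the bucket.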

The CSV file stored in Cloud Storage

As a sample, a CSV file containing population data for Aichi, Gifu, and Mie prefectures is stored in Cloud Storage.

id,prefecture,population
1,Aichi,7497521
2,Gifu,1945350
3,Mie,1759711

Installing the required libraries

Prepare a requirements.txt and install the following libraries.

requirements.txt

pandas
google-cloud-storage

pip install -r requirements.txt
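
To confirm that the installation succeeded, one option is to ask the standard library which of the two distributions are present. This is a small sketch using importlib.metadata (Python 3.8 or later); the helper name missing_packages is hypothetical.

```python
from importlib import metadata


def missing_packages(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # An empty list means both libraries from requirements.txt are available.
    print(missing_packages(["pandas", "google-cloud-storage"]))
```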

Program

The following program loads a CSV file from Cloud Storage into a DataFrame.

In this example, the file is stored in Cloud Storage under the name sample.csv.

from io import BytesIO

import pandas as pd
from google.cloud import storage

def fetch_csv():
    client = storage.Client()

    # Look up the Cloud Storage bucket (replace the placeholder with your bucket name)
    bucket = client.get_bucket('{YOUR_BUCKET_NAME}')

    # Point at the CSV object by its path inside the bucket
    blob = bucket.blob('sample.csv')

    # Download the object's contents as bytes
    content = blob.download_as_bytes()

    # Parse the bytes into a pandas DataFrame
    stop_times_df = pd.read_csv(BytesIO(content))

    print(stop_times_df)

if __name__ == '__main__':
    fetch_csv()
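
If the DataFrame is needed elsewhere in a program, a variant that takes the bucket and object names as arguments and returns the result may be more convenient. The sketch below is one possible arrangement, not the article's original code: the client parameter, the read_csv_kwargs pass-through, and the InMemoryClient stand-in are all hypothetical additions, the last one included only so the function can be tried without network access.

```python
from io import BytesIO

import pandas as pd

SAMPLE = (
    b"id,prefecture,population\n"
    b"1,Aichi,7497521\n"
    b"2,Gifu,1945350\n"
    b"3,Mie,1759711\n"
)


def fetch_csv(bucket_name, blob_name, client=None, **read_csv_kwargs):
    """Download one CSV object and return it as a DataFrame."""
    if client is None:
        # Imported lazily so the function also works with a stand-in client.
        from google.cloud import storage
        client = storage.Client()

    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(blob_name)
    content = blob.download_as_bytes()
    return pd.read_csv(BytesIO(content), **read_csv_kwargs)


class InMemoryClient:
    """Hypothetical stand-in that mimics only the three calls used above."""

    def __init__(self, objects):
        self._objects = objects  # mapping of object name -> bytes
        self._name = None

    def get_bucket(self, bucket_name):
        return self  # the bucket name is ignored in this stand-in

    def blob(self, blob_name):
        self._name = blob_name
        return self

    def download_as_bytes(self):
        return self._objects[self._name]


if __name__ == "__main__":
    fake = InMemoryClient({"sample.csv": SAMPLE})
    print(fetch_csv("any-bucket", "sample.csv", client=fake))
```

Against a real bucket, the call would be fetch_csv('{YOUR_BUCKET_NAME}', 'sample.csv') with no client argument, in which case storage.Client() is created as in the program above.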

The output is as follows.

   id prefecture  population
0   1      Aichi     7497521
1   2       Gifu     1945350
2   3        Mie     1759711

To use the id column of the CSV file as the index, set the index in the read_csv call.

# Parse the bytes into a pandas DataFrame, using the first column as the index
stop_times_df = pd.read_csv(BytesIO(content), index_col=0)

   prefecture  population
id                       
1       Aichi     7497521
2        Gifu     1945350
3         Mie     1759711
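
The index column can also be selected by name rather than by position, which keeps working if the column order in the file changes. The following sketch parses the same sample data from an in-memory bytes object, so it runs without Cloud Storage; the helper name read_with_id_index is hypothetical.

```python
from io import BytesIO

import pandas as pd

SAMPLE = (
    b"id,prefecture,population\n"
    b"1,Aichi,7497521\n"
    b"2,Gifu,1945350\n"
    b"3,Mie,1759711\n"
)


def read_with_id_index(content):
    """Parse CSV bytes and use the 'id' column as the DataFrame index."""
    return pd.read_csv(BytesIO(content), index_col="id")


if __name__ == "__main__":
    df = read_with_id_index(SAMPLE)
    # With id as the index, rows can be looked up by their id value.
    print(df.loc[2, "prefecture"])
```

With id as the index, df.loc selects rows by id value rather than by row position.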
