[Amazon SageMaker] Reading image files from S3 with the SageMaker Python SDK

Posted at 2024-09-27, last updated at 2024-09-27

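Introduction

In Amazon SageMaker, I want to access image data in the S3 bucket configured as Canvas storage in as simple a way as possible.
Using the SageMaker Python SDK appears to keep the amount of code relatively small.

Usage

Three modules are used: PIL (Pillow) to convert byte data into an image, sagemaker (the SageMaker Python SDK) to access S3, and io (Python's I/O module) to turn the SDK's output into a form that PIL can accept.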

from PIL import Image
import io
import sagemaker

Assume that, under "Canvas storage configuration" in SageMaker, the Amazon S3 artifact location is set as follows.

s3://sagemaker-ap-northeast-*-************

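Create an S3 bucket there and place the files in it. (Since S3 buckets cannot be nested, this is in practice a folder, i.e. a key prefix, under the bucket above; it appears as <bucket-name> in the paths below.)

A single RGB image can be read as follows.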

file = 's3://sagemaker-ap-northeast-*-************/<bucket-name>/<file-name>.png'
img = Image.open(io.BytesIO(sagemaker.s3.S3Downloader.read_bytes(file))).convert("RGB")

To access every file in a folder, list can be used.

folder = 's3://sagemaker-ap-northeast-*-************/<bucket-name>/<folder-name>'
file_list = sagemaker.s3.S3Downloader.list(folder)
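The listing above can be combined with the single-image read to load a whole folder. The following is a minimal sketch, not a definitive implementation: the helper names filter_image_uris and load_images and the suffix set are hypothetical, and it assumes that S3Downloader.list returns full s3:// URIs (as it does in SDK v2) and that the folder may contain non-image objects that should be skipped.

```python
from pathlib import PurePosixPath

# Assumed set of image extensions; adjust to the files actually stored.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def filter_image_uris(uris, suffixes=IMAGE_SUFFIXES):
    """Keep only the URIs whose extension looks like an image (case-insensitive)."""
    return [u for u in uris if PurePosixPath(u).suffix.lower() in suffixes]


def load_images(folder_uri):
    """Return a dict mapping each image URI under folder_uri to an RGB PIL image."""
    # Imported here so that filter_image_uris can be used without these packages.
    import io
    from PIL import Image
    from sagemaker.s3 import S3Downloader

    images = {}
    for uri in filter_image_uris(S3Downloader.list(folder_uri)):
        data = S3Downloader.read_bytes(uri)
        images[uri] = Image.open(io.BytesIO(data)).convert("RGB")
    return images
```

Calling load_images(folder) with the folder URI from above would then give one decoded image per listed file. Keeping the filter separate from the download makes it easy to check which objects will be fetched before any data is transferred.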

Actually...

Internally, sagemaker.s3.S3Downloader.read_bytes calls .read() on an io.BytesIO stream to turn it into data, so wrapping the result in io.BytesIO again causes a wasteful round trip: stream ⇒ data ⇒ stream.
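The round trip can be illustrated without touching S3. In this sketch, fake_download is a hypothetical stand-in for the download step (it only fills an in-memory buffer), and via_read_bytes and direct are illustrative names, not SDK functions.

```python
import io


def fake_download(payload: bytes) -> io.BytesIO:
    """Stand-in for the S3 download: fill a buffer and rewind it."""
    buf = io.BytesIO()
    buf.write(payload)
    buf.seek(0)
    return buf


def via_read_bytes(payload: bytes) -> io.BytesIO:
    """Mimic the read_bytes route: stream -> bytes -> stream."""
    data = fake_download(payload).read()  # stream -> bytes
    return io.BytesIO(data)               # bytes -> stream again


def direct(payload: bytes) -> io.BytesIO:
    """Hand the downloaded stream to the caller as-is."""
    return fake_download(payload)


payload = b"example image bytes"
# Both routes end with a stream holding identical content.
same = via_read_bytes(payload).getvalue() == direct(payload).getvalue()
```

Both routes yield a stream with the same bytes, so the intermediate bytes object adds nothing when the consumer (here PIL) wants a stream anyway.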

To avoid this, it is better to modify read_bytes slightly.

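from sagemaker.session import Session
from sagemaker.s3 import parse_s3_url

def read_s3_bytes(s3_uri):
    """Download an S3 object and return it as a rewound io.BytesIO stream."""
    bucket, object_key = parse_s3_url(s3_uri)

    # s3_resource may be unset on a fresh Session; fall back to boto3 directly.
    session = Session()
    s3 = session.s3_resource
    if s3 is None:
        s3 = session.boto_session.resource("s3", region_name=session.boto_region_name)

    bytes_io = io.BytesIO()
    s3.Bucket(bucket).download_fileobj(object_key, bytes_io)
    bytes_io.seek(0)  # rewind so the caller reads from the start

    return bytes_io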

file = 's3://sagemaker-ap-northeast-*-************/<bucket-name>/<file-name>.png'
img = Image.open(read_s3_bytes(file)).convert("RGB")

Note: the gain in processing time is not large.
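For reference, the parse_s3_url step used in read_s3_bytes splits an S3 URI into a bucket name and an object key. The sketch below approximates that behavior with the standard library only; split_s3_uri is a hypothetical name, and the SDK's own function should be preferred in real code.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri: str):
    """Split 's3://bucket/key' into (bucket, key); a stdlib-only approximation."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3":
        raise ValueError(f"Expected an s3:// URI, got: {s3_uri}")
    # netloc is the bucket; the path without its leading slash is the object key.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri("s3://my-bucket/images/cat.png")
```

Here the bucket is "my-bucket" and the key is "images/cat.png", which are the two values that download_fileobj needs.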

References
SageMaker-Python-SDK github (https://github.com/aws/sagemaker-python-sdk)
