
Where to look up the contents of OutputConfig in SageMaker Pipelines

Posted at 2021-01-14, last updated at 2021-01-14

Introduction

In SageMaker Pipelines, when the output of one step is used as the input of the next step, you write it as shown below.
In step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri, where are properties such as S3Output.S3Uri documented?

pipeline_definition.py

from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.steps import ProcessingStep
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep

step_process = ProcessingStep(
    name="AbaloneProcess",
    processor=sklearn_processor,
    inputs=[
        ProcessingInput(source=input_data, destination="/opt/ml/processing/input"),
    ],
    outputs=[
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test")
    ],
    code="abalone/preprocessing.py",
)

xgb_train = ...  # estimator definition omitted

step_train = TrainingStep(
    name="AbaloneTrain",
    estimator=xgb_train,
    inputs={
        "train": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                "train"
            ].S3Output.S3Uri,  # <- this one!
            content_type="text/csv"
        ),
        "validation": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                "validation"
            ].S3Output.S3Uri,
            content_type="text/csv"
        )
    },
)
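To make the question more concrete, here is a minimal sketch of why a chain like Outputs["train"].S3Output.S3Uri can be written before any job has run: the object only records the access path and is resolved later. PropertyRef is a hypothetical stand-in written for this post, not the SDK's class, and the {"Get": ...} shape of the rendered reference is my assumption about how the pipeline definition expresses it.

```python
class PropertyRef:
    """Hypothetical stand-in for a lazy step property; it records the access path only."""

    def __init__(self, path):
        self._path = path

    def __getattr__(self, name):
        # Reached only for attributes that do not exist, e.g. .S3Output or .S3Uri.
        if name.startswith("_"):
            raise AttributeError(name)
        return PropertyRef(f"{self._path}.{name}")

    def __getitem__(self, key):
        # Integer keys index by position; string keys select an entry by name.
        suffix = f"[{key}]" if isinstance(key, int) else f"['{key}']"
        return PropertyRef(self._path + suffix)

    @property
    def expr(self):
        # Assumed shape of the reference as it appears in the pipeline definition.
        return {"Get": self._path}


properties = PropertyRef("Steps.AbaloneProcess")
train_uri = properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri
print(train_uri.expr)
```

Because nothing is looked up when the pipeline is defined, the names after properties have to come from some other document, which is what the rest of this post tracks down.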

Result

They are listed here, in the response of the DescribeProcessingJob API:
https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeProcessingJob.html

{
(preceding fields omitted)
   "ProcessingOutputConfig": { 
      "KmsKeyId": "string",
      "Outputs": [ 
         { 
            "AppManaged": boolean,
            "FeatureStoreOutput": { 
               "FeatureGroupName": "string"
            },
            "OutputName": "string",
            "S3Output": { 
               "LocalPath": "string",
               "S3UploadMode": "string",
               "S3Uri": "string"
            }
         }
      ]
   },
(following fields omitted)
}
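As a rough illustration of how the property path lines up with this response, the sketch below walks a dict shaped like the JSON above and returns S3Output.S3Uri for a given output. The helper name output_s3_uri and the sample values (bucket, prefix) are made up for this example, and treating Outputs["train"] as a lookup by OutputName is my reading of the pipeline code, not something the API reference states.

```python
def output_s3_uri(describe_response, output_name):
    """Return S3Output.S3Uri of the output whose OutputName matches output_name."""
    outputs = describe_response["ProcessingOutputConfig"]["Outputs"]
    for output in outputs:
        if output["OutputName"] == output_name:
            return output["S3Output"]["S3Uri"]
    raise KeyError(f"no processing output named {output_name!r}")


# Hand-written sample shaped like the response above; bucket and prefix are invented.
sample_response = {
    "ProcessingOutputConfig": {
        "Outputs": [
            {
                "OutputName": "train",
                "S3Output": {
                    "LocalPath": "/opt/ml/processing/train",
                    "S3UploadMode": "EndOfJob",
                    "S3Uri": "s3://example-bucket/abalone/train",
                },
            },
            {
                "OutputName": "validation",
                "S3Output": {
                    "LocalPath": "/opt/ml/processing/validation",
                    "S3UploadMode": "EndOfJob",
                    "S3Uri": "s3://example-bucket/abalone/validation",
                },
            },
        ]
    }
}

print(output_s3_uri(sample_response, "train"))
```

The same walk works for any other field in the response, for example S3Output.LocalPath.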

The fact that step properties follow this response is documented here:
https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#build-and-manage-properties
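The same lookup should apply to other step types. The sketch below is a small helper that maps a step class to the API reference page to read. The three pairings are my understanding of the guide linked above and are not exhaustive, the helper name api_reference_url is hypothetical, and the URL pattern is simply generalized from the DescribeProcessingJob link earlier in this post.

```python
# Step class -> Describe* API whose response the step's properties follow.
# Only three pairings are listed; they reflect my reading of the guide.
DESCRIBE_API_BY_STEP = {
    "ProcessingStep": "DescribeProcessingJob",
    "TrainingStep": "DescribeTrainingJob",
    "TransformStep": "DescribeTransformJob",
}

# Pattern generalized from the DescribeProcessingJob URL; an assumption for other APIs.
API_REFERENCE_URL = "https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_{api}.html"


def api_reference_url(step_class_name):
    """Return the API reference page to consult for a step's properties."""
    api = DESCRIBE_API_BY_STEP[step_class_name]
    return API_REFERENCE_URL.format(api=api)


for step_class_name in DESCRIBE_API_BY_STEP:
    print(step_class_name, "->", api_reference_url(step_class_name))
```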

Not here

I got confused because none of this is described under ProcessingOutputConfig in the SageMaker Python SDK documentation or source code (links below).
https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.ProcessingOutput
https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/processing.py#L1079

Impressions

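It is nice that SageMaker Pipelines comes with full-set templates that cover CI/CD and more.
However, compared with the Step Functions Data Science SDK, it is less flexible: for example, you cannot insert Lambda or similar services into the workflow. I feel there is room for improvement here. (Or maybe I just don't know how?)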
