
Where to look up the contents of OutputConfig in SageMaker Pipelines

Posted at 2021-01-14

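Introduction

In SageMaker Pipelines, to use the output of a previous step as the input of the next step, you write something like the following.
But where are properties such as S3Output.S3Uri, as used in step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri, actually documented?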

pipeline_definition.py
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.steps import ProcessingStep
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep

step_process = ProcessingStep(
    name="AbaloneProcess",
    processor=sklearn_processor,
    inputs=[
        ProcessingInput(source=input_data, destination="/opt/ml/processing/input"),
    ],
    outputs=[
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test")
    ],
    code="abalone/preprocessing.py",
)

xgb_train = ...  # (estimator definition omitted)

step_train = TrainingStep(
    name="AbaloneTrain",
    estimator=xgb_train,
    inputs={
        "train": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                "train"
            ].S3Output.S3Uri,  # <-- this part!
            content_type="text/csv"
        ),
        "validation": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                "validation"
            ].S3Output.S3Uri,
            content_type="text/csv"
        )
    },
)
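
Note that at definition time this expression is not yet a real S3 URI; it is a placeholder that gets resolved when the pipeline runs. Below is a minimal sketch of that idea, assuming the pipeline definition JSON refers to step properties with a `Get` expression. `property_reference` is a hypothetical helper written for illustration, not a function of the SageMaker Python SDK.

```python
import json


def property_reference(step_name: str, path: str) -> dict:
    """Hypothetical helper: build the placeholder that stands in for a step property.

    Assumption: the pipeline definition JSON refers to a step property with a
    {"Get": "Steps.<step name>.<property path>"} expression.
    """
    return {"Get": f"Steps.{step_name}.{path}"}


# Placeholder for the "train" output of the AbaloneProcess step.
train_uri_ref = property_reference(
    "AbaloneProcess",
    "ProcessingOutputConfig.Outputs['train'].S3Output.S3Uri",
)
print(json.dumps(train_uri_ref))
```

The point is only that the property path is recorded as text and looked up later, which is why its shape has to match what the job reports at run time.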

Result

Here:
https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeProcessingJob.html

{
(preceding fields omitted)
   "ProcessingOutputConfig": { 
      "KmsKeyId": "string",
      "Outputs": [ 
         { 
            "AppManaged": boolean,
            "FeatureStoreOutput": { 
               "FeatureGroupName": "string"
            },
            "OutputName": "string",
            "S3Output": { 
               "LocalPath": "string",
               "S3UploadMode": "string",
               "S3Uri": "string"
            }
         }
      ]
   },
(following fields omitted)
}
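
To make the correspondence concrete, here is a small sketch that walks a dict shaped like the response above and pulls out the same S3Output.S3Uri value. The sample values (bucket name, paths) and the helper `output_s3_uri` are made up for illustration. One difference to keep in mind: in the API response Outputs is a list, whereas the Pipelines property is indexed by output name, as in Outputs["train"].

```python
# Sample dict shaped like the DescribeProcessingJob response.
# The bucket name and paths are made-up values for illustration only.
response = {
    "ProcessingOutputConfig": {
        "Outputs": [
            {
                "OutputName": "train",
                "S3Output": {
                    "LocalPath": "/opt/ml/processing/train",
                    "S3UploadMode": "EndOfJob",
                    "S3Uri": "s3://example-bucket/abalone/train",
                },
            },
            {
                "OutputName": "validation",
                "S3Output": {
                    "LocalPath": "/opt/ml/processing/validation",
                    "S3UploadMode": "EndOfJob",
                    "S3Uri": "s3://example-bucket/abalone/validation",
                },
            },
        ]
    }
}


def output_s3_uri(describe_response: dict, output_name: str) -> str:
    """Return S3Output.S3Uri of the output whose OutputName matches."""
    for output in describe_response["ProcessingOutputConfig"]["Outputs"]:
        if output["OutputName"] == output_name:
            return output["S3Output"]["S3Uri"]
    raise KeyError(output_name)


print(output_s3_uri(response, "train"))
```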

I found it documented here:
https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#build-and-manage-properties
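
As I read that page, the properties of a step correspond to the response of the matching Describe API call. A rough lookup sketch follows, listing only the three step types I am sure about. The dict and the helper `api_reference_url` are my own names for illustration, and the URL pattern is assumed from the DescribeProcessingJob link above.

```python
# Step type -> API call whose response defines the step's properties.
# Only the mappings I am confident about are listed; other step types are left out.
DESCRIBE_CALL_BY_STEP = {
    "ProcessingStep": "DescribeProcessingJob",
    "TrainingStep": "DescribeTrainingJob",
    "TransformStep": "DescribeTransformJob",
}

API_REFERENCE_BASE = "https://docs.aws.amazon.com/sagemaker/latest/APIReference"


def api_reference_url(step_type: str) -> str:
    """Return the API reference page to consult for a step type's properties."""
    call_name = DESCRIBE_CALL_BY_STEP[step_type]
    return f"{API_REFERENCE_BASE}/API_{call_name}.html"


for step_type in DESCRIBE_CALL_BY_STEP:
    print(step_type, "->", api_reference_url(step_type))
```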

Not here

The ProcessingOutputConfig entries in the SageMaker Python SDK documentation and source code (links below) do not list these properties, which is what confused me.
https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.ProcessingOutput
https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/processing.py#L1079

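Thoughts

It is nice that SageMaker Pipelines comes with full-set templates covering CI/CD and so on.
However, compared with the Step Functions Data Science SDK, it felt less flexible (for example, you cannot insert a Lambda function into the workflow), and I think there is room for improvement there. (Or is it just that I don't know how?)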
