Turning a Notebook into a Serverless Batch Job

Posted at 2019-08-04

Introduction

I wanted to see whether Python code written in Jupyter Notebook could be turned into a batch job using AWS managed services, so I tried it out.

What I prepared

  • AWS CLI
  • Papermill (used to execute the notebook)
  • Jupyter Notebook
  • Docker

Environment

  • Jupyter Notebook server (EC2)
  • ECS
  • ECR
  • Step Functions
  • CloudWatch

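What I did

  • Created the notebook file
  • Built the Docker image
  • Saved the notebook file to S3
  • Created the IAM roles
  • Created the ECS task
  • Created the Step Functions state machine

Note: I reused an existing ECS cluster, VPC, subnet, and related resources that I had created beforehand.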

Processing flow

  1. The Papermill parameters are set in the CloudWatch Events input
  2. CloudWatch Events starts the Step Functions state machine every 10 minutes
  3. The Step Functions state machine runs the ECS task
  4. The ECS task runs Papermill
  5. Papermill saves the executed notebook to S3
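
The flow above can be sketched in a few lines of Python. This is only an illustration of how the command travels from the event input to the container, not the code that AWS runs. The function names and the bucket names are hypothetical; the `-p name value` form is the Papermill CLI option for passing a parameter.

```python
import json


def build_papermill_command(input_uri, output_uri, parameters):
    """Build the argv list for the Papermill CLI: papermill IN OUT -p name value ..."""
    command = ["papermill", input_uri, output_uri]
    for name, value in parameters.items():
        command += ["-p", name, str(value)]
    return command


def build_event_input(input_uri, output_uri, parameters):
    """Step 1: serialize the command as the JSON input attached to the scheduled rule."""
    command = build_papermill_command(input_uri, output_uri, parameters)
    return json.dumps({"Command": command})


def resolve_container_command(event_input):
    """Step 3: mimic the state machine reading "$.Command" from its input."""
    return json.loads(event_input)["Command"]


# Placeholder bucket names, assumed for this example only.
event_input = build_event_input(
    "s3://input-bucket/HelloWorld.ipynb",
    "s3://output-bucket/output.ipynb",
    {"msg": "Hello,Papermill"},
)

# Step 4: this list is what the ECS container would receive as its command.
print(resolve_container_command(event_input))
```

Keeping the whole command in the event input means the schedule, not the task definition, decides which notebook and parameters are used on each run.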

詳細

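Creating the notebook file

I created a notebook file with a parameter cell in Jupyter Notebook.
Note: in nteract, I could turn a cell into a parameter cell by selecting "Toggle Parameter Cell" from the menu at the top right of the cell.
[Screenshot 2019-08-03 11.09.23.png]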

Parameter cell
msg = "Hello, World!"
Program
print(msg)
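
Papermill finds the parameter cell by looking for a cell tagged `parameters` in the notebook's JSON, and it adds the overridden values in a new cell right after it. Before uploading a notebook, it can help to confirm that the tag is really there. The following is a minimal sketch using only the standard library; the helper name `find_parameter_cells` and the in-memory notebook are made up for illustration and stand in for the real `HelloWorld.ipynb`.

```python
def find_parameter_cells(notebook):
    """Return the indexes of code cells tagged "parameters" in a notebook dict."""
    return [
        index
        for index, cell in enumerate(notebook.get("cells", []))
        if cell.get("cell_type") == "code"
        and "parameters" in cell.get("metadata", {}).get("tags", [])
    ]


# A minimal in-memory stand-in for HelloWorld.ipynb (nbformat 4 layout).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},
            "source": ['msg = "Hello, World!"'],
            "outputs": [],
            "execution_count": None,
        },
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["print(msg)"],
            "outputs": [],
            "execution_count": None,
        },
    ],
}

print(find_parameter_cells(notebook))  # [0]
```

For a file on disk, the same function can be applied to the result of `json.load` on the opened `.ipynb` file.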

Building and pushing the Docker image

Dockerfile
FROM python:3
  
RUN pip install papermill[all]
RUN pip install jupyter
Build and push the Docker image
$(aws ecr get-login --no-include-email --region ap-northeast-1)
docker build -t xxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/papermill ./
docker push xxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/papermill:latest

Creating the S3 buckets and copying the notebook file

Create buckets for SAM and for the input and output notebooks

aws s3 mb s3://${SAM_BUCKET}
aws s3 mb s3://${INPUT_NOTEBOOK}
aws s3 mb s3://${OUTPUT_NOTEBOOK}

Copy the file to S3

aws s3 cp HelloWorld.ipynb s3://${INPUT_NOTEBOOK}/HelloWorld.ipynb

Creating the IAM roles, ECS task, Step Functions state machine, and CloudWatch Events rule

I used CloudFormation for this, so I wrote a template file.

template.yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Parameters:
      S3RolePolicyName:
            Type: String
            Default: hello-s3
      HelloWorldTaskExecutionRoleName:
            Type: String
            Default: hello-taskexec
      HelloWorldECSTaskName:
            Type: String
            Default: HelloWorldTask
      HelloWorldInvocationRoleName:
             Type: String
             Default: HelloWorld-Invocation
      ImageRepoURI:
            Type: String
            Default: xxxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/papermill:latest
      InputS3FileURI:
            Type: String
            Default: s3://xxxxxxxxx/HelloWorld.ipynb
      OutputS3FileURI:
            Type: String
            Default: s3://xxxxxxxxx/output.ipynb
      MsgParameter:
            Type: String
            Default: "Hello,Papermill"
      ClusterArn:
            Type: String
            Default: arn:aws:ecs:ap-northeast-1:xxxxxxxxx:cluster/xxxxxxxxx
      SubnetID:
            Type: AWS::EC2::Subnet::Id
            Default: subnet-xxxxxxxxx
Resources:
      # Role that lets the ECS task access S3
      HelloWorldS3BucketRole:
            Type: AWS::IAM::Role
            Properties:
                  AssumeRolePolicyDocument:
                        Version: 2012-10-17
                        Statement:
                                -
                                    Effect: Allow
                                    Principal:
                                       Service:
                                         - ecs-tasks.amazonaws.com
                                    Action:
                                         - sts:AssumeRole
                  Policies:
                       -
                            PolicyName: !Ref S3RolePolicyName
                            PolicyDocument:
                                 Version: 2012-10-17
                                 Statement:
                                      -
                                         Effect: Allow
                                         Action:
                                           - s3:PutObject
                                           - s3:GetObject
                                           - s3:ListBucket
                                           - s3:DeleteObject
                                           - s3:PutObjectAcl
                                         Resource: "*"
      HelloWorldTaskExecutionRole:
            Type: AWS::IAM::Role
            Properties:
                  AssumeRolePolicyDocument:
                        Version: 2012-10-17
                        Statement:
                                -
                                  Effect: Allow
                                  Principal:
                                     Service:
                                       - ecs-tasks.amazonaws.com
                                  Action:
                                       - sts:AssumeRole
                  Policies:
                       -
                          PolicyName: !Ref HelloWorldTaskExecutionRoleName
                          PolicyDocument:
                             Version: 2012-10-17
                             Statement:
                                  -
                                    Effect: Allow
                                    Action:
                                      - ecr:GetAuthorizationToken
                                      - ecr:BatchCheckLayerAvailability
                                      - ecr:GetDownloadUrlForLayer
                                      - ecr:BatchGetImage
                                      - logs:CreateLogStream
                                      - logs:PutLogEvents
                                    Resource: "*"
      # ECS task definition
      HelloWorldECSTask:
            Type: AWS::ECS::TaskDefinition
            Properties:
                  ContainerDefinitions:
                        -
                          Name: !Ref HelloWorldECSTaskName
                          Image: !Ref ImageRepoURI
                          Memory: 500
                          Command:
                                  - "papermill"
                                  - !Ref InputS3FileURI
                                  - !Ref OutputS3FileURI
                  Cpu: 256
                  Memory: 512
                  NetworkMode: awsvpc
                  RequiresCompatibilities:
                        - FARGATE
                  TaskRoleArn: !GetAtt [ HelloWorldS3BucketRole, Arn ]
                  ExecutionRoleArn: !GetAtt [ HelloWorldTaskExecutionRole, Arn ]
      # Role that lets Step Functions run the ECS task
      HelloWorldTaskExecution:
            Type: AWS::IAM::Role
            Properties:
                AssumeRolePolicyDocument:
                      Version: 2012-10-17
                      Statement:
                              -
                                Effect: Allow
                                Principal:
                                        Service:
                                                - !Sub states.amazonaws.com
                                Action: sts:AssumeRole
                Policies:
                      -
                        PolicyName: StatesExecutionPolicy
                        PolicyDocument:
                                Version: 2012-10-17
                                Statement:
                                        -
                                          Effect: Allow
                                          Action:
                                            - iam:PassRole
                                            - ecs:DescribeTasks
                                            - events:PutTargets
                                            - events:PutRule
                                            - events:DescribeRule
                                            - ecs:RunTask
                                            - ecs:StartTask
                                            - ecs:StopTask
                                          Resource: "*"
      # Step Functions state machine
      HelloWorldStateMachine:
          Type: AWS::StepFunctions::StateMachine
          Properties:
               DefinitionString:
                   !Sub
                     - |-
                          {
                                  "Comment": "A Hello World example using an AWS ECS function",
                                  "TimeoutSeconds": 3600,
                                  "StartAt": "HelloWorld",
                                  "States": {
                                       "HelloWorld": {
                                             "Type": "Task",
                                             "Resource": "arn:aws:states:::ecs:runTask.sync",
                                             "Parameters": {
                                                   "LaunchType": "FARGATE",
                                                   "Cluster": "${ClusterArn}",
                                                   "TaskDefinition": "${TaskArn}",
                                                   "Overrides": {
                                                        "ContainerOverrides": [
                                                              {
                                                                  "Name": "${HelloWorldECSTaskName}",
                                                                  "Command.$": "$.Command"
                                                              }
                                                        ]
                                                   },
                                                   "NetworkConfiguration": {
                                                         "AwsvpcConfiguration": {
                                                               "Subnets": [
                                                                     "${SubnetID}"
                                                                     ],
                                                                     "AssignPublicIp": "DISABLED"
                                                          }
                                                   }
                                            },
                                             "End": true
                                        }
                                   }
                             }
                     - { TaskArn: !Ref HelloWorldECSTask }
               RoleArn: !GetAtt [ HelloWorldTaskExecution, Arn ]
      # Role that lets CloudWatch Events start the Step Functions state machine
      HelloWorldInvocationRole:
          Type: AWS::IAM::Role
          Properties:
              AssumeRolePolicyDocument:
                    Version: 2012-10-17
                    Statement:
                         - 
                           Effect: Allow
                           Principal:
                              Service: 
                               - events.amazonaws.com
                           Action:
                               - sts:AssumeRole
              Policies:
                  - 
                    PolicyName: !Ref HelloWorldInvocationRoleName
                    PolicyDocument:
                         Version: 2012-10-17
                         Statement:
                              - 
                                Effect: Allow
                                Action:
                                 - states:StartExecution
                                Resource: !Ref HelloWorldStateMachine
      # CloudWatch Events rule
      HelloWorldSchedule:
          Type: AWS::Events::Rule
          Properties:
            Description: ScheduledRule
            ScheduleExpression: "rate(10 minutes)"
            State: ENABLED
            Targets:
               - 
                 Arn: !Ref HelloWorldStateMachine
                 Id: StepFunctionExecV1
                 RoleArn: !GetAtt [HelloWorldInvocationRole, Arn]
                 # Pass the ECS task command as a JSON string in Input
                 # The Papermill parameters are set here
                 Input: !Sub "{\"Command\": [\"papermill\",\"${InputS3FileURI}\",\"${OutputS3FileURI}\",\"-p\",\"msg\",\"${MsgParameter}\"]}"
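
One thing to keep in mind about the `Input` line: `!Sub` substitutes the parameter values as plain text, so the result is only valid JSON if the values contain no characters that need escaping, such as a double quote. The sketch below imitates that substitution with a regular expression to check a set of values before deploying. It is a rough approximation covering only plain `${Name}` placeholders, and `substitute` and `is_valid_input` are hypothetical helper names, not part of any AWS tool.

```python
import json
import re

# The Input string from the template, after YAML unescapes the \" sequences.
INPUT_TEMPLATE = (
    '{"Command": ["papermill","${InputS3FileURI}","${OutputS3FileURI}",'
    '"-p","msg","${MsgParameter}"]}'
)


def substitute(template, values):
    """Replace ${Name} placeholders textually (a simplified imitation of !Sub)."""
    return re.sub(r"\$\{(\w+)\}", lambda match: values[match.group(1)], template)


def is_valid_input(values):
    """Return True if the substituted string parses as JSON."""
    try:
        json.loads(substitute(INPUT_TEMPLATE, values))
    except json.JSONDecodeError:
        return False
    return True


values = {
    "InputS3FileURI": "s3://xxxxxxxxx/HelloWorld.ipynb",
    "OutputS3FileURI": "s3://xxxxxxxxx/output.ipynb",
    "MsgParameter": "Hello,Papermill",
}
print(is_valid_input(values))

# A double quote inside the message breaks the JSON after substitution.
print(is_valid_input({**values, "MsgParameter": 'say "hi"'}))
```

With the default parameter values from the template the check passes; a message containing a double quote fails it and would need escaping before it is passed in.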

References

Beyond Interactive: Notebook Innovation at Netflix : https://medium.com/netflix-techblog/notebook-innovation-591ee3221233
