Autoscaling and Spot Bidding Strategy in Deadline Cloud


About me

I'm Koike (@plinecom) from technical support at digitalbigmo Inc.
Lately I've been battling Deadline Cloud every day.

Digging a little deeper into Autoscaling and the bidding strategy in Deadline Cloud, and customizing them

When you use a customer-managed fleet (CMF) with Deadline Cloud, the Amazon EC2 Auto Scaling fleet settings will work at a minimum if you simply copy and paste the sample CloudFormation template as-is:
https://docs.aws.amazon.com/ja_jp/deadline-cloud/latest/developerguide/create-auto-scaling.html#autoscale-ec2-fleet
Here, we'll refine it just a little further.

1. Increase the number of AZs to six, since we only run in us-east-1

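When working from Japan, for practical cost reasons us-east-1 is effectively the only region worth considering, so we specialize the setup for us-east-1. Specifically, to get hold of more available Spot capacity, we rewrite the template so that it uses all six Availability Zones (AZs) in the us-east-1 region, where the official sample uses only two.

That said, there's no easy shortcut; you just have to write out six subnets.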
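If you'd rather not hard-code the AZ letters, one option is to pick each AZ by index with the `!Select` and `!GetAZs` intrinsic functions. The fragment below is a sketch under that assumption, showing only the third subnet with the same CIDR as in the template; it is not part of the official sample. Note that `!GetAZs` may return fewer than six AZs depending on the account's default-subnet setup, so check what it returns in your account before relying on index-based selection.

```yaml
  # Sketch (assumption): choose the AZ by index instead of hard-coding the letter "c".
  # Index 2 = third entry of the AZ list that !GetAZs returns for the current region.
  deadlinePublicSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref deadlineVPC
      CidrBlock: 100.100.24.0/22
      AvailabilityZone: !Select
        - 2
        - !GetAZs ''
```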

Before

  deadlinePublicSubnet0:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref deadlineVPC
      CidrBlock: 100.100.16.0/22
      AvailabilityZone: !Join
        - ''
        - - !Ref AWS::Region
          - a
  deadlineSubnetRouteTableAssociation0:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref deadlinePublicRouteTable
      SubnetId: !Ref deadlinePublicSubnet0
  deadlinePublicSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref deadlineVPC
      CidrBlock: 100.100.20.0/22
      AvailabilityZone: !Join
        - ''
        - - !Ref AWS::Region
          - b
  deadlineSubnetRouteTableAssociation1:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref deadlinePublicRouteTable
      SubnetId: !Ref deadlinePublicSubnet1

(snip)


  deadlineAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      AutoScalingGroupName: !Join
        - ''
        - - deadline-ASG-autoscalable-
          - !Ref FleetId
      MinSize: 0
      MaxSize: 10
      VPCZoneIdentifier:
        - !Ref deadlinePublicSubnet0
        - !Ref deadlinePublicSubnet1

After

  deadlinePublicSubnet0:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref deadlineVPC
      CidrBlock: 100.100.16.0/22
      AvailabilityZone: !Join
        - ''
        - - !Ref AWS::Region
          - a
  deadlineSubnetRouteTableAssociation0:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref deadlinePublicRouteTable
      SubnetId: !Ref deadlinePublicSubnet0
  deadlinePublicSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref deadlineVPC
      CidrBlock: 100.100.20.0/22
      AvailabilityZone: !Join
        - ''
        - - !Ref AWS::Region
          - b
  deadlineSubnetRouteTableAssociation1:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref deadlinePublicRouteTable
      SubnetId: !Ref deadlinePublicSubnet1
  deadlinePublicSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref deadlineVPC
      CidrBlock: 100.100.24.0/22
      AvailabilityZone: !Join
        - ''
        - - !Ref AWS::Region
          - c
  deadlineSubnetRouteTableAssociation2:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref deadlinePublicRouteTable
      SubnetId: !Ref deadlinePublicSubnet2
  deadlinePublicSubnet3:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref deadlineVPC
      CidrBlock: 100.100.28.0/22
      AvailabilityZone: !Join
        - ''
        - - !Ref AWS::Region
          - d
  deadlineSubnetRouteTableAssociation3:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref deadlinePublicRouteTable
      SubnetId: !Ref deadlinePublicSubnet3
  deadlinePublicSubnet4:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref deadlineVPC
      CidrBlock: 100.100.32.0/22
      AvailabilityZone: !Join
        - ''
        - - !Ref AWS::Region
          - e
  deadlineSubnetRouteTableAssociation4:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref deadlinePublicRouteTable
      SubnetId: !Ref deadlinePublicSubnet4
  deadlinePublicSubnet5:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref deadlineVPC
      CidrBlock: 100.100.36.0/22
      AvailabilityZone: !Join
        - ''
        - - !Ref AWS::Region
          - f
  deadlineSubnetRouteTableAssociation5:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref deadlinePublicRouteTable
      SubnetId: !Ref deadlinePublicSubnet5

(snip)


  deadlineAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      AutoScalingGroupName: !Join
        - ''
        - - deadline-ASG-autoscalable-
          - !Ref FleetId
      MinSize: 0
      MaxSize: 10
      VPCZoneIdentifier:
        - !Ref deadlinePublicSubnet0
        - !Ref deadlinePublicSubnet1
        - !Ref deadlinePublicSubnet2
        - !Ref deadlinePublicSubnet3
        - !Ref deadlinePublicSubnet4
        - !Ref deadlinePublicSubnet5

2. Revisit the bidding strategy

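The official sample prioritizes stability and uses capacity-optimized for Spot Instances, but I want to push costs down a bit more, so I change it to price-capacity-optimized. If things become unstable, it's probably best to switch back to capacity-optimized or to mix in a few On-Demand Instances.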
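As a sketch of the "mix in a few On-Demand Instances" fallback, `InstancesDistribution` could look like the following. The values `1` and `20` are illustrative assumptions, not tested settings: the first unit of capacity is filled with On-Demand, and 20% of any capacity above that base is On-Demand with the rest on Spot.

```yaml
      MixedInstancesPolicy:
        InstancesDistribution:
          # Illustrative values (assumption): the first unit of capacity is On-Demand.
          # With MinSize: 0 this does not keep an instance running while the fleet is idle.
          OnDemandBaseCapacity: 1
          # Of the capacity above the base, 20% On-Demand and 80% Spot.
          OnDemandPercentageAboveBaseCapacity: 20
          SpotAllocationStrategy: price-capacity-optimized
          OnDemandAllocationStrategy: lowest-price
```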

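The official sample also restricts the Spot Instance types it uses with an allowlist.
However, since I want to actively go after cheaper instances, I decided to switch to selection based on vCPU and memory, plus a blocklist.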

Before

      MixedInstancesPolicy:
        InstancesDistribution:
          OnDemandBaseCapacity: 0
          OnDemandPercentageAboveBaseCapacity: 0
          SpotAllocationStrategy: capacity-optimized
          OnDemandAllocationStrategy: lowest-price
        LaunchTemplate:
          LaunchTemplateSpecification:
            LaunchTemplateId: !Ref deadlineLaunchTemplate
            Version: !GetAtt
              - deadlineLaunchTemplate
              - LatestVersionNumber
          Overrides:
            - InstanceType: m5.large
            - InstanceType: m5d.large
            - InstanceType: m5a.large
            - InstanceType: m5ad.large
            - InstanceType: m5n.large
            - InstanceType: m5dn.large
            - InstanceType: m4.large
            - InstanceType: m3.large
            - InstanceType: r5.large
            - InstanceType: r5d.large
            - InstanceType: r5a.large
            - InstanceType: r5ad.large
            - InstanceType: r5n.large
            - InstanceType: r5dn.large
            - InstanceType: r4.large

After

      MixedInstancesPolicy:
        InstancesDistribution:
          OnDemandBaseCapacity: 0
          OnDemandPercentageAboveBaseCapacity: 0
          SpotAllocationStrategy: price-capacity-optimized
          OnDemandAllocationStrategy: lowest-price
        LaunchTemplate:
          LaunchTemplateSpecification:
            LaunchTemplateId: !Ref deadlineLaunchTemplate
            Version: !GetAtt 
              - deadlineLaunchTemplate
              - LatestVersionNumber
          Overrides:
            - InstanceRequirements:
                VCpuCount:
                  Min: 2
                MemoryMiB:
                  Min: 8192
                ExcludedInstanceTypes:
                  - '*.metal'
                  - '*.metal*'
                  - i3.*
                CpuManufacturers:
                  - intel
                  - amd
                InstanceGenerations:
                  - current
                SpotMaxPricePercentageOverLowestPrice: 100

vCPU count: 2 or more
Memory: 8 GiB (8192 MiB) or more
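`InstanceRequirements` also accepts upper bounds. If you want to keep worker sizes from growing unexpectedly, a sketch could add `Max` values like this; the 16 vCPU / 64 GiB limits are arbitrary example values, not a recommendation.

```yaml
            - InstanceRequirements:
                VCpuCount:
                  Min: 2
                  Max: 16      # example upper bound (assumption)
                MemoryMiB:
                  Min: 8192    # 8 GiB
                  Max: 65536   # 64 GiB, example upper bound (assumption)
                # ...ExcludedInstanceTypes, CpuManufacturers, etc. unchanged...
```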

i3 instances behave unreliably, so I exclude them. Metal instances are ridiculously expensive, so I exclude them as well to avoid bidding on them.

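Landing on ARM instances and the like won't work, so the CPU manufacturers are limited to Intel and AMD. InstanceGenerations is set to current so that only current-generation instances are used, and SpotMaxPricePercentageOverLowestPrice is set to 100 so that Spot Instances are considered up to twice the lowest price (instance types more than 100% above the reference lowest price are excluded).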
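Written as a formula, with $p_{\text{ref}}$ the reference Spot price that EC2 compares against (per the AWS documentation, the price of the lowest-priced current-generation C, M, or R type matching the requirements) and $s$ the value of `SpotMaxPricePercentageOverLowestPrice`, instance types priced above $p_{\text{cutoff}}$ are excluded. For a hypothetical reference price of USD 0.10/h, $s = 100$ gives a cutoff of USD 0.20/h.

```latex
p_{\text{cutoff}} = p_{\text{ref}} \times \left(1 + \frac{s}{100}\right),
\qquad s = 100 \;\Rightarrow\; p_{\text{cutoff}} = 2\,p_{\text{ref}}
```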

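With just these changes, mysterious inf and z instances start getting picked, which is fun to watch.
I suppose inf instances are awkward to put to proper use for machine learning.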
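If you'd rather not land on accelerator-equipped types such as inf (AWS Inferentia), one option is to cap the accelerator count at zero. A sketch, assuming the rest of `InstanceRequirements` stays as above; note this does not affect z instances, which have no accelerators.

```yaml
            - InstanceRequirements:
                # ...VCpuCount, MemoryMiB, ExcludedInstanceTypes, etc. as above...
                AcceleratorCount:
                  Max: 0   # exclude instance types with GPUs, FPGAs, or inference accelerators
```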

The finished CloudFormation template

CloudFormation
AWSTemplateFormatVersion: 2010-09-09
Description: Amazon Deadline Cloud customer-managed fleet
Parameters:
  FarmId:
    Type: String
    Description: Farm ID
  FleetId:
    Type: String
    Description: Fleet ID
  AMIId:
    Type: String
    Description: AMI ID for launching workers
Resources:
  deadlineVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 100.100.0.0/16
      Tags:
        - Key: Name
          Value: !Join
            - '-'
            - - deadline-vpc
              - !Ref FleetId
  deadlineWorkerSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: !Join
        - ' '
        - - Security group created for Deadline Cloud workers in the fleet
          - !Ref FleetId
      GroupName: !Join
        - ''
        - - deadlineWorkerSecurityGroup-
          - !Ref FleetId
      SecurityGroupEgress:
        - CidrIp: 0.0.0.0/0
          IpProtocol: '-1'
      SecurityGroupIngress:
        # TCP 5555
        - IpProtocol: tcp
          FromPort: 5555
          ToPort: 5555
          CidrIp: 0.0.0.0/0

        # UDP 5555
        - IpProtocol: udp
          FromPort: 5555
          ToPort: 5555
          CidrIp: 0.0.0.0/0

        # Autodesk Maya / Arnold (2701-2702 TCP IPv4/IPv6)
        - IpProtocol: tcp
          FromPort: 2701
          ToPort: 2702
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 2701
          ToPort: 2702
          CidrIpv6: '::/0'

        # Cinema 4D (7057 TCP IPv4/IPv6)
        - IpProtocol: tcp
          FromPort: 7057
          ToPort: 7057
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 7057
          ToPort: 7057
          CidrIpv6: '::/0'

        # KeyShot (2703 TCP IPv4/IPv6)
        - IpProtocol: tcp
          FromPort: 2703
          ToPort: 2703
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 2703
          ToPort: 2703
          CidrIpv6: '::/0'

        # Foundry Nuke (6101 TCP IPv4/IPv6)
        - IpProtocol: tcp
          FromPort: 6101
          ToPort: 6101
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 6101
          ToPort: 6101
          CidrIpv6: '::/0'

        # Red Giant (7055 TCP IPv4??)
        - IpProtocol: tcp
          FromPort: 7055
          ToPort: 7055
          CidrIp: 0.0.0.0/0

        # Redshift (7054 TCP IPv4/IPv6)
        - IpProtocol: tcp
          FromPort: 7054
          ToPort: 7054
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 7054
          ToPort: 7054
          CidrIpv6: '::/0'

        # SideFX Houdini / Mantra / Karma (1715-1717 TCP IPv4/IPv6)
        - IpProtocol: tcp
          FromPort: 1715
          ToPort: 1717
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 1715
          ToPort: 1717
          CidrIpv6: '::/0'

        # VRay (30304 TCP IPv4??)
        - IpProtocol: tcp
          FromPort: 30304
          ToPort: 30304
          CidrIp: 0.0.0.0/0

      VpcId: !Ref deadlineVPC
  deadlineIGW:
    Type: AWS::EC2::InternetGateway
    Properties: {}
  deadlineVPCGatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref deadlineVPC
      InternetGatewayId: !Ref deadlineIGW
  deadlinePublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref deadlineVPC
  deadlinePublicRoute:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref deadlinePublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref deadlineIGW
    DependsOn:
      - deadlineIGW
      - deadlineVPCGatewayAttachment
  deadlinePublicSubnet0:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref deadlineVPC
      CidrBlock: 100.100.16.0/22
      AvailabilityZone: !Join
        - ''
        - - !Ref AWS::Region
          - a
  deadlineSubnetRouteTableAssociation0:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref deadlinePublicRouteTable
      SubnetId: !Ref deadlinePublicSubnet0
  deadlinePublicSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref deadlineVPC
      CidrBlock: 100.100.20.0/22
      AvailabilityZone: !Join
        - ''
        - - !Ref AWS::Region
          - b
  deadlineSubnetRouteTableAssociation1:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref deadlinePublicRouteTable
      SubnetId: !Ref deadlinePublicSubnet1
  deadlinePublicSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref deadlineVPC
      CidrBlock: 100.100.24.0/22
      AvailabilityZone: !Join
        - ''
        - - !Ref AWS::Region
          - c
  deadlineSubnetRouteTableAssociation2:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref deadlinePublicRouteTable
      SubnetId: !Ref deadlinePublicSubnet2
  deadlinePublicSubnet3:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref deadlineVPC
      CidrBlock: 100.100.28.0/22
      AvailabilityZone: !Join
        - ''
        - - !Ref AWS::Region
          - d
  deadlineSubnetRouteTableAssociation3:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref deadlinePublicRouteTable
      SubnetId: !Ref deadlinePublicSubnet3
  deadlinePublicSubnet4:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref deadlineVPC
      CidrBlock: 100.100.32.0/22
      AvailabilityZone: !Join
        - ''
        - - !Ref AWS::Region
          - e
  deadlineSubnetRouteTableAssociation4:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref deadlinePublicRouteTable
      SubnetId: !Ref deadlinePublicSubnet4
  deadlinePublicSubnet5:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref deadlineVPC
      CidrBlock: 100.100.36.0/22
      AvailabilityZone: !Join
        - ''
        - - !Ref AWS::Region
          - f
  deadlineSubnetRouteTableAssociation5:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref deadlinePublicRouteTable
      SubnetId: !Ref deadlinePublicSubnet5
  deadlineInstanceAccessAccessRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Join
        - '-'
        - - deadline
          - InstanceAccess
          - !Ref FleetId
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service: ec2.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: /
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
        - arn:aws:iam::aws:policy/AWSDeadlineCloud-WorkerHost
  deadlineInstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: /
      Roles:
        - !Ref deadlineInstanceAccessAccessRole
  deadlineLaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateName: !Join
        - ''
        - - deadline-LT-
          - !Ref FleetId
      LaunchTemplateData:
        NetworkInterfaces:
          - DeviceIndex: 0
            AssociatePublicIpAddress: true
            Groups:
              - !Ref deadlineWorkerSecurityGroup
            DeleteOnTermination: true
        ImageId: !Ref AMIId
        InstanceInitiatedShutdownBehavior: terminate
        IamInstanceProfile:
          Arn: !GetAtt deadlineInstanceProfile.Arn
        MetadataOptions:
          HttpTokens: required
          HttpEndpoint: enabled

  deadlineAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      AutoScalingGroupName: !Join
        - ''
        - - deadline-ASG-autoscalable-
          - !Ref FleetId
      MinSize: 0
      MaxSize: 10
      VPCZoneIdentifier:
        - !Ref deadlinePublicSubnet0
        - !Ref deadlinePublicSubnet1
        - !Ref deadlinePublicSubnet2
        - !Ref deadlinePublicSubnet3
        - !Ref deadlinePublicSubnet4
        - !Ref deadlinePublicSubnet5
      NewInstancesProtectedFromScaleIn: true
      MixedInstancesPolicy:
        InstancesDistribution:
          OnDemandBaseCapacity: 0
          OnDemandPercentageAboveBaseCapacity: 0
          SpotAllocationStrategy: price-capacity-optimized
          OnDemandAllocationStrategy: lowest-price
        LaunchTemplate:
          LaunchTemplateSpecification:
            LaunchTemplateId: !Ref deadlineLaunchTemplate
            Version: !GetAtt deadlineLaunchTemplate.LatestVersionNumber
          Overrides:
            - InstanceRequirements:
                VCpuCount:
                  Min: 2
                MemoryMiB:
                  Min: 8192
                ExcludedInstanceTypes:
                  - '*.metal'
                  - '*.metal*'
                  - i3.*
                CpuManufacturers:
                  - intel
                  - amd
                InstanceGenerations:
                  - current
                SpotMaxPricePercentageOverLowestPrice: 100
      MetricsCollection:
        - Granularity: 1Minute
          Metrics:
            - GroupMinSize
            - GroupMaxSize
            - GroupDesiredCapacity
            - GroupInServiceInstances
            - GroupTotalInstances
            - GroupInServiceCapacity
            - GroupTotalCapacity

Advertisement

digitalbigmo Inc. takes on a wide range of work, from software development to building render farms. If you run into trouble, feel free to contact us. Consultations are free.
