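About me
I'm Koike (@plinecom), technical support at digitalbigmo Inc.
These days I battle with Deadline Cloud every day.
Digging a little deeper into Auto Scaling and the bidding strategy in Deadline Cloud, and customizing them
When you use a customer-managed fleet (CMF) in Deadline Cloud,
the Amazon EC2 Auto Scaling fleet settings
will work at a minimum if you simply copy and paste the sample CloudFormation template as-is: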
https://docs.aws.amazon.com/ja_jp/deadline-cloud/latest/developerguide/create-auto-scaling.html#autoscale-ec2-fleet
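Here I tweak it a little further.
1. Run only in us-east-1, so increase the AZs to six
When using Deadline Cloud from Japan, for practical and cost reasons, no region other than us-east-1 is really an option.
So I specialize the template for us-east-1. Specifically, to get hold of more spare Spot capacity, I rewrite the template to use all six Availability Zones (AZs) of us-east-1, where the official sample uses only two.
That said, there is no shortcut: you just have to write out six subnets by hand.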
Before
deadlinePublicSubnet0:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref deadlineVPC
    CidrBlock: 100.100.16.0/22
    AvailabilityZone: !Join
      - ''
      - - !Ref AWS::Region
        - a
deadlineSubnetRouteTableAssociation0:
  Type: AWS::EC2::SubnetRouteTableAssociation
  Properties:
    RouteTableId: !Ref deadlinePublicRouteTable
    SubnetId: !Ref deadlinePublicSubnet0
deadlinePublicSubnet1:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref deadlineVPC
    CidrBlock: 100.100.20.0/22
    AvailabilityZone: !Join
      - ''
      - - !Ref AWS::Region
        - b
deadlineSubnetRouteTableAssociation1:
  Type: AWS::EC2::SubnetRouteTableAssociation
  Properties:
    RouteTableId: !Ref deadlinePublicRouteTable
    SubnetId: !Ref deadlinePublicSubnet1
(snip)
deadlineAutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    AutoScalingGroupName: !Join
      - ''
      - - deadline-ASG-autoscalable-
        - !Ref FleetId
    MinSize: 0
    MaxSize: 10
    VPCZoneIdentifier:
      - !Ref deadlinePublicSubnet0
      - !Ref deadlinePublicSubnet1
After
deadlinePublicSubnet0:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref deadlineVPC
    CidrBlock: 100.100.16.0/22
    AvailabilityZone: !Join
      - ''
      - - !Ref AWS::Region
        - a
deadlineSubnetRouteTableAssociation0:
  Type: AWS::EC2::SubnetRouteTableAssociation
  Properties:
    RouteTableId: !Ref deadlinePublicRouteTable
    SubnetId: !Ref deadlinePublicSubnet0
deadlinePublicSubnet1:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref deadlineVPC
    CidrBlock: 100.100.20.0/22
    AvailabilityZone: !Join
      - ''
      - - !Ref AWS::Region
        - b
deadlineSubnetRouteTableAssociation1:
  Type: AWS::EC2::SubnetRouteTableAssociation
  Properties:
    RouteTableId: !Ref deadlinePublicRouteTable
    SubnetId: !Ref deadlinePublicSubnet1
deadlinePublicSubnet2:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref deadlineVPC
    CidrBlock: 100.100.24.0/22
    AvailabilityZone: !Join
      - ''
      - - !Ref AWS::Region
        - c
deadlineSubnetRouteTableAssociation2:
  Type: AWS::EC2::SubnetRouteTableAssociation
  Properties:
    RouteTableId: !Ref deadlinePublicRouteTable
    SubnetId: !Ref deadlinePublicSubnet2
deadlinePublicSubnet3:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref deadlineVPC
    CidrBlock: 100.100.28.0/22
    AvailabilityZone: !Join
      - ''
      - - !Ref AWS::Region
        - d
deadlineSubnetRouteTableAssociation3:
  Type: AWS::EC2::SubnetRouteTableAssociation
  Properties:
    RouteTableId: !Ref deadlinePublicRouteTable
    SubnetId: !Ref deadlinePublicSubnet3
deadlinePublicSubnet4:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref deadlineVPC
    CidrBlock: 100.100.32.0/22
    AvailabilityZone: !Join
      - ''
      - - !Ref AWS::Region
        - e
deadlineSubnetRouteTableAssociation4:
  Type: AWS::EC2::SubnetRouteTableAssociation
  Properties:
    RouteTableId: !Ref deadlinePublicRouteTable
    SubnetId: !Ref deadlinePublicSubnet4
deadlinePublicSubnet5:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref deadlineVPC
    CidrBlock: 100.100.36.0/22
    AvailabilityZone: !Join
      - ''
      - - !Ref AWS::Region
        - f
deadlineSubnetRouteTableAssociation5:
  Type: AWS::EC2::SubnetRouteTableAssociation
  Properties:
    RouteTableId: !Ref deadlinePublicRouteTable
    SubnetId: !Ref deadlinePublicSubnet5
(snip)
deadlineAutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    AutoScalingGroupName: !Join
      - ''
      - - deadline-ASG-autoscalable-
        - !Ref FleetId
    MinSize: 0
    MaxSize: 10
    VPCZoneIdentifier:
      - !Ref deadlinePublicSubnet0
      - !Ref deadlinePublicSubnet1
      - !Ref deadlinePublicSubnet2
      - !Ref deadlinePublicSubnet3
      - !Ref deadlinePublicSubnet4
      - !Ref deadlinePublicSubnet5
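As an aside, the subnets in this template choose their zone by joining the region name with a letter from a to f. A possible alternative, which is not what this article's template does, is to pick zones by index with the CloudFormation intrinsic functions Fn::GetAZs and Fn::Select. The following is only a sketch for the first subnet, assuming the region returns at least six zones for your account:

```yaml
# Sketch (assumption, not from the official sample):
# select the AZ by index instead of appending a letter to the region name.
deadlinePublicSubnet0:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref deadlineVPC
    CidrBlock: 100.100.16.0/22
    # !GetAZs '' returns the list of AZs for the stack's region;
    # !Select takes the element at the given zero-based index.
    AvailabilityZone: !Select
      - 0
      - !GetAZs ''
```

The remaining five subnets would use indexes 1 through 5. If the list has fewer than six entries the stack will fail to create, and the contents of the list can depend on the account, so it is worth checking what Fn::GetAZs actually returns before relying on this form.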
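2. Revisit the bidding strategy
The official sample prioritizes stability and uses capacity-optimized for Spot Instances, but I want to push harder on cost, so I change it to price-capacity-optimized. If things become unstable, it is probably best to go back to capacity-optimized or to mix in a few On-Demand Instances.
The official sample also chooses which Spot Instance types to use through an allowlist.
I want to go after cheap instances aggressively, though, so I decided to switch to vCPU count and memory size requirements plus a denylist.
Before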
MixedInstancesPolicy:
  InstancesDistribution:
    OnDemandBaseCapacity: 0
    OnDemandPercentageAboveBaseCapacity: 0
    SpotAllocationStrategy: capacity-optimized
    OnDemandAllocationStrategy: lowest-price
  LaunchTemplate:
    LaunchTemplateSpecification:
      LaunchTemplateId: !Ref deadlineLaunchTemplate
      Version: !GetAtt
        - deadlineLaunchTemplate
        - LatestVersionNumber
    Overrides:
      - InstanceType: m5.large
      - InstanceType: m5d.large
      - InstanceType: m5a.large
      - InstanceType: m5ad.large
      - InstanceType: m5n.large
      - InstanceType: m5dn.large
      - InstanceType: m4.large
      - InstanceType: m3.large
      - InstanceType: r5.large
      - InstanceType: r5d.large
      - InstanceType: r5a.large
      - InstanceType: r5ad.large
      - InstanceType: r5n.large
      - InstanceType: r5dn.large
      - InstanceType: r4.large
After
MixedInstancesPolicy:
  InstancesDistribution:
    OnDemandBaseCapacity: 0
    OnDemandPercentageAboveBaseCapacity: 0
    SpotAllocationStrategy: price-capacity-optimized
    OnDemandAllocationStrategy: lowest-price
  LaunchTemplate:
    LaunchTemplateSpecification:
      LaunchTemplateId: !Ref deadlineLaunchTemplate
      Version: !GetAtt
        - deadlineLaunchTemplate
        - LatestVersionNumber
    Overrides:
      - InstanceRequirements:
          VCpuCount:
            Min: 2
          MemoryMiB:
            Min: 8192
          ExcludedInstanceTypes:
            - '*.metal'
            - '*.metal*'
            - i3.*
          CpuManufacturers:
            - intel
            - amd
          InstanceGenerations:
            - current
          SpotMaxPricePercentageOverLowestPrice: 100
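At least 2 vCPUs (VCpuCount Min: 2)
At least 8 GiB of memory (MemoryMiB Min: 8192)
i3 instances behaved unreliably for me, so I exclude them. Metal instances are absurdly expensive, so I exclude them as well to avoid bidding on them.
Picking up ARM or similar CPUs would not work, so CpuManufacturers is limited to intel and amd. InstanceGenerations is set to current so that current-generation instances are used, and SpotMaxPricePercentageOverLowestPrice is set to 100 so that Spot bids go up to twice the lowest price.
With just these settings, mysterious inf and z instances start showing up, which is fun to watch.
I suppose inf instances are hard to put to proper use for machine learning.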
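The fallback mentioned earlier, mixing in a few On-Demand Instances if Spot becomes unstable, comes down to changing InstancesDistribution. The following is a minimal sketch only. The value 20 is an assumed figure for illustration, not a tested recommendation:

```yaml
MixedInstancesPolicy:
  InstancesDistribution:
    # No guaranteed On-Demand base, as in the template above.
    OnDemandBaseCapacity: 0
    # Assumed value for illustration: roughly 20% On-Demand and 80% Spot
    # for capacity above the base.
    OnDemandPercentageAboveBaseCapacity: 20
    SpotAllocationStrategy: price-capacity-optimized
    OnDemandAllocationStrategy: lowest-price
```

The LaunchTemplate and Overrides sections would stay as shown in the After example. Raising the percentage trades cost for stability, so it makes sense to start small and adjust based on how often Spot workers are interrupted.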
The completed CloudFormation template
AWSTemplateFormatVersion: 2010-09-09
Description: Amazon Deadline Cloud customer-managed fleet
Parameters:
  FarmId:
    Type: String
    Description: Farm ID
  FleetId:
    Type: String
    Description: Fleet ID
  AMIId:
    Type: String
    Description: AMI ID for launching workers
Resources:
  deadlineVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 100.100.0.0/16
      Tags:
        - Key: Name
          Value: !Join
            - '-'
            - - deadline-vpc
              - !Ref FleetId
  deadlineWorkerSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: !Join
        - ' '
        - - Security group created for Deadline Cloud workers in the fleet
          - !Ref FleetId
      GroupName: !Join
        - ''
        - - deadlineWorkerSecurityGroup-
          - !Ref FleetId
      SecurityGroupEgress:
        - CidrIp: 0.0.0.0/0
          IpProtocol: '-1'
      SecurityGroupIngress:
        # TCP 5555
        - IpProtocol: tcp
          FromPort: 5555
          ToPort: 5555
          CidrIp: 0.0.0.0/0
        # UDP 5555
        - IpProtocol: udp
          FromPort: 5555
          ToPort: 5555
          CidrIp: 0.0.0.0/0
        # Autodesk Maya / Arnold (2701-2702 TCP IPv4/IPv6)
        - IpProtocol: tcp
          FromPort: 2701
          ToPort: 2702
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 2701
          ToPort: 2702
          CidrIpv6: '::/0'
        # Cinema 4D (7057 TCP IPv4/IPv6)
        - IpProtocol: tcp
          FromPort: 7057
          ToPort: 7057
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 7057
          ToPort: 7057
          CidrIpv6: '::/0'
        # KeyShot (2703 TCP IPv4/IPv6)
        - IpProtocol: tcp
          FromPort: 2703
          ToPort: 2703
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 2703
          ToPort: 2703
          CidrIpv6: '::/0'
        # Foundry Nuke (6101 TCP IPv4/IPv6)
        - IpProtocol: tcp
          FromPort: 6101
          ToPort: 6101
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 6101
          ToPort: 6101
          CidrIpv6: '::/0'
        # Red Giant (7055 TCP IPv4??)
        - IpProtocol: tcp
          FromPort: 7055
          ToPort: 7055
          CidrIp: 0.0.0.0/0
        # Redshift (7054 TCP IPv4/IPv6)
        - IpProtocol: tcp
          FromPort: 7054
          ToPort: 7054
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 7054
          ToPort: 7054
          CidrIpv6: '::/0'
        # SideFX Houdini / Mantra / Karma (1715-1717 TCP IPv4/IPv6)
        - IpProtocol: tcp
          FromPort: 1715
          ToPort: 1717
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 1715
          ToPort: 1717
          CidrIpv6: '::/0'
        # VRay (30304 TCP IPv4??)
        - IpProtocol: tcp
          FromPort: 30304
          ToPort: 30304
          CidrIp: 0.0.0.0/0
      VpcId: !Ref deadlineVPC
  deadlineIGW:
    Type: AWS::EC2::InternetGateway
    Properties: {}
  deadlineVPCGatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref deadlineVPC
      InternetGatewayId: !Ref deadlineIGW
  deadlinePublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref deadlineVPC
  deadlinePublicRoute:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref deadlinePublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref deadlineIGW
    DependsOn:
      - deadlineIGW
      - deadlineVPCGatewayAttachment
  deadlinePublicSubnet0:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref deadlineVPC
      CidrBlock: 100.100.16.0/22
      AvailabilityZone: !Join
        - ''
        - - !Ref AWS::Region
          - a
  deadlineSubnetRouteTableAssociation0:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref deadlinePublicRouteTable
      SubnetId: !Ref deadlinePublicSubnet0
  deadlinePublicSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref deadlineVPC
      CidrBlock: 100.100.20.0/22
      AvailabilityZone: !Join
        - ''
        - - !Ref AWS::Region
          - b
  deadlineSubnetRouteTableAssociation1:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref deadlinePublicRouteTable
      SubnetId: !Ref deadlinePublicSubnet1
  deadlinePublicSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref deadlineVPC
      CidrBlock: 100.100.24.0/22
      AvailabilityZone: !Join
        - ''
        - - !Ref AWS::Region
          - c
  deadlineSubnetRouteTableAssociation2:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref deadlinePublicRouteTable
      SubnetId: !Ref deadlinePublicSubnet2
  deadlinePublicSubnet3:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref deadlineVPC
      CidrBlock: 100.100.28.0/22
      AvailabilityZone: !Join
        - ''
        - - !Ref AWS::Region
          - d
  deadlineSubnetRouteTableAssociation3:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref deadlinePublicRouteTable
      SubnetId: !Ref deadlinePublicSubnet3
  deadlinePublicSubnet4:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref deadlineVPC
      CidrBlock: 100.100.32.0/22
      AvailabilityZone: !Join
        - ''
        - - !Ref AWS::Region
          - e
  deadlineSubnetRouteTableAssociation4:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref deadlinePublicRouteTable
      SubnetId: !Ref deadlinePublicSubnet4
  deadlinePublicSubnet5:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref deadlineVPC
      CidrBlock: 100.100.36.0/22
      AvailabilityZone: !Join
        - ''
        - - !Ref AWS::Region
          - f
  deadlineSubnetRouteTableAssociation5:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref deadlinePublicRouteTable
      SubnetId: !Ref deadlinePublicSubnet5
  deadlineInstanceAccessAccessRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Join
        - '-'
        - - deadline
          - InstanceAccess
          - !Ref FleetId
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service: ec2.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: /
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
        - arn:aws:iam::aws:policy/AWSDeadlineCloud-WorkerHost
  deadlineInstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: /
      Roles:
        - !Ref deadlineInstanceAccessAccessRole
  deadlineLaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateName: !Join
        - ''
        - - deadline-LT-
          - !Ref FleetId
      LaunchTemplateData:
        NetworkInterfaces:
          - DeviceIndex: 0
            AssociatePublicIpAddress: true
            Groups:
              - !Ref deadlineWorkerSecurityGroup
            DeleteOnTermination: true
        ImageId: !Ref AMIId
        InstanceInitiatedShutdownBehavior: terminate
        IamInstanceProfile:
          Arn: !GetAtt deadlineInstanceProfile.Arn
        MetadataOptions:
          HttpTokens: required
          HttpEndpoint: enabled
  deadlineAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      AutoScalingGroupName: !Join
        - ''
        - - deadline-ASG-autoscalable-
          - !Ref FleetId
      MinSize: 0
      MaxSize: 10
      VPCZoneIdentifier:
        - !Ref deadlinePublicSubnet0
        - !Ref deadlinePublicSubnet1
        - !Ref deadlinePublicSubnet2
        - !Ref deadlinePublicSubnet3
        - !Ref deadlinePublicSubnet4
        - !Ref deadlinePublicSubnet5
      NewInstancesProtectedFromScaleIn: true
      MixedInstancesPolicy:
        InstancesDistribution:
          OnDemandBaseCapacity: 0
          OnDemandPercentageAboveBaseCapacity: 0
          SpotAllocationStrategy: price-capacity-optimized
          OnDemandAllocationStrategy: lowest-price
        LaunchTemplate:
          LaunchTemplateSpecification:
            LaunchTemplateId: !Ref deadlineLaunchTemplate
            Version: !GetAtt deadlineLaunchTemplate.LatestVersionNumber
          Overrides:
            - InstanceRequirements:
                VCpuCount:
                  Min: 2
                MemoryMiB:
                  Min: 8192
                ExcludedInstanceTypes:
                  - '*.metal'
                  - '*.metal*'
                  - i3.*
                CpuManufacturers:
                  - intel
                  - amd
                InstanceGenerations:
                  - current
                SpotMaxPricePercentageOverLowestPrice: 100
      MetricsCollection:
        - Granularity: 1Minute
          Metrics:
            - GroupMinSize
            - GroupMaxSize
            - GroupDesiredCapacity
            - GroupInServiceInstances
            - GroupTotalInstances
            - GroupInServiceCapacity
            - GroupTotalCapacity
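Promotion
digitalbigmo Inc. takes on a wide range of work, from software development to building render farms. If you run into trouble, feel free to contact us. Consultations are free of charge.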