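Introduction
Aucfan was founded in 2007, so naturally it has its share of old systems.
Most of those systems are complex and expensive to operate.
To cut operating costs, we humans automate our systems and take on a never-ending battle with the legacy ones.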
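The battle for automation
Round 1: A "cover it with ops" system running on a 32-bit OS
- Operations are hard work!!
- OS limits make it impossible to scale up
- Aging hardware causes frequent failures
- No relaxing on weekends either!!!
- Constantly nervous about when it will break
- Stress is the enemy of good skin
- The data center is cold!!!!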
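Round 2: Migrating to a 64-bit OS and to the cloud
- Port C programs that were written for a 32-bit environment to 64-bit
- Higher specs bring better performance ↑↑
- Moving to the cloud strengthens redundancy and shortens recovery time
- Auto Scaling handles scale-out and automatic recovery, so weekends are worry-free
- CI/CD is not in place yet, so deployments still have to be done by hand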
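Round 3: Automation through CI/CD and going serverless
- Serverless means there are no servers left to operate
- Automated deployments reduce incidents and operational mistakes
- More time for other work
- Alerting is fully in place, so there is basically nothing to do!!
- Nice and easy!!!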
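In this way, the old systems were retired and the service became stable.
Below I introduce part of what we did for CI/CD automation and going serverless.
As part of CI/CD, CodePipeline takes an image from ECR and deploys it to Fargate through a CodeDeploy Blue/Green deployment.
The ECS and CodePipeline settings are built with CloudFormation.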
Setup steps
ECR repository
- Create an ECR repository named `test/nginx` (the steps are omitted here)
- Push the `nginx` image to it
# docker pull nginx
Using default tag: latest
latest: Pulling from library/nginx
852e50cd189d: Pull complete
571d7e852307: Pull complete
addb10abd9cb: Pull complete
d20aa7ccdb77: Pull complete
8b03f1e11359: Pull complete
Digest: sha256:6b1daa9462046581ac15be20277a7c75476283f969cb3a61c8725ec38d3b01c3
Status: Downloaded newer image for nginx:latest
docker.io/library/nginx:latest
# aws ecr get-login-password --region ap-northeast-1 | \
docker login --username AWS --password-stdin 1234567890.dkr.ecr.ap-northeast-1.amazonaws.com
Login Succeeded
# docker tag nginx:latest 1234567890.dkr.ecr.ap-northeast-1.amazonaws.com/test/nginx:latest
# docker push 1234567890.dkr.ecr.ap-northeast-1.amazonaws.com/test/nginx:latest
The push refers to repository [1234567890.dkr.ecr.ap-northeast-1.amazonaws.com/test/nginx]
7e914612e366: Pushed
f790aed835ee: Pushed
850c2400ea4d: Pushed
7ccabd267c9f: Pushed
f5600c6330da: Pushed
latest: digest: sha256:99d0a53e3718cef59443558607d1e100b325d6a2b678cd2a48b05e5e22ffeb49 size: 1362
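The registry host and image name in the commands above follow a fixed pattern. As a small illustration, it can be sketched in Python like this. The function name `ecr_image_uri` is my own, and the account ID is the same placeholder used throughout this article.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build the image URI that `docker tag` and `docker push` use for a private ECR repository."""
    # The registry host is derived from the account ID and the region.
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repository}:{tag}"


if __name__ == "__main__":
    # Placeholder account ID, as in the commands above.
    print(ecr_image_uri("1234567890", "ap-northeast-1", "test/nginx"))
```

The same pattern appears again in the `Image` property of the task definition in the CloudFormation template.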
Creating the ECS environment with CloudFormation
- Create a stack from the template below
- How to create the stack is omitted here
`cfn.yaml`
AWSTemplateFormatVersion: 2010-09-09
Description: "Test nginx template."
Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: Environment Configuration
        Parameters:
          - Service
          - VpcId
          - Subnets
      - Label:
          default: ApplicationLoadBalancer Configuration
        Parameters:
          - ALBScheme
          - EnableALBAccessLog
          - CertificateArn
      - Label:
          default: TargetGroup Configuration
        Parameters:
          - TargetType
          - HealthCheckPath
          - HealthCheckPort
          - TgPort
          - TgProtocol
          - DeregistrationTimeout
      - Label:
          default: ECS Configuration
        Parameters:
          - ImageVersion
          - containerPort
          - DesiredCount
          - HealthCheckGracePeriodSeconds
          - ExecutionRoleArn
Parameters:
  Service:
    Type: String
    Description: Service name
    Default: "test-nginx"
  VpcId:
    Type: AWS::EC2::VPC::Id
    Description: VPC ID
  Subnets:
    Type: List<AWS::EC2::Subnet::Id>
    Description: The list of the Subnet ID
  TargetType:
    Type: String
    Description: Target instance
    AllowedValues:
      - ip
      - instance
    Default: ip
  HealthCheckPath:
    Type: String
    Description: Health Check Path
    AllowedPattern: ^/.*$
    Default: /
  HealthCheckPort:
    Type: Number
    Description: Health Check Port
    MinValue: '0'
    MaxValue: '65535'
    Default: '80'
  TgPort:
    Type: Number
    Description: Target group Port
    MinValue: '0'
    MaxValue: '65535'
    Default: '80'
  TgProtocol:
    Type: String
    Description: 'Target group protocol'
    AllowedValues:
      - HTTP
      - HTTPS
      - TCP
    Default: HTTP
  DeregistrationTimeout:
    Type: Number
    Description: "Deregistration delay timeout seconds"
    MinValue: '0'
    Default: '300'
  ALBScheme:
    Type: String
    Description: Enter whether ALB is for internal or internet use
    AllowedValues:
      - internet-facing
      - internal
  EnableALBAccessLog:
    Type: String
    AllowedValues:
      - "true"
      - "false"
  ImageVersion:
    Type: String
    Description: The image version to start a container
    Default: latest
  containerPort:
    Type: Number
    Description: Container Port
    MinValue: '0'
    MaxValue: '65535'
    Default: '80'
  DesiredCount:
    Type: Number
    Description: The number of instantiations of the specified task definition to place and keep running on cluster
    MinValue: '0'
    MaxValue: '20'
    Default: '2'
  HealthCheckGracePeriodSeconds:
    Type: Number
    Description: health checks after a task has first started
    MinValue: '0'
    MaxValue: '2147483647'
    Default: '300'
  CertificateArn:
    Type: String
    Description: Certificate Arn
  ExecutionRoleArn:
    Type: String
    Description: The ExecutionRole Arn for ECS Task Definition
Resources:
  ALBSecurityGroup:
    Type: "AWS::EC2::SecurityGroup"
    Properties:
      GroupName: !Sub ${Service}-alb
      GroupDescription: !Sub "ALB security group for ${Service}"
      VpcId: !Ref VpcId
  ALBSecurityGroupHTTPIngress:
    Type: "AWS::EC2::SecurityGroupIngress"
    Properties:
      GroupId: !Ref ALBSecurityGroup
      IpProtocol: "tcp"
      FromPort: 80
      ToPort: 80
      CidrIp: "0.0.0.0/0"
  ALBSecurityGroupHTTPSIngress:
    Type: "AWS::EC2::SecurityGroupIngress"
    Properties:
      GroupId: !Ref ALBSecurityGroup
      IpProtocol: "tcp"
      FromPort: 443
      ToPort: 443
      CidrIp: "0.0.0.0/0"
  ECSSecurityGroup:
    Type: "AWS::EC2::SecurityGroup"
    Properties:
      GroupName: !Sub ${Service}-ec2
      GroupDescription: !Sub "EC2 security group for ${Service}"
      SecurityGroupIngress:
        # Only the ALB is allowed to reach the tasks.
        - IpProtocol: "tcp"
          FromPort: 80
          ToPort: 80
          SourceSecurityGroupId: !Ref ALBSecurityGroup
      VpcId: !Ref VpcId
  # Two target groups are needed so that Blue/Green can switch between them.
  ALBTargetGroup1:
    Type: 'AWS::ElasticLoadBalancingV2::TargetGroup'
    Properties:
      Name: !Sub ${Service}-1
      Port: !Ref TgPort
      Protocol: !Ref TgProtocol
      TargetType: !Ref TargetType
      VpcId: !Ref VpcId
      HealthCheckPath: !Ref HealthCheckPath
      HealthCheckPort: !Ref HealthCheckPort
      Matcher:
        HttpCode: "200"
      TargetGroupAttributes:
        - Key: deregistration_delay.timeout_seconds
          Value: !Ref DeregistrationTimeout
  ALBTargetGroup2:
    Type: 'AWS::ElasticLoadBalancingV2::TargetGroup'
    Properties:
      Name: !Sub ${Service}-2
      Port: !Ref TgPort
      Protocol: !Ref TgProtocol
      TargetType: !Ref TargetType
      VpcId: !Ref VpcId
      HealthCheckPath: !Ref HealthCheckPath
      HealthCheckPort: !Ref HealthCheckPort
      Matcher:
        HttpCode: "200"
      TargetGroupAttributes:
        - Key: deregistration_delay.timeout_seconds
          Value: !Ref DeregistrationTimeout
  ApplicationLoadBalancer:
    Type: "AWS::ElasticLoadBalancingV2::LoadBalancer"
    Properties:
      Scheme: !Ref ALBScheme
      LoadBalancerAttributes:
        - Key: access_logs.s3.enabled
          Value: !Ref EnableALBAccessLog
        - Key: access_logs.s3.bucket
          Value: aucfan-elb-access-log
        - Key: access_logs.s3.prefix
          Value: !Sub "${Service}"
      Name: !Sub "${Service}"
      SecurityGroups:
        - !Ref ALBSecurityGroup
      Subnets: !Ref Subnets
      Type: application
  # Port 80 only redirects to HTTPS.
  HTTPListener:
    Type: "AWS::ElasticLoadBalancingV2::Listener"
    Properties:
      DefaultActions:
        - Type: redirect
          RedirectConfig:
            Host: '#{host}'
            Path: '/#{path}'
            Port: "443"
            Protocol: HTTPS
            Query: '#{query}'
            StatusCode: HTTP_301
      LoadBalancerArn: !Ref ApplicationLoadBalancer
      Port: 80
      Protocol: HTTP
  HTTPSListener:
    Type: "AWS::ElasticLoadBalancingV2::Listener"
    Properties:
      Certificates:
        - CertificateArn: !Ref CertificateArn
      DefaultActions:
        - Type: forward
          TargetGroupArn: !Ref ALBTargetGroup1
      LoadBalancerArn: !Ref ApplicationLoadBalancer
      Port: 443
      Protocol: HTTPS
  EcsTaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      ContainerDefinitions:
        - Name: !Sub ${Service}-container
          Image: !Sub "${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/test/nginx:${ImageVersion}"
          PortMappings:
            - ContainerPort: !Ref containerPort
              HostPort: !Ref containerPort
              Protocol: tcp
          Essential: true
      Cpu: "256"
      ExecutionRoleArn: !Ref ExecutionRoleArn
      Family: !Sub ${Service}
      Memory: "512"
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
  EcsCluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: !Sub ${Service}
      ClusterSettings:
        - Name: containerInsights
          Value: enabled
  EcsService:
    Type: AWS::ECS::Service
    Properties:
      CapacityProviderStrategy:
        - CapacityProvider: FARGATE
          Base: !Ref DesiredCount
          Weight: 1
        - CapacityProvider: FARGATE_SPOT
          Weight: 4
      Cluster: !Ref EcsCluster
      DeploymentController:
        Type: CODE_DEPLOY
      DesiredCount: !Ref DesiredCount
      EnableECSManagedTags: True
      HealthCheckGracePeriodSeconds: !Ref HealthCheckGracePeriodSeconds
      LoadBalancers:
        - ContainerName: !Sub ${Service}-container
          ContainerPort: !Ref containerPort
          TargetGroupArn: !Ref ALBTargetGroup1
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: ENABLED
          SecurityGroups:
            - !Ref ECSSecurityGroup
          Subnets: !Ref Subnets
      PropagateTags: SERVICE
      SchedulingStrategy: REPLICA
      ServiceName: !Sub ${Service}
      TaskDefinition: !Ref EcsTaskDefinition
    # The original referenced HTTPSListenerRule1, which is not defined in this
    # template; HTTPSListener is the resource that attaches the target group.
    DependsOn:
      - ApplicationLoadBalancer
      - HTTPSListener
  S3Bucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Sub "${Service}-bucket"
      VersioningConfiguration:
        Status: Enabled
      LifecycleConfiguration:
        Rules:
          - Id: "DeleteOldVersionAfter5Days"
            AbortIncompleteMultipartUpload:
              DaysAfterInitiation: 7
            NoncurrentVersionExpirationInDays: 5
            Status: Enabled
    DependsOn:
      - ApplicationLoadBalancer
      - HTTPSListener
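Creating the stack is omitted in this article, but the template takes quite a few parameters, so here is a minimal sketch of how the values could be turned into the JSON list that `aws cloudformation create-stack --parameters file://...` accepts. The helper name `to_cfn_parameters` and all the IDs below are made-up placeholders, not values from the real environment.

```python
import json


def to_cfn_parameters(values: dict) -> list:
    """Convert a plain dict into CloudFormation's ParameterKey/ParameterValue list."""
    params = []
    for key, value in values.items():
        if isinstance(value, bool):
            # The template expects the lowercase strings "true" / "false".
            value = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            # List<...> parameters are passed as one comma-separated string.
            value = ",".join(value)
        params.append({"ParameterKey": key, "ParameterValue": str(value)})
    return params


if __name__ == "__main__":
    # Hypothetical values; replace them with your own VPC, subnets and ARNs.
    example = {
        "Service": "test-nginx",
        "VpcId": "vpc-xxxxxxxx",
        "Subnets": ["subnet-aaaaaaaa", "subnet-bbbbbbbb"],
        "ALBScheme": "internet-facing",
        "EnableALBAccessLog": False,
        "DesiredCount": 2,
    }
    print(json.dumps(to_cfn_parameters(example), indent=2))
```

Parameters that have a `Default` in the template can simply be left out of the dict.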
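Checking what we have so far
Creating the source files and placing them in S3
Creating the files
- Create the following two files, `taskdef.json` and `appspec.yaml`
- Change the role name and other values to match your environment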
{
    "executionRoleArn": "arn:aws:iam::1234567890:role/ecsTaskExecutionRole",
    "containerDefinitions": [
        {
            "name": "test-nginx",
            "image": "<ECR_IMAGE>",
            "essential": true,
            "portMappings": [
                {
                    "hostPort": 80,
                    "protocol": "tcp",
                    "containerPort": 80
                }
            ]
        }
    ],
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "networkMode": "awsvpc",
    "cpu": "256",
    "memory": "512",
    "family": "test-nginx"
}
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: <TASK_DEFINITION>
        LoadBalancerInfo:
          ContainerName: "test-nginx"
          ContainerPort: 80
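The two files contain the placeholders `<ECR_IMAGE>` and `<TASK_DEFINITION>`, which are filled in at deploy time. To make their role clearer, here is a rough Python illustration of that substitution. It only shows the idea and is not how CodePipeline is implemented; the function name `fill_placeholders` and the example task definition ARN are made up.

```python
import json


def fill_placeholders(taskdef_text: str, appspec_text: str, image_uri: str, taskdef_arn: str):
    """Replace the two placeholders and return (task definition dict, appspec text)."""
    # <ECR_IMAGE> becomes the URI of the image taken from ECR.
    taskdef = json.loads(taskdef_text.replace("<ECR_IMAGE>", image_uri))
    # <TASK_DEFINITION> becomes the ARN of the newly registered task definition.
    appspec = appspec_text.replace("<TASK_DEFINITION>", taskdef_arn)
    return taskdef, appspec


if __name__ == "__main__":
    taskdef_text = '{"containerDefinitions": [{"name": "test-nginx", "image": "<ECR_IMAGE>"}]}'
    appspec_text = "TaskDefinition: <TASK_DEFINITION>"
    taskdef, appspec = fill_placeholders(
        taskdef_text,
        appspec_text,
        "1234567890.dkr.ecr.ap-northeast-1.amazonaws.com/test/nginx:latest",
        # Hypothetical ARN, for illustration only.
        "arn:aws:ecs:ap-northeast-1:1234567890:task-definition/test-nginx:2",
    )
    print(taskdef["containerDefinitions"][0]["image"])
    print(appspec)
```

This is also why the container name in `appspec.yaml` has to match the `name` in `taskdef.json`: the traffic is routed to that container of the new task definition.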
Placing them in S3
- Upload them to the bucket created by CloudFormation
# zip -j artifacts.zip taskdef.json appspec.yaml
adding: taskdef.json (deflated 60%)
adding: appspec.yaml (deflated 29%)
# aws s3 cp artifacts.zip s3://test-bucket/test-nginx/
upload: artifacts.zip to s3://test-bucket/test-nginx/artifacts.zip
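If you would rather build the archive from a script than with `zip -j`, something like the following Python sketch should give the same flat layout. The helper name `build_artifact` is my own. The files are stored without their directories because the pipeline template later in this article looks for `taskdef.json` and `appspec.yaml` at the root of the artifact by default.

```python
import zipfile
from pathlib import Path


def build_artifact(zip_path, files):
    """Zip the given files with directory parts dropped and return the stored names."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file in files:
            path = Path(file)
            # arcname=path.name stores only the file name, like `zip -j`.
            archive.write(path, arcname=path.name)
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()
```

For example, `build_artifact("artifacts.zip", ["taskdef.json", "appspec.yaml"])` creates the same archive as the `zip` command above, and the upload itself can stay as the `aws s3 cp` command.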
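Preparing CodeDeploy
- At the time of writing there was no way to create the CodeDeploy resources for an ECS Blue/Green deployment through CloudFormation or the console, so we create them with aws-cli
※ A way to do this through CloudFormation has since been released
Creating the application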
# aws deploy create-application \
--application-name test-nginx-app \
--compute-platform ECS \
--region ap-northeast-1
{
    "applicationId": "12345678-1234-4321-5678-1234abcd4321"
}
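- Prepare the following JSON file (`CodeDeploy.json`)
- Basically you just fill in the resources created with CloudFormation earlier
- Prepare a CodeDeploy role for `ECS` in advance
- For `listenerArns`, enter the ARN of the listener that the target group is attached to, not the ARN of a listener rule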
{
    "applicationName": "test-nginx-app",
    "autoRollbackConfiguration": {
        "enabled": true,
        "events": [
            "DEPLOYMENT_FAILURE",
            "DEPLOYMENT_STOP_ON_REQUEST"
        ]
    },
    "blueGreenDeploymentConfiguration": {
        "deploymentReadyOption": {
            "actionOnTimeout": "CONTINUE_DEPLOYMENT",
            "waitTimeInMinutes": 0
        },
        "terminateBlueInstancesOnDeploymentSuccess": {
            "action": "TERMINATE",
            "terminationWaitTimeInMinutes": 60
        }
    },
    "deploymentGroupName": "test-nginx",
    "deploymentStyle": {
        "deploymentOption": "WITH_TRAFFIC_CONTROL",
        "deploymentType": "BLUE_GREEN"
    },
    "loadBalancerInfo": {
        "targetGroupPairInfoList": [
            {
                "targetGroups": [
                    {
                        "name": "test-nginx-1"
                    },
                    {
                        "name": "test-nginx-2"
                    }
                ],
                "prodTrafficRoute": {
                    "listenerArns": [
                        "arn:aws:elasticloadbalancing:ap-northeast-1:1234567890:listener/app/test-nginx/611c1xxxxxx488/417dxxxxxxf2853"
                    ]
                }
            }
        ]
    },
    "serviceRoleArn": "arn:aws:iam::1234567890:role/CodeDeployRoleForEcs",
    "ecsServices": [
        {
            "serviceName": "test-nginx",
            "clusterName": "test-nginx"
        }
    ]
}
- Create the CodeDeploy deployment group
# aws deploy create-deployment-group \
--cli-input-json file://CodeDeploy.json \
--region ap-northeast-1
{
    "deploymentGroupId": "12345678-1234-4321-5678-1234abcd4321"
}
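Since `listenerArns` has to hold a listener ARN and not a listener-rule ARN, and the two look very similar, a quick check before running `create-deployment-group` may save some time. Below is a minimal sketch; the function name `is_listener_arn` is my own, and it only looks at the shape of the ARN, not at whether the listener really exists.

```python
def is_listener_arn(arn: str) -> bool:
    """Return True if the ARN points at an ELB listener rather than a listener rule."""
    # arn:aws:elasticloadbalancing:<region>:<account>:<resource>
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[2] != "elasticloadbalancing":
        return False
    # A listener resource starts with "listener/", a rule with "listener-rule/".
    return parts[5].startswith("listener/")


if __name__ == "__main__":
    # Masked placeholder ARNs in the same style as this article.
    base = "arn:aws:elasticloadbalancing:ap-northeast-1:1234567890:"
    print(is_listener_arn(base + "listener/app/test-nginx/611c1xxxxxx488/417dxxxxxxf2853"))
    print(is_listener_arn(base + "listener-rule/app/test-nginx/611c1xxxxxx488/417dxxxxxxf2853/f2dbdxxxxxxx62cf"))
```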
CodePipeline
- Use the template below to create a stack with CloudFormation and build the `CodePipeline`
`CodePipeline.yaml`
AWSTemplateFormatVersion: "2010-09-09"
Description: The template for CodePipeline
Parameters:
  Service:
    Type: String
    Description: Service name
  S3Bucket:
    Description: The name of the S3 bucket that contains the source artifact, which must be in the same region as this stack
    Type: String
  SourceS3Key:
    Description: The file name of the source artifact, such as myfolder/myartifact.zip
    Type: String
  SnsTopic:
    Description: The SNS Topic where CodePipeline sends pipeline notifications
    Type: String
  DetailType:
    Description: The level of detail to include in the notifications for this resource.
    Type: String
    Default: FULL
    AllowedValues:
      - FULL
      - BASIC
  Status:
    Description: The status of the notification rule.
    Type: String
    Default: ENABLED
    AllowedValues:
      - ENABLED
      - DISABLED
  RepositoryName:
    Description: The ECR Repository name.
    Type: String
  ApplicationName:
    Description: The CodeDeploy Application name.
    Type: String
  DeploymentGroupName:
    Description: The CodeDeploy DeploymentGroup name.
    Type: String
  TaskDefinitionTemplatePath:
    Description: The TaskDefinition template file path in source artifact.
    Type: String
    Default: taskdef.json
  AppSpecTemplatePath:
    Description: The AppSpec template file path in source artifact.
    Type: String
    Default: appspec.yaml
Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: "CodePipeline Settings"
        Parameters:
          - Service
          - S3Bucket
          - SourceS3Key
          - RepositoryName
          - ApplicationName
          - DeploymentGroupName
          - TaskDefinitionTemplatePath
          - AppSpecTemplatePath
      - Label:
          default: "NotificationRule Settings"
        Parameters:
          - DetailType
          - Status
          - SnsTopic
Resources:
  ArtifactStoreBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Sub "${Service}-codepipeline-bucket"
      VersioningConfiguration:
        Status: Enabled
      LifecycleConfiguration:
        Rules:
          - Id: "DeleteObject"
            AbortIncompleteMultipartUpload:
              DaysAfterInitiation: 7
            NoncurrentVersionExpirationInDays: 5
            ExpirationInDays: 90
            Status: Enabled
  Pipeline:
    Type: AWS::CodePipeline::Pipeline
    Properties:
      ArtifactStore:
        Location: !Ref "ArtifactStoreBucket"
        Type: S3
      DisableInboundStageTransitions: []
      Name: !Sub "${Service}-pipeline"
      RoleArn: !GetAtt [PipelineRole, Arn]
      Stages:
        - Name: GetSource
          Actions:
            # taskdef.json and appspec.yaml, zipped and uploaded to S3
            - Name: S3Source
              ActionTypeId:
                Category: Source
                Owner: AWS
                Provider: S3
                Version: "1"
              Configuration:
                S3Bucket: !Ref S3Bucket
                S3ObjectKey: !Ref SourceS3Key
                PollForSourceChanges: false
              OutputArtifacts:
                - Name: SourceArtifact
              RunOrder: 1
            # The container image pushed to ECR
            - Name: ImageSource
              ActionTypeId:
                Version: "1"
                Owner: AWS
                Category: Source
                Provider: ECR
              OutputArtifacts:
                - Name: ImageArtifact
              RunOrder: 2
              Configuration:
                ImageTag: latest
                RepositoryName: !Ref RepositoryName
        - Name: DeployToEcs
          Actions:
            - Name: Deploy
              ActionTypeId:
                Category: Deploy
                Owner: AWS
                Provider: CodeDeployToECS
                Version: "1"
              RunOrder: 1
              Configuration:
                AppSpecTemplateArtifact: SourceArtifact
                ApplicationName: !Ref ApplicationName
                DeploymentGroupName: !Ref DeploymentGroupName
                Image1ArtifactName: ImageArtifact
                # Matches the <ECR_IMAGE> placeholder in taskdef.json
                Image1ContainerName: ECR_IMAGE
                TaskDefinitionTemplatePath: !Ref TaskDefinitionTemplatePath
                AppSpecTemplatePath: !Ref AppSpecTemplatePath
                TaskDefinitionTemplateArtifact: SourceArtifact
              InputArtifacts:
                - Name: SourceArtifact
                - Name: ImageArtifact
              Region: !Ref AWS::Region
  PipelineRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub "${Service}-pipeline-role"
      AssumeRolePolicyDocument:
        Statement:
          - Action: ['sts:AssumeRole']
            Effect: Allow
            Principal:
              Service: [codepipeline.amazonaws.com]
        Version: '2012-10-17'
      Path: /
      Policies:
        - PolicyName: CodePipelineAccess
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Action:
                  - "s3:*"
                  - "codedeploy:CreateDeployment"
                  - "codedeploy:GetDeployment"
                  - "codedeploy:GetApplication"
                  - "codedeploy:GetApplicationRevision"
                  - "codedeploy:RegisterApplicationRevision"
                  - "codedeploy:GetDeploymentConfig"
                  - "ecs:RegisterTaskDefinition"
                  - "ecr:DescribeImages"
                  - "sns:Publish"
                Effect: Allow
                Resource: "*"
              - Action:
                  - "iam:PassRole"
                Effect: Allow
                Resource: "*"
                Condition:
                  StringEqualsIfExists:
                    'iam:PassedToService':
                      - cloudformation.amazonaws.com
                      - elasticbeanstalk.amazonaws.com
                      - ec2.amazonaws.com
                      - ecs-tasks.amazonaws.com
  NotificationRule:
    Type: AWS::CodeStarNotifications::NotificationRule
    Properties:
      DetailType: !Ref DetailType
      EventTypeIds:
        - codepipeline-pipeline-pipeline-execution-failed
        - codepipeline-pipeline-pipeline-execution-canceled
        - codepipeline-pipeline-pipeline-execution-started
        - codepipeline-pipeline-pipeline-execution-resumed
        - codepipeline-pipeline-pipeline-execution-succeeded
        - codepipeline-pipeline-pipeline-execution-superseded
      Name: !Sub "${Service}-pipeline-notification-rule"
      Resource: !Sub "arn:aws:codepipeline:${AWS::Region}:${AWS::AccountId}:${Service}-pipeline"
      Status: !Ref Status
      Targets:
        - TargetType: SNS
          TargetAddress: !Ref SnsTopic
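In the pipeline template the stages are wired together only by artifact names (`SourceArtifact` and `ImageArtifact`), so a typo there is easy to miss. As a rough aid, the wiring can be checked with a few lines of Python. This is a simplified sketch under my own assumptions: the function name is made up, the stage data is typed in by hand instead of being parsed from the YAML, and it only accepts artifacts produced in an earlier stage.

```python
def missing_input_artifacts(stages):
    """Return (action name, artifact name) pairs whose input was not produced by an earlier stage."""
    produced = set()
    missing = []
    for stage in stages:
        stage_outputs = set()
        for action in stage["Actions"]:
            for artifact in action.get("InputArtifacts", []):
                if artifact["Name"] not in produced:
                    missing.append((action["Name"], artifact["Name"]))
            for artifact in action.get("OutputArtifacts", []):
                stage_outputs.add(artifact["Name"])
        # Outputs become available to the stages that follow.
        produced |= stage_outputs
    return missing


# The artifact wiring of the template above, reduced to names only.
STAGES = [
    {"Name": "GetSource", "Actions": [
        {"Name": "S3Source", "OutputArtifacts": [{"Name": "SourceArtifact"}]},
        {"Name": "ImageSource", "OutputArtifacts": [{"Name": "ImageArtifact"}]},
    ]},
    {"Name": "DeployToEcs", "Actions": [
        {"Name": "Deploy", "InputArtifacts": [{"Name": "SourceArtifact"}, {"Name": "ImageArtifact"}]},
    ]},
]

if __name__ == "__main__":
    # An empty list means every input artifact has a producer.
    print(missing_input_artifacts(STAGES))
```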
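Deploying
- Check the result on the cluster screen
- Because CodeDeploy is configured to delete the previous resources after one hour, the old tasks are now waiting to be terminated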
Closing
With system automation, say goodbye to manual operations
Just as no engineer can work forever, no system can run forever
We all want an easy life, right? lol
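An automation folk tale
Once upon a time there lived Cover-It-With-Ops Rabbit and Ops-Automation Tortoise.
One day, to see who worked more efficiently, the rabbit and the tortoise held a race to build lots of systems.
Cover-It-With-Ops Rabbit figured that anything was fine as long as it ran, and finished one system after another.
Ops-Automation Tortoise, on the other hand, aimed to automate operations from the start and left plenty of documentation, so each system took longer to finish than the rabbit's.
But as the IaC and automation templates slowly piled up, the tortoise's work steadily got faster.
Meanwhile, the rabbit's operations work had come to depend on the rabbit alone, and with no runbooks left behind it could not be handed to anyone else, so operations cost a great deal.
One day the rabbit, by now less efficient than the tortoise, was concentrating so hard on finishing the next job quickly that the existing operations were neglected, and a major incident occurred.
Thrown into a panic by the incident and unable to recover, the rabbit admitted defeat.
The result was a landslide victory for Ops-Automation Tortoise.
And they all lived happily ever after.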