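I originally wanted to use AWS Backup to take AMIs on a schedule and manage their generations, but I gave up on that for the following reasons.
- When scheduling, I could not set the backup to start at an exact hour and minute
- When taking an AMI, I could not choose the "reboot" option
There were several ways to work around this, but this time I drew on the know-how of those who came before me and wrote CloudFormation templates that use EventBridge + Lambda to take AMIs on a schedule and manage their generations.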
Article I referred to
I relied on it heavily, so for anything that is unclear in this article, please refer to that article.
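Implementation overview of the CloudFormation templates
Creating stacks from the CloudFormation templates produces the configuration described below. CloudTrail is not included in the CloudFormation templates, so it must be enabled separately.
This configuration is built from the following two CloudFormation templates.
- CloudFormation template for taking AMIs on a schedule (schedule_create-ami-generations.yml)
- CloudFormation template for deleting an AMI's snapshots after the AMI is deregistered (event_ec2-snapshot-delete-after-ec2-ami-deregister.yml)
  - Assumes that CloudTrail is enabled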
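CloudFormation template for taking AMIs on a schedule (schedule_create-ami-generations.yml)
This template periodically backs up, as AMIs, the EC2 instances that carry the EC2 tag chosen when the CloudFormation stack is created (default: Create-ami-generations).
EventBridge invokes the AMI-creating Lambda function at the scheduled time. The Lambda function does the following:
- Takes an AMI of each EC2 instance that has the specified EC2 tag (default: Create-ami-generations)
- Manages generations, keeping as many AMIs as the number given in the value of the specified EC2 tag (default: Create-ami-generations)
- Deregisters old AMIs (DeregisterImage). Note: the snapshots created with the AMI are deleted by the other Lambda function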
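The generation management step can be sketched on its own, without any AWS calls. This is a minimal sketch, assuming image records shaped like the entries returned by describe_images; the function name select_images_to_deregister and the sample data are hypothetical and only illustrate the idea.

```python
from typing import Dict, List


def select_images_to_deregister(images: List[Dict[str, str]], generations: int) -> List[str]:
    """Return the ImageIds that exceed the number of generations to keep, oldest first."""
    excess = len(images) - generations
    if excess <= 0:
        return []
    # CreationDate is an ISO 8601 string, so lexical order matches chronological order.
    oldest_first = sorted(images, key=lambda image: image["CreationDate"])
    return [image["ImageId"] for image in oldest_first[:excess]]


# Hypothetical sample data (three AMIs of one instance, tag value = 2 generations).
sample_images = [
    {"ImageId": "ami-ccc", "CreationDate": "2024-01-03T18:00:05.000Z"},
    {"ImageId": "ami-aaa", "CreationDate": "2024-01-01T18:00:04.000Z"},
    {"ImageId": "ami-bbb", "CreationDate": "2024-01-02T18:00:06.000Z"},
]

print(select_images_to_deregister(sample_images, 2))  # ['ami-aaa']
```

With a tag value of 2 and three existing AMIs, only the oldest one is selected; with a tag value of 3 or more, nothing is deregistered.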
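The contents of the scheduled AMI template (schedule_create-ami-generations.yml) are shown below, after the parameter list.
The parameters required when creating the CloudFormation stack are as follows.
- EC2TagName: Name of the EC2 tag that marks instances for scheduled AMI creation (default: Create-ami-generations)
- EventName: Event name (default: Create-ami-generations_schedule)
- FunctionName: Lambda function name (default: Create-ami-generations)
- NoReboot: Whether to use "no reboot" when taking the AMI (default: false). Note: by default the instance is rebooted
- RoleName: Name of the IAM role used by the Lambda function (default: Create-ami-generations)
- Schedule: EventBridge cron expression (default: cron(0 18 * * ? *)), which is 03:00 JST
  - Must be specified in UTC. For details, see the documentation on schedule expressions for rules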
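Because the Schedule parameter has to be written in UTC, a small conversion helper can reduce mistakes. The following is a minimal sketch for a daily schedule only; the function name jst_to_utc_cron is hypothetical and not part of the template.

```python
def jst_to_utc_cron(hour_jst: int, minute: int = 0) -> str:
    """Build a daily EventBridge cron expression (UTC) from a JST start time."""
    if not (0 <= hour_jst <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    # JST is UTC+9 with no daylight saving time, so subtract 9 hours and wrap around.
    hour_utc = (hour_jst - 9) % 24
    # Fields: minutes hours day-of-month month day-of-week year
    return f"cron({minute} {hour_utc} * * ? *)"


print(jst_to_utc_cron(3, 0))  # cron(0 18 * * ? *)
```

03:00 JST maps to 18:00 UTC of the previous day, which is the default value of the Schedule parameter.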
AWSTemplateFormatVersion: '2010-09-09'
Parameters:
  RoleName:
    Type: String
    Default: Create-ami-generations
  EC2TagName:
    Type: String
    Default: Create-ami-generations
  FunctionName:
    Type: String
    Default: Create-ami-generations
  NoReboot:
    Type: String
    Default: 'false'
  EventName:
    Type: String
    Default: Create-ami-generations_schedule
  Schedule:
    Type: String
    Default: 'cron(0 18 * * ? *)'
Resources:
  LambdaRole:
    Type: 'AWS::IAM::Role'
    Properties:
      RoleName: !Sub '${RoleName}'
      AssumeRolePolicyDocument:
        Statement:
          - Action:
              - 'sts:AssumeRole'
            Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
        Version: '2012-10-17'
      ManagedPolicyArns:
        - 'arn:aws:iam::aws:policy/AWSLambdaExecute'
        - 'arn:aws:iam::aws:policy/AmazonEC2FullAccess'
      Path: /
  Lambda:
    Type: 'AWS::Lambda::Function'
    DependsOn: LambdaRole
    Properties:
      Code:
        ZipFile: |
          import os
          import collections
          from time import sleep
          from datetime import datetime, timedelta, timezone
          from logging import getLogger, INFO

          import boto3
          from botocore.exceptions import ClientError

          logger = getLogger()
          logger.setLevel(INFO)
          ec2_client = boto3.client('ec2')
          JST = timezone(timedelta(hours=9))

          def lambda_handler(event, context):
              descriptions = create_image()
              delete_old_images(descriptions)

          def create_image():
              """Take an AMI of each tagged instance; return {instance_id: generations}."""
              tag_name = os.environ['TAG_NAME']
              # NO_REBOOT arrives as a string; distutils.util.strtobool is not available in Python 3.12.
              no_reboot = os.environ['NO_REBOOT'].strip().lower() in ('1', 'true', 'yes', 'y', 'on', 't')
              descriptions = {}
              for instance in get_instances([tag_name]):
                  tags = {tag['Key']: tag['Value'] for tag in instance['Tags']}
                  # The tag value is the number of generations to keep.
                  generation = int(tags.get(tag_name, 0))
                  if generation < 1:
                      continue
                  instance_id = instance['InstanceId']
                  created_jst = datetime.now(JST).strftime('%Yy%mm%dd_%Hh%Mm%Ss')
                  # Fall back to a fixed label when the instance has no Name tag.
                  ami_name = '%s_%s_%s' % (tags.get('Name', 'noname'), instance_id, created_jst)
                  # The instance ID is stored as the AMI description and used later to group AMIs.
                  description = instance_id
                  image = _create_image(instance_id, ami_name, description, no_reboot)
                  logger.info('Create Image: ImageId:%s (%s)', image['ImageId'], ami_name)
                  descriptions[description] = generation
              return descriptions

          def get_instances(tag_names):
              reservations = ec2_client.describe_instances(
                  Filters=[{'Name': 'tag-key', 'Values': tag_names}]
              )['Reservations']
              return [
                  instance
                  for reservation in reservations
                  for instance in reservation['Instances']
              ]

          def _create_image(instance_id, ami_name, description, no_reboot):
              # Try twice before giving up.
              for _ in range(2):
                  try:
                      return ec2_client.create_image(
                          Description=description,
                          NoReboot=no_reboot,
                          InstanceId=instance_id,
                          Name=ami_name
                      )
                  except ClientError as e:
                      logger.exception(str(e))
                      sleep(2)
              raise Exception('cannot create image ' + ami_name)

          def delete_old_images(descriptions):
              if not descriptions:
                  return
              images_descriptions = get_images_descriptions(list(descriptions.keys()))
              for description, images in images_descriptions.items():
                  delete_count = len(images) - descriptions[description]
                  if delete_count <= 0:
                      continue
                  # Oldest first, so the slice below holds the AMIs beyond the kept generations.
                  images.sort(key=lambda x: x['CreationDate'])
                  for image in images[0:delete_count]:
                      _deregister_image(image['ImageId'])
                      logger.info('Deregister Image: ImageId:%s (%s)', image['ImageId'], image['Description'])

          def get_images_descriptions(descriptions):
              images = ec2_client.describe_images(
                  Owners=[os.environ['AWS_ACCOUNT']],
                  Filters=[{'Name': 'description', 'Values': descriptions}]
              )['Images']
              groups = collections.defaultdict(list)
              for image in images:
                  groups[image['Description']].append(image)
              return groups

          def _deregister_image(image_id):
              # Try twice before giving up.
              for _ in range(2):
                  try:
                      return ec2_client.deregister_image(ImageId=image_id)
                  except ClientError as e:
                      logger.exception(str(e))
                      sleep(2)
              raise Exception('Cannot Deregister image: ' + image_id)
      Description: !Sub 'Periodically creates an image (AMI + snapshot) of every EC2 instance tagged "${EC2TagName} = n" (n = number of generations to keep).'
      FunctionName: !Sub ${FunctionName}
      Handler: index.lambda_handler
      MemorySize: 128
      Role: !Sub 'arn:aws:iam::${AWS::AccountId}:role/${RoleName}'
      # python3.6 can no longer be used for new Lambda functions, so a current runtime is specified.
      Runtime: python3.12
      Timeout: 300
      Environment:
        Variables:
          'AWS_ACCOUNT': !Sub ${AWS::AccountId}
          'NO_REBOOT': !Sub ${NoReboot}
          'TAG_NAME': !Sub ${EC2TagName}
      Tags:
        - Key: Name
          Value: !Sub ${FunctionName}
        - Key: CloudformationArn
          Value: !Ref 'AWS::StackId'
  Rule:
    Type: 'AWS::Events::Rule'
    Properties:
      Description: !Sub ${EventName}
      Name: !Sub ${EventName}
      ScheduleExpression: !Sub '${Schedule}'
      State: ENABLED
      Targets:
        - Arn: !GetAtt
            - Lambda
            - Arn
          Id: lambda
  LambdaEvent:
    Type: 'AWS::Lambda::Permission'
    Properties:
      Action: 'lambda:InvokeFunction'
      FunctionName: !Ref Lambda
      Principal: events.amazonaws.com
      SourceArn: !GetAtt
        - Rule
        - Arn
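CloudFormation template for deleting an AMI's snapshots after the AMI is deregistered (event_ec2-snapshot-delete-after-ec2-ami-deregister.yml)
When an AMI is deregistered, the DeregisterImage AWS API call is made; this template reacts to that call through EventBridge and deletes the snapshots of the deregistered AMI. The Lambda function does the following:
- Deletes the snapshots of the deregistered AMI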
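How the Lambda function finds the snapshots can be illustrated without calling AWS. The sketch below assumes a trimmed-down, hypothetical DeregisterImage event and hypothetical snapshot descriptions; the wildcard pattern is the one used in the template, and Python's fnmatchcase only approximates the wildcard matching that the EC2 description filter performs on the service side.

```python
from fnmatch import fnmatchcase


def snapshot_description_pattern(event: dict) -> str:
    """Build the description filter for the AMI named in a DeregisterImage event."""
    image_id = event["detail"]["requestParameters"]["imageId"]
    return "Created by CreateImage(*) for " + image_id + " from *"


# Hypothetical, trimmed event in the shape EventBridge passes to the Lambda function.
sample_event = {
    "source": "aws.ec2",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {
        "eventSource": "ec2.amazonaws.com",
        "eventName": "DeregisterImage",
        "requestParameters": {"imageId": "ami-0123456789abcdef0"},
    },
}

# Hypothetical snapshot descriptions, used only to show which ones the pattern selects.
descriptions = [
    "Created by CreateImage(i-0aaa) for ami-0123456789abcdef0 from vol-0bbb",
    "Created by CreateImage(i-0aaa) for ami-0fedcba9876543210 from vol-0ccc",
    "manual backup",
]

pattern = snapshot_description_pattern(sample_event)
matching = [d for d in descriptions if fnmatchcase(d, pattern)]
print(pattern)
print(matching)
```

Only the snapshot whose description names the deregistered AMI is selected. Note that the pattern requires the literal text " from " after the AMI ID, so a description without that trailing part would not match.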
The contents of the snapshot deletion template (event_ec2-snapshot-delete-after-ec2-ami-deregister.yml) are shown below, after the parameter list.
The parameters required when creating the CloudFormation stack are as follows.
- EventName: Event name (default: Delete-ec2-snapshot-after-ami-deregister_event)
- FunctionName: Lambda function name (default: Delete-ec2-snapshot-after-ami-deregister)
- RoleName: Name of the IAM role used by the Lambda function (default: Delete-ec2-snapshot-after-ami-deregister)
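For reference, the parameters can also be supplied from a script instead of the console. This is a minimal sketch using boto3's create_stack; the helper names, the stack name, and the local file path in the usage note are hypothetical. Because both templates create an IAM role with a custom name, the CAPABILITY_NAMED_IAM capability has to be acknowledged.

```python
from typing import Dict, List


def to_cfn_parameters(values: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert a plain dict into the Parameters list that create_stack expects."""
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in values.items()
    ]


def deploy(stack_name: str, template_path: str, values: Dict[str, str]) -> dict:
    """Create a stack from a local template file (requires AWS credentials)."""
    import boto3  # imported here so to_cfn_parameters works without boto3 installed

    with open(template_path, encoding="utf-8") as f:
        template_body = f.read()
    cfn = boto3.client("cloudformation")
    return cfn.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=to_cfn_parameters(values),
        # Needed because the template sets RoleName on the IAM role.
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )


print(to_cfn_parameters({"EventName": "Delete-ec2-snapshot-after-ami-deregister_event"}))
```

A call such as deploy("delete-snapshot-after-deregister", "event_ec2-snapshot-delete-after-ec2-ami-deregister.yml", {}) would create the stack with the default parameter values.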
AWSTemplateFormatVersion: '2010-09-09'
Parameters:
  RoleName:
    Type: String
    Default: Delete-ec2-snapshot-after-ami-deregister
  FunctionName:
    Type: String
    Default: Delete-ec2-snapshot-after-ami-deregister
  EventName:
    Type: String
    Default: Delete-ec2-snapshot-after-ami-deregister_event
Resources:
  LambdaRole:
    Type: 'AWS::IAM::Role'
    Properties:
      RoleName: !Sub '${RoleName}'
      AssumeRolePolicyDocument:
        Statement:
          - Action:
              - 'sts:AssumeRole'
            Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
        Version: '2012-10-17'
      ManagedPolicyArns:
        - 'arn:aws:iam::aws:policy/AWSLambdaExecute'
        - 'arn:aws:iam::aws:policy/AmazonEC2FullAccess'
      Path: /
  Lambda:
    Type: 'AWS::Lambda::Function'
    DependsOn: LambdaRole
    Properties:
      Code:
        ZipFile: |
          import os
          from logging import getLogger, INFO
          from time import sleep

          import boto3
          from botocore.exceptions import ClientError

          logger = getLogger()
          logger.setLevel(INFO)
          client = boto3.client('ec2')

          def lambda_handler(event, context):
              # CloudTrail delivers the DeregisterImage request parameters in event['detail'].
              image_id = event['detail']['requestParameters']['imageId']
              response = client.describe_snapshots(
                  OwnerIds=[os.environ['AWS_ACCOUNT']],
                  Filters=[
                      {
                          'Name': 'description',
                          'Values': ['Created by CreateImage(*) for ' + image_id + ' from *']
                      }
                  ]
              )
              for snapshot in response['Snapshots']:
                  logger.info('delete_snapshot: %s (%s)', snapshot['SnapshotId'], image_id)
                  _delete_snapshot(snapshot['SnapshotId'])

          def _delete_snapshot(snapshot_id):
              # Try twice; failures are logged and not re-raised.
              for _ in range(2):
                  try:
                      return client.delete_snapshot(SnapshotId=snapshot_id)
                  except ClientError as e:
                      logger.exception('Received error: %s', e)
                      sleep(2)
      Description: Deletes all snapshots associated with an AMI once the AMI has been deregistered (triggered by the DeregisterImage AWS API call).
      FunctionName: !Sub ${FunctionName}
      Handler: index.lambda_handler
      MemorySize: 128
      Role: !Sub 'arn:aws:iam::${AWS::AccountId}:role/${RoleName}'
      # python3.6 can no longer be used for new Lambda functions, so a current runtime is specified.
      Runtime: python3.12
      Timeout: 300
      Environment:
        Variables:
          'AWS_ACCOUNT': !Sub ${AWS::AccountId}
      Tags:
        - Key: Name
          Value: !Sub ${FunctionName}
        - Key: CloudformationArn
          Value: !Ref 'AWS::StackId'
  Rule:
    Type: 'AWS::Events::Rule'
    Properties:
      Description: !Sub ${EventName}
      Name: !Sub ${EventName}
      EventPattern:
        source:
          - "aws.ec2"
        detail-type:
          - "AWS API Call via CloudTrail"
        detail:
          eventSource:
            - "ec2.amazonaws.com"
          eventName:
            - "DeregisterImage"
      State: "ENABLED"
      Targets:
        - Arn: !GetAtt
            - Lambda
            - Arn
          Id: lambda
  LambdaEvent:
    Type: 'AWS::Lambda::Permission'
    Properties:
      Action: 'lambda:InvokeFunction'
      FunctionName: !Ref Lambda
      Principal: events.amazonaws.com
      SourceArn: !GetAtt
        - Rule
        - Arn