How I Achieved Dramatic Cost Savings with AWS EventBridge Scheduler

Posted at 2024-05-22

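Overview

Using EventBridge, which I introduced in my previous article, I managed to cut costs by about 30%, so I'm writing down how I did it!

If you want to reduce the cost of your AWS infrastructure, please use this as a reference.

My previous article is here:

"AWS EventBridge Scheduler Was Way Too Handy!" (original title: AWS EventBridgeスケジューラが便利すぎた!)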

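Target resources

Although this article is framed around cost reduction, the same approach also gives you active control over resources, such as scaling ECS containers, so I recommend it!

The AWS resources I targeted this time are: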

  • EC2
  • ECS
  • RDS
  • VPC endpoint
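  • CloudWatch alarms (disabled, since they would report errors while the resources are stopped)
  • Route 53 health checks (disabled, since they would report errors while the resources are stopped)

Building it with Terraform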

IAM

First, the IAM settings. Since this is a trial, the policies are quite permissive.

#-----------------------------------------------------------------------------
# IAM
#-----------------------------------------------------------------------------
resource "aws_iam_role" "start_stop_ec2" {
  name               = "${var.app_name}-${var.env}-start-stop-ec2-role"
  assume_role_policy = jsonencode({
    Version   = "2012-10-17"
    Statement = [
      {
        Effect    = "Allow"
        Principal = { Service = "scheduler.amazonaws.com" }
        Action    = "sts:AssumeRole"
      }
    ]
  })
  managed_policy_arns = [
    "arn:aws:iam::aws:policy/AmazonEC2FullAccess"
  ]
}

resource "aws_iam_role" "start_stop_ecs" {
  name               = "${var.app_name}-${var.env}-start-stop-ecs-role"
  assume_role_policy = jsonencode({
    Version   = "2012-10-17"
    Statement = [
      {
        Effect    = "Allow"
        Principal = { Service = "scheduler.amazonaws.com" }
        Action    = "sts:AssumeRole"
      }
    ]
  })
  managed_policy_arns = [
    "arn:aws:iam::aws:policy/AmazonECS_FullAccess",
    "arn:aws:iam::aws:policy/AmazonRoute53FullAccess",
    "arn:aws:iam::aws:policy/CloudWatchFullAccess",
    "arn:aws:iam::aws:policy/AWSLambda_FullAccess",
    "arn:aws:iam::aws:policy/AmazonEC2FullAccess"
  ]
}

resource "aws_iam_role" "start_stop_rds" {
  name               = "${var.app_name}-${var.env}-start-stop-rds-role"
  assume_role_policy = jsonencode({
    Version   = "2012-10-17"
    Statement = [
      {
        Effect    = "Allow"
        Principal = { Service = "scheduler.amazonaws.com" }
        Action    = "sts:AssumeRole"
      }
    ]
  })
  managed_policy_arns = [
    "arn:aws:iam::aws:policy/AmazonRDSFullAccess"
  ]
}

resource "aws_iam_role" "lambda_costdown_role" {
  name = "${var.app_name}-${var.env}-lambda_costdown_role"

  assume_role_policy = jsonencode({
    Version   = "2012-10-17"
    Statement = [
      {
        Effect    = "Allow"
        Principal = { Service = "lambda.amazonaws.com" }
        Action    = "sts:AssumeRole"
      }
    ]
  })
}

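resource "aws_iam_policy" "lambda_costdown_policy" {
  name        = "${var.app_name}-${var.env}-lambda-costdown-policy"

  policy = jsonencode({
    Version   = "2012-10-17"
    Statement = [
      {
        # Write logs to the Lambda function's log group
        # (aws_cloudwatch_log_group.lambda_costdown is not shown in this article)
        Effect   = "Allow"
        Action   = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ]
        Resource = [
          aws_cloudwatch_log_group.lambda_costdown.arn,
          "${aws_cloudwatch_log_group.lambda_costdown.arn}:*"
        ]
      },
      {
        # EC2 actions never match a log-group ARN, so they need their own statement.
        # DescribeVpcEndpoints does not support resource-level permissions, hence "*".
        Effect   = "Allow"
        Action   = [
          "ec2:DescribeVpcEndpoints",
          "ec2:DeleteVpcEndpoints"
        ]
        Resource = "*"
      }
    ]
  })
}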

resource "aws_iam_role_policy_attachment" "lambda_policy_attachment" {
  role       = aws_iam_role.lambda_costdown_role.name
  policy_arn = aws_iam_policy.lambda_costdown_policy.arn
}

EC2

Automatic stop and start for the bastion server: it stops at 22:00 every night and starts again at 08:00 every morning (JST).
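EventBridge Scheduler evaluates cron expressions in UTC unless a time zone is set, which is why 22:00 JST is written as hour 13 and 08:00 JST as hour 23 below. The helper here is a small sketch of that conversion for a daily schedule; `jst_to_utc_cron` is a hypothetical name, not part of any AWS tool. It also reports when the UTC time falls on the previous calendar day.

```python
from datetime import datetime, timedelta, timezone

# Japan has no daylight saving time, so a fixed +09:00 offset is exact.
JST = timezone(timedelta(hours=9), "JST")


def jst_to_utc_cron(hour: int, minute: int = 0) -> tuple:
    """Convert a daily JST time into a UTC EventBridge Scheduler cron expression.

    Returns (expression, day_offset); day_offset is -1 when the UTC time
    falls on the previous calendar day (e.g. 08:00 JST -> 23:00 UTC).
    """
    # Any reference date works for a daily schedule.
    local = datetime(2024, 1, 2, hour, minute, tzinfo=JST)
    utc = local.astimezone(timezone.utc)
    day_offset = (utc.date() - local.date()).days
    # Field order: minutes hours day-of-month month day-of-week year
    return f"cron({utc.minute} {utc.hour} * * ? *)", day_offset


if __name__ == "__main__":
    print(jst_to_utc_cron(22))  # ('cron(0 13 * * ? *)', 0)
    print(jst_to_utc_cron(8))   # ('cron(0 23 * * ? *)', -1)
```

The day offset only matters if you restrict the day-of-week field (for example weekdays only): a Monday 08:00 JST start is a Sunday 23:00 UTC trigger. Alternatively, `aws_scheduler_schedule` accepts `schedule_expression_timezone` (for example `"Asia/Tokyo"`), which lets you write the JST hours directly.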

resource "aws_scheduler_schedule" "stop_bastion" {
  name = "${var.app_name}-${var.env}-stop-bastion"
  schedule_expression = "cron(0 13 * * ? *)" // 22:00 JST

  flexible_time_window {
    mode = "OFF"
  }

  target {
    arn      = "arn:aws:scheduler:::aws-sdk:ec2:stopInstances"
    role_arn = aws_iam_role.start_stop_ec2.arn

    input = jsonencode({
      InstanceIds = [var.bastion_id]
    })
  }
}

// Automatically start the bastion server
resource "aws_scheduler_schedule" "start_bastion" {
  name = "${var.app_name}-${var.env}-start-bastion"
  schedule_expression = "cron(0 23 * * ? *)" // 08:00 JST

  flexible_time_window {
    mode = "OFF"
  }

  target {
    arn      = "arn:aws:scheduler:::aws-sdk:ec2:startInstances"
    role_arn = aws_iam_role.start_stop_ec2.arn

    input = jsonencode({
      InstanceIds = [var.bastion_id]
    })
  }
}

ECS

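In this example, the number of tasks is set to 0 during the off hours. (Note that with 0 tasks the service becomes unreachable.)

If you don't want to take the service down, specify a desired count of 1 or more.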
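Taken together, the stop and start schedules determine the desired count for every hour of the day. The plain-Python sketch below models that, assuming this article's window (stop at 22:00 JST, start at 08:00 JST) and the daytime count of 2 from the start schedule; `desired_count_at` is a hypothetical helper for reasoning about the schedule, not something EventBridge runs. Passing `night_count=1` corresponds to keeping the service reachable overnight.

```python
def desired_count_at(hour_jst: int, start_hour: int = 8, stop_hour: int = 22,
                     day_count: int = 2, night_count: int = 0) -> int:
    """Return the ECS desired count in effect at a given JST hour (0-23).

    Assumes start_hour < stop_hour, i.e. the service runs during the day
    and is scaled down overnight, as in the schedules in this article.
    """
    if not 0 <= hour_jst <= 23:
        raise ValueError("hour_jst must be between 0 and 23")
    # The start schedule fires at start_hour and the stop schedule at stop_hour,
    # so the daytime count applies to the half-open interval [start_hour, stop_hour).
    if start_hour <= hour_jst < stop_hour:
        return day_count
    return night_count


if __name__ == "__main__":
    print([desired_count_at(h) for h in range(24)])
```

With the defaults, 10 of the 24 hours run with zero tasks.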

// Automatically stop ECS tasks
locals {
  stop_ecs_schedule = "cron(0 13 * * ? *)" // 22:00 JST
  start_ecs_schedule = "cron(0 23 * * ? *)" // 08:00 JST
}
resource "aws_scheduler_schedule" "stop_ecs_web" {
  name = "${var.app_name}-${var.env}-stop-ecs-web"
  schedule_expression = local.stop_ecs_schedule

  flexible_time_window {
    mode = "OFF"
  }

  target {
    arn      = "arn:aws:scheduler:::aws-sdk:ecs:updateService"
    role_arn = aws_iam_role.start_stop_ecs.arn

    input = jsonencode({
      Cluster = var.ecs_cluster_name,
      Service = var.web_service_name,
      DesiredCount = 0
    })
  }
}

// Automatically start ECS tasks
resource "aws_scheduler_schedule" "start_ecs_web" {
  name = "${var.app_name}-${var.env}-start-ecs-web"
  schedule_expression = local.start_ecs_schedule

  flexible_time_window {
    mode = "OFF"
  }

  target {
    arn      = "arn:aws:scheduler:::aws-sdk:ecs:updateService"
    role_arn = aws_iam_role.start_stop_ecs.arn

    input = jsonencode({
      Cluster = var.ecs_cluster_name,
      Service = var.web_service_name,
      DesiredCount = 2
    })
  }
}

RDS

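This assumes Aurora. For a non-Aurora RDS instance, you need to change target.arn to the instance-level stop/start actions and adjust the input accordingly.

Stopping the cluster stops all of its instances as well.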
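As a rough sketch of the change needed for non-Aurora RDS, the helper below returns the universal-target ARN and input for either case. The Aurora branch mirrors the Terraform in this article; the instance branch (`stopDBInstance`/`startDBInstance` with a `DbInstanceIdentifier` key) is my assumption by analogy with `DbClusterIdentifier`, so check it against the EventBridge Scheduler console or the RDS API reference before use. `rds_scheduler_target` is a hypothetical name.

```python
import json

SDK_PREFIX = "arn:aws:scheduler:::aws-sdk:rds:"


def rds_scheduler_target(action: str, identifier: str, aurora: bool = True) -> tuple:
    """Build (target ARN, JSON input) for an EventBridge Scheduler universal target.

    action is "stop" or "start". The Aurora form matches this article's Terraform;
    the single-instance form is an assumption to verify before use.
    """
    if action not in ("stop", "start"):
        raise ValueError("action must be 'stop' or 'start'")
    if aurora:
        # Stopping the cluster stops every instance in it.
        arn = f"{SDK_PREFIX}{action}DBCluster"
        payload = {"DbClusterIdentifier": identifier}
    else:
        # Assumed key name, by analogy with DbClusterIdentifier.
        arn = f"{SDK_PREFIX}{action}DBInstance"
        payload = {"DbInstanceIdentifier": identifier}
    return arn, json.dumps(payload)
```

Stopped Aurora clusters and RDS instances are started automatically after seven days, which a daily start schedule never reaches anyway.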

// Automatically stop RDS
locals {
  stop_rds_schedule = "cron(0 13 * * ? *)" // 22:00 JST
  start_rds_schedule = "cron(0 23 * * ? *)" // 08:00 JST
}
resource "aws_scheduler_schedule" "stop_rds" {
  name = "${var.app_name}-${var.env}-stop-rds"
  schedule_expression = local.stop_rds_schedule

  flexible_time_window {
    mode = "OFF"
  }

  target {
    arn      = "arn:aws:scheduler:::aws-sdk:rds:stopDBCluster"
    role_arn = aws_iam_role.start_stop_rds.arn

    input = jsonencode({
      DbClusterIdentifier = var.read_db_cluster_id
    })
  }
}

// Automatically start RDS
resource "aws_scheduler_schedule" "start_rds" {
  name = "${var.app_name}-${var.env}-start-rds"
  schedule_expression = local.start_rds_schedule

  flexible_time_window {
    mode = "OFF"
  }

  target {
    arn      = "arn:aws:scheduler:::aws-sdk:rds:startDBCluster"
    role_arn = aws_iam_role.start_stop_rds.arn

    input = jsonencode({
      DbClusterIdentifier = var.read_db_cluster_id
    })
  }
}

VPC endpoint

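VPC endpoints: if you provision them for every environment, they add up to a surprising amount.

As described in "AWS EventBridge Scheduler Was Way Too Handy!", EventBridge Scheduler alone cannot handle both creation and deletion, so Lambda is used as well. Creation can be called directly from the Scheduler, but deletion requires the endpoint IDs, which change every time the endpoints are recreated, so a Lambda function looks them up by tag and deletes them.

The S3 endpoint is excluded because (as a Gateway endpoint) it is free of charge.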
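For reference, the `for_each` in the creation schedule below expands into one `createVpcEndpoint` call per service. This sketch reproduces the input each schedule sends, so you can see how the `Name` tag (which the deletion Lambda later matches on) is derived. The function name and the sample IDs are hypothetical; the region and service list come from the Terraform.

```python
import json

REGION = "ap-northeast-1"
SERVICES = ["ecr.dkr", "logs", "ecr.api", "ssmmessages"]


def create_endpoint_inputs(service_env: str, vpc_id: str, subnet_ids: list,
                           sg_id: str) -> dict:
    """Return {service: JSON input} mirroring the createVpcEndpoint schedules."""
    inputs = {}
    for svc in SERVICES:
        body = {
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{REGION}.{svc}",
            "SubnetIds": subnet_ids,
            "SecurityGroupIds": [sg_id],
            "VpcEndpointType": "Interface",
            "TagSpecifications": [{
                "ResourceType": "vpc-endpoint",
                # service_env corresponds to "${var.app_name}-${var.env}"
                "Tags": [{"Key": "Name", "Value": f"{service_env}-vpc-endpoint-{svc}"}],
            }],
        }
        inputs[svc] = json.dumps(body)
    return inputs
```

Because every `Name` tag starts with `service_env`, the Lambda function can find exactly these endpoints again at night without knowing their IDs in advance.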

resource "aws_scheduler_schedule" "delete_vpc_endpoint" {
  name = "${var.app_name}-${var.env}-delete-vpc-endpoint"
  schedule_expression = local.stop_ecs_schedule

  flexible_time_window {
    mode = "OFF"
  }

  target {
    arn      = "arn:aws:scheduler:::aws-sdk:lambda:invoke"
    role_arn = aws_iam_role.start_stop_ecs.arn

    input = jsonencode({
      FunctionName   = "${var.app_name}-${var.env}-delete-vpc-endpoint",
      InvocationType = "Event",
      Payload        = jsonencode({
        ServiceEnv = "${var.app_name}-${var.env}"
      })
    })
  }
}

resource "aws_scheduler_schedule" "create_interface_endpoint" {
  for_each = toset([
    "ecr.dkr",
    "logs",
    "ecr.api",
    "ssmmessages",
  ])

  name = "${var.app_name}-${var.env}-create-interface-endpoint-${each.value}"
  schedule_expression = local.start_ecs_schedule

  flexible_time_window {
    mode = "OFF"
  }

  target {
    arn      = "arn:aws:scheduler:::aws-sdk:ec2:createVpcEndpoint"
    role_arn = aws_iam_role.start_stop_ecs.arn

    input = jsonencode({
      VpcId = var.vpc_id,
      ServiceName = "com.amazonaws.ap-northeast-1.${each.value}",
      SubnetIds = var.private_subnet_ids,
      SecurityGroupIds = [
        var.sg_endpoint_id
      ],
      VpcEndpointType = "Interface",
      TagSpecifications = [{
        ResourceType = "vpc-endpoint",
        Tags = [{
          Key = "Name",
          Value = "${var.app_name}-${var.env}-vpc-endpoint-${each.value}"
        }]
      }]
    })
  }
}

#-----------------------------------------------------------------------------
# Lambda
#-----------------------------------------------------------------------------
data "archive_file" "delete_vpc_endpoint" {
  type        = "zip"
  source_dir  = "../../../../py/delete_vpc_endpoint"
  output_path = "../../../../py/delete_vpc_endpoint.zip"
}

resource "aws_lambda_function" "delete_vpc_endpoint" {
  function_name    = "${var.app_name}-${var.env}-delete-vpc-endpoint"
  handler          = "main.lambda_handler"
  runtime          = "python3.10"
  filename         = data.archive_file.delete_vpc_endpoint.output_path
  source_code_hash = data.archive_file.delete_vpc_endpoint.output_base64sha256
  role             = aws_iam_role.lambda_costdown_role.arn
  timeout          = 30
}

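If you created the VPC endpoints separately during the initial build, Terraform will detect drift, so either comment out those original definitions or create the endpoints with the settings above from the start.

Below is the Lambda function that gets invoked.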

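import boto3


def _name_tag(endpoint):
    """Return the endpoint's Name tag value, or "" if it has none."""
    tags = {t['Key']: t['Value'] for t in endpoint.get('Tags', [])}
    return tags.get('Name', '')


def lambda_handler(event, context):
    """
    Delete the Interface VPC endpoints whose Name tag contains event['ServiceEnv'].
    """
    print("Starting delete_vpc_endpoint.")

    # Get the list of VPC endpoints
    client = boto3.client('ec2')
    response = client.describe_vpc_endpoints()
    endpoints = response['VpcEndpoints']

    # Filter by env (untagged endpoints are skipped instead of raising KeyError)
    filtered_endpoints = [
        endpoint for endpoint in endpoints
        if endpoint['VpcEndpointType'] == 'Interface'
        and event['ServiceEnv'] in _name_tag(endpoint)
    ]

    # Delete the endpoints
    failed = []
    for endpoint in filtered_endpoints:
        name = _name_tag(endpoint)
        try:
            result = client.delete_vpc_endpoints(VpcEndpointIds=[endpoint['VpcEndpointId']])
            # Some failures are reported in 'Unsuccessful' instead of being raised
            if result.get('Unsuccessful'):
                raise RuntimeError(result['Unsuccessful'][0]['Error']['Message'])
            print(f"Deleted endpoint {name}")
        except Exception as e:
            print(f"Error while deleting endpoint {name}: {e}")
            failed.append(name)

    if failed:
        # Terminate abnormally so the failure shows up in Lambda's error metrics
        raise RuntimeError(f"delete_vpc_endpoint terminated abnormally: {failed}")

    print("delete_vpc_endpoint finished successfully.")
    return {
        'statusCode': 200
    }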

The function takes ServiceEnv as an argument and uses it to filter the endpoints by tag. ServiceEnv is passed in from EventBridge.
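The filtering step can be tested without AWS access. This sketch applies the same rule as the Lambda (Interface endpoints whose tag value contains `ServiceEnv`) to dictionaries shaped like the `VpcEndpoints` entries returned by `describe_vpc_endpoints`, reading the `Name` tag by key instead of assuming it is the first tag, and skipping untagged endpoints rather than raising `KeyError`. `select_endpoints_to_delete` and the sample data are hypothetical.

```python
def name_tag(endpoint: dict) -> str:
    """Return the endpoint's Name tag value, or "" if it has no Name tag."""
    tags = {t["Key"]: t["Value"] for t in endpoint.get("Tags", [])}
    return tags.get("Name", "")


def select_endpoints_to_delete(endpoints: list, service_env: str) -> list:
    """Return the IDs of Interface endpoints whose Name tag contains service_env."""
    return [
        ep["VpcEndpointId"]
        for ep in endpoints
        if ep.get("VpcEndpointType") == "Interface" and service_env in name_tag(ep)
    ]


if __name__ == "__main__":
    sample = [
        {"VpcEndpointId": "vpce-1", "VpcEndpointType": "Interface",
         "Tags": [{"Key": "Name", "Value": "myapp-dev-vpc-endpoint-logs"}]},
        {"VpcEndpointId": "vpce-2", "VpcEndpointType": "Gateway",
         "Tags": [{"Key": "Name", "Value": "myapp-dev-vpc-endpoint-s3"}]},
        {"VpcEndpointId": "vpce-3", "VpcEndpointType": "Interface"},  # no tags
        {"VpcEndpointId": "vpce-4", "VpcEndpointType": "Interface",
         "Tags": [{"Key": "Env", "Value": "prd"},
                  {"Key": "Name", "Value": "myapp-prd-vpc-endpoint-logs"}]},
    ]
    print(select_endpoints_to_delete(sample, "myapp-dev"))  # ['vpce-1']
```

Substring matching also catches environments whose name merely contains `ServiceEnv` (for example `myapp-dev` inside `myapp-dev2`); matching on the prefix `f"{service_env}-vpc-endpoint-"` would be stricter if you have such names.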

CloudWatch alarms

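If you have alarms set up around ECS, disable them during the off hours. (Depending on the alarm, it will not necessarily fire even if left enabled, so adjust this to your setup.)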

resource "aws_scheduler_schedule" "stop_ecs_task_alerm" {
  name = "${var.app_name}-${var.env}-stop-ecs-task-alerm"
  schedule_expression = local.stop_ecs_schedule

  flexible_time_window {
    mode = "OFF"
  }

  target {
    arn      = "arn:aws:scheduler:::aws-sdk:cloudwatch:disableAlarmActions"
    role_arn = aws_iam_role.start_stop_ecs.arn

    input = jsonencode({
      AlarmNames = [
        var.cloudwatch_web_task_name,
        var.cloudwatch_worker_task_name,
        var.cloudwatch_sftp_task_name
      ]
    })
  }
}

resource "aws_scheduler_schedule" "start_ecs_task_alerm" {
  name = "${var.app_name}-${var.env}-start-ecs-task-alerm"
  schedule_expression = local.start_alerm_schedule

  flexible_time_window {
    mode = "OFF"
  }

  target {
    arn      = "arn:aws:scheduler:::aws-sdk:cloudwatch:enableAlarmActions"
    role_arn = aws_iam_role.start_stop_ecs.arn

    input = jsonencode({
      AlarmNames = [
        var.cloudwatch_web_task_name,
        var.cloudwatch_worker_task_name,
        var.cloudwatch_sftp_task_name,
        var.cloudwatch_web_memory_high_name
      ]
    })
  }
}

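The schedules above automatically disable and re-enable the alarms that fire when the task count drops to 0. Note that `local.start_alerm_schedule`, used by the re-enable schedules here and in the Route 53 section, is not defined in this article; define it yourself in a `locals` block, for example a few minutes after `local.start_ecs_schedule`, so that the tasks are running again before the alarms resume.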
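The stop schedule disables three alarms while the start schedule enables four (it also enables `var.cloudwatch_web_memory_high_name`). Keeping both lists in sync is easier if they come from a single source. The sketch below shows the idea in Python: one list of alarm names produces both the `disableAlarmActions` and the `enableAlarmActions` input. The function and alarm names are hypothetical; in Terraform the equivalent would be a shared `locals` list referenced by both schedules.

```python
import json


def alarm_action_inputs(alarm_names: list) -> dict:
    """Build matching CloudWatch disable/enable inputs from one list of alarm names."""
    if not alarm_names:
        raise ValueError("alarm_names must not be empty")
    # DisableAlarmActions and EnableAlarmActions both take an AlarmNames list.
    body = json.dumps({"AlarmNames": list(alarm_names)})
    return {
        "arn:aws:scheduler:::aws-sdk:cloudwatch:disableAlarmActions": body,
        "arn:aws:scheduler:::aws-sdk:cloudwatch:enableAlarmActions": body,
    }
```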

Route 53 health checks

The health checks used for external (synthetic) monitoring are disabled and re-enabled automatically.

resource "aws_scheduler_schedule" "stop_health_check" {
  name = "${var.app_name}-${var.env}-stop-health-check"
  schedule_expression = local.stop_ecs_schedule

  flexible_time_window {
    mode = "OFF"
  }

  target {
    arn      = "arn:aws:scheduler:::aws-sdk:route53:updateHealthCheck"
    role_arn = aws_iam_role.start_stop_ecs.arn

    input = jsonencode({
      HealthCheckId  = var.health_check_id,
      Disabled       = true
    })
  }
}

resource "aws_scheduler_schedule" "start_health_check" {
  name = "${var.app_name}-${var.env}-start-health-check"
  schedule_expression = local.start_alerm_schedule

  flexible_time_window {
    mode = "OFF"
  }

  target {
    arn      = "arn:aws:scheduler:::aws-sdk:route53:updateHealthCheck"
    role_arn = aws_iam_role.start_stop_ecs.arn

    input = jsonencode({
      HealthCheckId  = var.health_check_id,
      Disabled       = false
    })
  }
}
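The re-enable schedules for the alarms and the health check only make sense once the ECS tasks are running again. Assuming daily UTC expressions of the form `cron(M H * * ? *)`, like every schedule in this article, the sketch below checks that a re-enable schedule fires at least a grace period after the start schedule and before the next stop. `minute_of_day`, `enabled_during_uptime` and the 10-minute default grace are hypothetical choices, not AWS settings.

```python
import re

_DAILY_CRON = re.compile(r"^cron\((\d{1,2}) (\d{1,2}) \* \* \? \*\)$")


def minute_of_day(expression: str) -> int:
    """Return the UTC minute of day (0-1439) for a daily cron(M H * * ? *) expression."""
    match = _DAILY_CRON.match(expression)
    if match is None:
        raise ValueError(f"not a daily cron expression: {expression}")
    minute, hour = int(match.group(1)), int(match.group(2))
    if minute > 59 or hour > 23:
        raise ValueError(f"out-of-range time in: {expression}")
    return hour * 60 + minute


def enabled_during_uptime(reenable: str, start: str, stop: str,
                          grace_minutes: int = 10) -> bool:
    """True if `reenable` fires at least grace_minutes after `start` and before `stop`.

    Handles uptime windows that cross midnight UTC (e.g. 23:00 -> 13:00 UTC).
    """
    day = 24 * 60
    window = (minute_of_day(stop) - minute_of_day(start)) % day
    offset = (minute_of_day(reenable) - minute_of_day(start)) % day
    return grace_minutes <= offset < window
```

With the article's schedules, `enabled_during_uptime("cron(10 23 * * ? *)", "cron(0 23 * * ? *)", "cron(0 13 * * ? *)")` is True (08:10 JST), while reusing the start expression itself is False because it leaves no grace period.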

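Results

As expected, ECS and RDS account for the biggest savings! It depends on your operational requirements, of course, but my impression is that the measures above cut costs by roughly 30%.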
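A back-of-the-envelope check of why time-billed resources dominate: with the 22:00 to 08:00 JST window, anything billed by running time is off 10 of every 24 hours. The sketch below computes that fraction and a monthly figure from an hourly price you supply (look it up for your region; no prices are assumed here). It is an upper bound for those resources only: storage (Aurora storage is still billed while the cluster is stopped), data transfer and always-on services keep billing, so a smaller overall figure such as the roughly 30% above is what you would expect. The function names are hypothetical.

```python
def hours_off_per_day(stop_hour: int, start_hour: int) -> int:
    """Hours per day a resource is stopped, for daily stop/start times (0-23).

    Handles windows that cross midnight: stop 22, start 8 -> 10 hours.
    """
    return (start_hour - stop_hour) % 24


def runtime_saving_ratio(stop_hour: int, start_hour: int) -> float:
    """Fraction of running-time charges avoided for a resource on this schedule."""
    return hours_off_per_day(stop_hour, start_hour) / 24


def monthly_saving(hourly_price: float, units: int, stop_hour: int,
                   start_hour: int, days: int = 30) -> float:
    """Charges avoided per month for `units` resources billed hourly_price each per hour."""
    return hourly_price * units * hours_off_per_day(stop_hour, start_hour) * days


if __name__ == "__main__":
    print(f"{runtime_saving_ratio(22, 8):.1%}")  # 41.7%
```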

If you know clearly when your daytime load peaks, you can also scale out only during those hours, which I recommend!!

Also, if your goal is cost reduction, consider Reserved Instances, Savings Plans, and Fargate Spot as well.

Fargate Spot seems surprisingly little-known, so I'd like to introduce it in a separate article.
