
Automatically starting and stopping AWS resources

Last updated at 2022-05-08. Posted at 2022-05-08.

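Considerations

Extensibility
- The number of servers to start and stop automatically will grow, and I will sometimes want to disable a schedule temporarily. Start/stop settings should therefore be easy to add and remove.
- I want one mechanism to handle starting and stopping (or creating and destroying) many kinds of resources.
Errors
- The main use is development environments, so failures are not urgent, but noticing late that a server did not start affects development. I therefore want a notification on error.
- Each start/stop operation should run independently of the others.
  - Iterating over resources in a for loop risks leaving later resources unprocessed when an error occurs partway through.
  - Adding more event rules looks like the better approach.
Management
- I want to see at a glance which servers and other resources are scheduled for start/stop.
  - I want to manage the resources with Terraform.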
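
To illustrate the for-loop concern: if several resources did have to share one handler, each item would need its own error handling so that one failure does not skip the rest. This is only a minimal sketch; `run_all` and the shape of the `actions` mapping are hypothetical names introduced for illustration.

```python
def run_all(actions):
    """Run every action even if some fail, then report the failures.

    `actions` maps a resource name to a zero-argument callable
    (a hypothetical shape chosen for this illustration).
    """
    failures = {}
    for name, action in actions.items():
        try:
            action()
        except Exception as exc:
            # Keep going so that later resources are still processed.
            failures[name] = exc
    if failures:
        # Raise at the end so the invocation still counts as failed
        # and a failure alarm can react to it.
        raise RuntimeError(f"failed resources: {sorted(failures)}")
```

Separate event rules avoid this bookkeeping altogether, which is one reason they look like the simpler choice.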

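Examples

Using Lambda

Create a Lambda function that operates on AWS resources, for example to start and stop them automatically.
Create a rule in Amazon EventBridge so that the function runs on a schedule.

Required item	Description
Lambda	The function that performs the operation
IAM Role	The Lambda execution role, granted permissions on the resources it operates on
Event Rule	The event rule for scheduled execution
Lambda Permission	The permission that lets the Event Rule invoke the Lambda function
CloudWatch Alarm	Reacts to Lambda failures and publishes to SNS
Amazon SNS	The SNS topic used for Chatbot integration
AWS Chatbot	Forwards failure notifications from SNS to Slack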

What the Lambda function does

(TODO) Handle the target resources per event rule.
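
One possible way to approach this TODO is to let each event rule pass its own target in the event payload (for example through the `input` of the EventBridge target), so that a single function can serve many rules. The following is a minimal sketch under that assumption; the event keys `action` and `instance_ids` and the helper `handle` are hypothetical and not part of the original setup.

```python
def handle(event, client):
    """Start or stop the EC2 instances named in the event payload.

    `client` is an EC2 client such as boto3.client('ec2'). It is passed in
    so the logic can be exercised without AWS access.
    """
    action = event["action"]              # assumed key: "start" or "stop"
    instance_ids = event["instance_ids"]  # assumed key: list of instance IDs
    if action == "start":
        return client.start_instances(InstanceIds=instance_ids)
    if action == "stop":
        return client.stop_instances(InstanceIds=instance_ids)
    # Raising makes the invocation fail, so the Lambda Errors metric counts it.
    raise ValueError(f"unknown action: {action!r}")


def lambda_handler(event, context):
    import boto3  # imported here so `handle` can be used without boto3 installed
    print(handle(event, boto3.client("ec2")))
```

With this shape, adding a resource means adding an event rule with a different input rather than editing the function.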

Start and stop EC2 instances

Start

lambda_function.py

import boto3

def lambda_handler(event, context):
    client = boto3.client('ec2')
    # Replace the placeholder with the ID of the instance to start.
    response = client.start_instances(
        InstanceIds=[
            'your-instance-id'
        ]
    )
    print(response)

Stop

lambda_function.py

import boto3

def lambda_handler(event, context):
    client = boto3.client('ec2')
    # Replace the placeholder with the ID of the instance to stop.
    response = client.stop_instances(
        InstanceIds=[
            'your-instance-id'
        ]
    )
    print(response)

AutoScalingGroup

Change the desired capacity.

Start

lambda_function.py

import boto3

def lambda_handler(event, context):
    client = boto3.client('autoscaling')
    response = client.update_auto_scaling_group(
        AutoScalingGroupName='my-autoscaling',
        MinSize=3,
        MaxSize=5,
        DesiredCapacity=3
    )
    print(response)

Stop

Set the desired capacity to 0.
Also set the minimum capacity to 0, because the call fails unless desired capacity >= minimum capacity.

lambda_function.py

import boto3

def lambda_handler(event, context):
    client = boto3.client('autoscaling')
    response = client.update_auto_scaling_group(
        AutoScalingGroupName='my-autoscaling',
        MinSize=0,
        MaxSize=5,
        DesiredCapacity=0
    )
    print(response)
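
The capacity constraint mentioned for the stop case can also be checked before calling the API, which gives a clearer error message than the one returned by the service. This is only a sketch; `scaling_params` is a hypothetical helper, not something from the original functions.

```python
def scaling_params(name, min_size, max_size, desired):
    """Build keyword arguments for update_auto_scaling_group after checking
    that 0 <= min_size <= desired <= max_size."""
    if not (0 <= min_size <= desired <= max_size):
        raise ValueError(
            f"need 0 <= MinSize ({min_size}) <= DesiredCapacity ({desired}) "
            f"<= MaxSize ({max_size})"
        )
    return {
        "AutoScalingGroupName": name,
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": desired,
    }

# Usage with boto3 (not executed here):
#   client = boto3.client('autoscaling')
#   client.update_auto_scaling_group(**scaling_params('my-autoscaling', 0, 5, 0))
```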

ECS Fargate

Start

Update desiredCount.

lambda_function.py

import boto3

def lambda_handler(event, context):
    client = boto3.client('ecs')
    response = client.update_service(
        cluster='myapp-cluster',
        service='myapp-service',
        desiredCount=1
    )
    print(response)

Stop

lambda_function.py

import boto3

def lambda_handler(event, context):
    client = boto3.client('ecs')
    response = client.update_service(
        cluster='myapp-cluster',
        service='myapp-service',
        desiredCount=0  # 0 stops all tasks of the service
    )
    print(response)

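Creating the rules and related resources

Creating them with Terraform makes the definitions reusable.
Create the event rule and the permission that lets the event rule invoke the Lambda function.
The alarm fires when the Lambda error count is 1 or more.
For Chatbot, use the setup from the reference URL.

Reference (Chatbot)
https://zenn.dev/shonansurvivors/articles/894cae91806052

Example: start at 8:00 a.m. JST (EventBridge cron expressions are written in UTC)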

example.tf

resource "aws_cloudwatch_event_rule" "start_rule" {
  name                = "test-start"
  description         = "test-start"
  is_enabled          = true
  schedule_expression = "cron(00 23 ? * SUN-THU *)"
}

resource "aws_cloudwatch_event_target" "start_target" {
  arn       = "ARN of the Lambda function" # placeholder
  rule      = aws_cloudwatch_event_rule.start_rule.name
  target_id = "test-start"
}

resource "aws_lambda_permission" "permission" {
  statement_id  = "AllowExecutionFromCloudWatch"
  action        = "lambda:InvokeFunction"
  function_name = "name of the Lambda function" # placeholder
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.start_rule.arn
}


resource "aws_cloudwatch_metric_alarm" "lambda_test" {
  alarm_name          = "lambda-alarm"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = "1"
  metric_name         = "Errors"
  namespace           = "AWS/Lambda"
  dimensions = {
    FunctionName = "LambdaTest"
  }
  period              = "300"
  statistic           = "Sum"
  threshold           = "1"
  treat_missing_data  = "notBreaching"
  alarm_description   = "Scheduled task failed"
  alarm_actions       = [aws_sns_topic.test.arn]
}

resource "aws_sns_topic" "test" {
  name = "test-topic"
}

module "chatbot_slack" {
  source  = "waveaccounting/chatbot-slack-configuration/aws"
  version = "1.1.0"

  configuration_name = "config-name"
  guardrail_policies = ["arn:aws:iam::aws:policy/ReadOnlyAccess"] 
  iam_role_arn       = ""
  logging_level      = "ERROR" 
  slack_channel_id   = ""
  slack_workspace_id = ""
  sns_topic_arns     = [aws_sns_topic.test.arn]
  user_role_required = false 
}
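
EventBridge evaluates cron expressions in UTC. Assuming the 8:00 a.m. in this example means Japan Standard Time (UTC+9), Monday through Friday at 08:00 JST corresponds to Sunday through Thursday at 23:00 UTC, which matches the schedule_expression in the rule. A small sketch of that conversion follows; `jst_to_utc_cron` is a hypothetical helper written for illustration, and it only handles a contiguous range of weekdays.

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))
DAYS = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]


def jst_to_utc_cron(hour, minute, first_day="MON", last_day="FRI"):
    """Return an EventBridge cron expression (UTC) for a JST wall-clock time
    on a contiguous range of JST weekdays."""
    # 2024-01-01 is a Monday, so it serves as the start of a reference week.
    base = datetime(2024, 1, 1, hour, minute, tzinfo=JST)

    def utc_day(day):
        local = base + timedelta(days=DAYS.index(day))
        return DAYS[local.astimezone(timezone.utc).weekday()]

    utc = base.astimezone(timezone.utc)
    return (
        f"cron({utc.minute} {utc.hour} ? * "
        f"{utc_day(first_day)}-{utc_day(last_day)} *)"
    )


print(jst_to_utc_cron(8, 0))  # cron(0 23 ? * SUN-THU *)
```

Times at or after 09:00 JST fall on the same UTC day, so the weekday range shifts only for earlier times.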

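Alternatives to Lambda

Consider options other than Lambda.

Scheduled execution with AWS CodeBuild

Using CodeBuild has the following advantages.
- A single CodeBuild project is enough.
- There is no need to know how to write Lambda functions.
- The tasks can also be written as shell scripts.
- It can also run terraform apply and destroy.

What you need

A repository for the build
The scripts to run on a schedule, and buildspec.yml

Event rule

Create an event rule.
In the input of aws_cloudwatch_event_target, use environmentVariablesOverride to override the environment variables with the script that CodeBuild should run and its arguments.
Also create an alarm for failures.

Example: set the ECS Fargate desired count to 1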

example.tf

resource "aws_cloudwatch_event_rule" "build_rule" {
  name                = "test-start"
  description         = "test-start"
  is_enabled          = true
  schedule_expression = "cron(00 23 ? * SUN-THU *)"
}

resource "aws_cloudwatch_event_target" "build_target" {
  arn       = "ARN of the CodeBuild project" # placeholder
  rule      = aws_cloudwatch_event_rule.build_rule.name
  role_arn  = aws_iam_role.events.arn
  target_id = "build-event"

  # Override the environment variables.
  # ARG holds the cluster name, service name, and desired count, separated by spaces.
  input = <<DOC
  {
    "environmentVariablesOverride": [
      {
        "name": "SCRIPT",
        "value": "update-ecs.py"
      },
      {
        "name": "ARG",
        "value": "myapp-cluster myapp-go 1"
      }
    ]
  }
  DOC
}

resource "aws_cloudwatch_metric_alarm" "build_test" {
  alarm_name          = "build-alarm"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = "1"
  metric_name         = "FailedBuilds"
  namespace           = "AWS/CodeBuild"
  dimensions = {
    ProjectName = "event-codebuild-project"
  }
  period              = "300"
  statistic           = "Sum"
  threshold           = "1"
  treat_missing_data  = "notBreaching"
  alarm_description   = "Scheduled task failed"
  alarm_actions       = [aws_sns_topic.test.arn]
}

CodeBuild

Reference (environmentVariablesOverride)
https://qiita.com/yoshii0110/items/c180d30e9dfc25645df5

example.tf

resource "aws_codebuild_project" "main" {
  name          = "event-codebuild-project"
  description   = "event-codebuild-project"
  build_timeout = "30"
  service_role  = aws_iam_role.codebuild_role.arn

  artifacts {
    type     = "S3"
    location = var.build_bucket
  }

  environment {
    compute_type                = "BUILD_GENERAL1_SMALL"
    image                       = "aws/codebuild/standard:5.0"
    type                        = "LINUX_CONTAINER"
    image_pull_credentials_type = "CODEBUILD"

    # Dummy values; the event target input overrides them at build time
    environment_variable {
      name  = "SCRIPT"
      value = "script"
    }

    environment_variable {
      name  = "ARG"
      value = "arg"
    }
  }

  logs_config {
    cloudwatch_logs {
      group_name = "/codebuild/test/log"
    }
  }

  source {
    type            = "CODECOMMIT"
    location        = "https://git-codecommit.ap-northeast-1.amazonaws.com/v1/repos/build-repo"
    git_clone_depth = 1
  }

  source_version = "develop"
}

Place buildspec.yml and the scripts in the repository (build-repo) specified in the source block of the CodeBuild project above.

version: 0.2

phases:
  install:
    runtime-versions:
      python: 3.9
  build:
    commands:
      - echo Build started on `date`
      - python script/${SCRIPT} ${ARG}

Place update-ecs.py under the script directory of the repository (build-repo).

update-ecs.py

import boto3
import sys

args = sys.argv
cluster_name, service_name, desired_count = args[1:]

client = boto3.client('ecs')
response = client.update_service(
    cluster = cluster_name,
    service = service_name,
    desiredCount = int(desired_count)
)
print(response)

SCRIPT and ARG are overridden at build time, so in this example the build runs the following command.

- python script/update-ecs.py myapp-cluster myapp-go 1
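
Because `${ARG}` is expanded by the shell without quotes, its three space-separated values reach the script as separate command-line arguments. As a hedged sketch, the script could validate them with `argparse` instead of unpacking `sys.argv` directly, so that a malformed ARG fails with a readable message. `parse_args` is a hypothetical helper and not part of the original script.

```python
import argparse


def parse_args(argv):
    """Parse '<cluster> <service> <desired_count>' as passed through ARG."""
    parser = argparse.ArgumentParser(
        description="Update the desired count of an ECS service"
    )
    parser.add_argument("cluster")
    parser.add_argument("service")
    parser.add_argument("desired_count", type=int)
    return parser.parse_args(argv)

# In the script this would be called as parse_args(sys.argv[1:]), and the
# result passed to client.update_service(cluster=..., service=..., desiredCount=...).
```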

Other options

Use AWS Systems Manager.
