
Setting up common SQS monitoring items with Terraform

Posted 2020-10-15, last updated 2020-10-15

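I think monitoring items such as CloudWatch Alarms are where the benefits of IaC are easiest to see and quickest to obtain.
SQS queues in particular tend to share nearly identical monitoring items, so you often end up registering and managing a large number of very similar alarms.
This note records a way to mass-produce the usual alert settings quickly with Terraform.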

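Common monitoring items

These are the kinds of monitoring items I usually want in place right after creating an SQS queue:

A message has arrived in a dead-letter queue (or similar)
The number of queued messages has exceeded xxx
An unprocessed message has been left in the queue for xxx days

The configuration below focuses on these.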

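Terraform configuration

The AWS SNS topic used for notifications is assumed to be defined elsewhere.
(You could of course receive the alarm action events through something other than SNS, such as EventBridge.)
Everything is kept in a single file, without splitting it into a module.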
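For reference, the alarms below refer to the topic as `aws_sns_topic.fatal`. A minimal sketch of such a definition might look like the following. The topic name `fatal-alerts` is a hypothetical placeholder, not something taken from this article, and subscriptions are left out.

```hcl
# Hypothetical definition of the topic referenced as aws_sns_topic.fatal.
# The name "fatal-alerts" is a placeholder; use whatever your environment defines.
resource "aws_sns_topic" "fatal" {
  name = "fatal-alerts"
}
```

If the topic lives in another Terraform state, a `data "aws_sns_topic"` lookup would be the alternative, in which case the references in the alarms would need the `data.` prefix.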

sqs_alert.tf

locals {
  queues = [
    {
      queue_name               = "sample-A-queue",  # SQS queue name
      alert_queue_count        = [200],            # alert once this many messages have accumulated
      alert_oldest_message_day = [5, 7, 10, 13],   # alert once the oldest message reaches this age in days
    },
    {
      queue_name               = "sample-B-queue-dead",
      alert_queue_count        = [1, 10, 20, 30, 100],
      alert_oldest_message_day = [5, 7, 10, 13],
    },
    {
      queue_name               = "sample-C-queue.fifo",
      alert_queue_count        = [200],
      alert_oldest_message_day = [5, 7, 10, 13],
    },
  ]
}

# Reshape the data so it is easier to consume:
# one list entry per (queue, threshold) pair.
locals {
  queue_count_alert_source = flatten([for q in local.queues : [for c in q.alert_queue_count : {
    queue_name = q.queue_name
    count      = c
  }]])

  oldest_message_alert_source = flatten([for q in local.queues : [for day in q.alert_oldest_message_day : {
    queue_name = q.queue_name
    day        = day
    seconds    = day * 24 * 3600 # the metric is reported in seconds
  }]])
}

# Alarm on the number of messages in the queue
resource "aws_cloudwatch_metric_alarm" "queue_count_alarm" {

  count = length(local.queue_count_alert_source)

  alarm_name                = "sqs_count_${local.queue_count_alert_source[count.index].queue_name}_over_${local.queue_count_alert_source[count.index].count}"
  alarm_description         = "message detected: SQS ${local.queue_count_alert_source[count.index].queue_name} >= ${local.queue_count_alert_source[count.index].count}"
  comparison_operator       = "GreaterThanOrEqualToThreshold"
  evaluation_periods        = 1
  metric_name               = "ApproximateNumberOfMessagesVisible"
  namespace                 = "AWS/SQS"
  period                    = 300
  statistic                 = "Minimum"
  threshold                 = local.queue_count_alert_source[count.index].count
  alarm_actions             = [aws_sns_topic.fatal.arn]
  ok_actions                = [aws_sns_topic.fatal.arn]
  insufficient_data_actions = []
  treat_missing_data        = "ignore" # missing data points leave the alarm state unchanged
  dimensions = {
    QueueName = local.queue_count_alert_source[count.index].queue_name
  }
}

# Alarm on messages that have been left unprocessed
resource "aws_cloudwatch_metric_alarm" "oldest_message_days" {

  count = length(local.oldest_message_alert_source)

  alarm_name                = "sqs_old_${local.oldest_message_alert_source[count.index].queue_name}_day_${local.oldest_message_alert_source[count.index].day}"
  alarm_description         = "oldest message alarm: SQS ${local.oldest_message_alert_source[count.index].queue_name} >= ${local.oldest_message_alert_source[count.index].day} day"
  comparison_operator       = "GreaterThanOrEqualToThreshold"
  evaluation_periods        = 1
  metric_name               = "ApproximateAgeOfOldestMessage"
  namespace                 = "AWS/SQS"
  period                    = 300
  statistic                 = "Minimum"
  threshold                 = local.oldest_message_alert_source[count.index].seconds
  alarm_actions             = [aws_sns_topic.fatal.arn]
  ok_actions                = [aws_sns_topic.fatal.arn]
  insufficient_data_actions = []
  treat_missing_data        = "ignore" # missing data points leave the alarm state unchanged
  dimensions = {
    QueueName = local.oldest_message_alert_source[count.index].queue_name
  }
}

aws_cloudwatch_metric_alarm.queue_count_alarm monitors the number of messages in each queue.
aws_cloudwatch_metric_alarm.oldest_message_days monitors how many days the oldest message has been left in the queue.
I think this covers the bare minimum.
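To make the data-conversion step more concrete, here is a rough sketch of what the flattened local contains for the three sample queues. The `output` block and its name are hypothetical additions for inspection only (for example via `terraform output`) and are not part of the original file.

```hcl
# Hypothetical output, added only to inspect the flattened list.
output "queue_count_alert_source" {
  value = local.queue_count_alert_source
}

# With the sample queues, the list has 1 + 5 + 1 = 7 entries, so 7 queue-count
# alarms are created. Its content is roughly (exact formatting will differ):
#
#   { queue_name = "sample-A-queue",      count = 200 }
#   { queue_name = "sample-B-queue-dead", count = 1   }
#   { queue_name = "sample-B-queue-dead", count = 10  }
#   { queue_name = "sample-B-queue-dead", count = 20  }
#   { queue_name = "sample-B-queue-dead", count = 30  }
#   { queue_name = "sample-B-queue-dead", count = 100 }
#   { queue_name = "sample-C-queue.fifo", count = 200 }
#
# local.oldest_message_alert_source likewise has 3 * 4 = 12 entries;
# for day = 5 the threshold is 5 * 24 * 3600 = 432000 seconds.
```

The threshold of 1 on the dead-letter queue is what covers the "a message has arrived in a dead-letter queue" item from the list of common monitoring items.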

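One small annoyance is that the same local variable access and [count.index] appear over and over.
After using something like the CDK, I find myself wishing I could write the data conversion and loop definitions a little more directly.
I suspect this can be made smarter in Terraform too, so suggestions for a cleaner approach are welcome.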
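One possible way to cut down the repetition, offered as a sketch and not as the definitive answer, is to use `for_each` on a map instead of `count` on a list, so that each entry is available as `each.value`. This assumes Terraform 0.12.6 or later, where `for_each` on resources is available. The names `queue_count_alert_map` and `queue_count_alarm_by_key` are hypothetical and chosen so as not to clash with the resources defined earlier.

```hcl
locals {
  # Key each entry by a unique string so for_each can address it.
  # The "<queue_name>_<count>" key format is an assumption made for this sketch.
  queue_count_alert_map = {
    for a in local.queue_count_alert_source :
    "${a.queue_name}_${a.count}" => a
  }
}

# Same alarm as queue_count_alarm, written with for_each instead of count.
resource "aws_cloudwatch_metric_alarm" "queue_count_alarm_by_key" {
  for_each = local.queue_count_alert_map

  alarm_name                = "sqs_count_${each.value.queue_name}_over_${each.value.count}"
  alarm_description         = "message detected: SQS ${each.value.queue_name} >= ${each.value.count}"
  comparison_operator       = "GreaterThanOrEqualToThreshold"
  evaluation_periods        = 1
  metric_name               = "ApproximateNumberOfMessagesVisible"
  namespace                 = "AWS/SQS"
  period                    = 300
  statistic                 = "Minimum"
  threshold                 = each.value.count
  alarm_actions             = [aws_sns_topic.fatal.arn]
  ok_actions                = [aws_sns_topic.fatal.arn]
  insufficient_data_actions = []
  treat_missing_data        = "ignore"

  dimensions = {
    QueueName = each.value.queue_name
  }
}
```

A side benefit is that resources are tracked by key instead of list position, so adding or removing a threshold in the middle of the list should not shift the addresses of the other alarms. If this replaces the `count` version in an existing state, the alarms would be recreated unless they are moved, for example with `terraform state mv`.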
