Autoscaling an ECS Service Based on Sidekiq Queue Length and Latency

Posted at 2025-10-24

Overview

This article explains how to publish Sidekiq metrics to Amazon CloudWatch with the sidekiq-cloudwatchmetrics gem and how to use CloudWatch alarms to autoscale an Amazon ECS service.

Prerequisites

A Rails application that uses Sidekiq
An AWS account with appropriate permissions (access to CloudWatch, ECS, and SNS)
Basic knowledge of Terraform
An ECS cluster and service that are already set up

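Background

When a Rails application processes background jobs with Sidekiq, you often want to monitor metrics such as queue length and queue latency and adjust the number of worker processes to match the load. CloudWatch metrics and alarms let you implement this kind of autoscaling efficiently.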

Architecture Overview

Sidekiq Worker → sidekiq-cloudwatchmetrics → CloudWatch Metrics
                                                      ↓
CloudWatch Alarms → ECS Auto Scaling → ECS Service (Task Count)

Configuring sidekiq-cloudwatchmetrics

1. Installing the gem

Add the following lines to your Gemfile:

group :production do
  gem "sidekiq-cloudwatchmetrics"
end

2. Sidekiq initializer

Add the following to config/initializers/sidekiq.rb:

# frozen_string_literal: true

redis_url = ENV.fetch("REDIS_URL") { "redis://127.0.0.1:6379" }

Sidekiq.configure_client do |config|
  config.redis = { url: redis_url }
end

Sidekiq.configure_server do |config|
  config.redis = { url: redis_url }
  config.concurrency = 2
  config.queues = %w[default low]
end

# CloudWatch Metrics (production only)
# With this enabled, Sidekiq metrics are published to CloudWatch automatically
if Rails.env.production?
  require "sidekiq/cloudwatchmetrics"
  namespace = ENV.fetch("SIDEKIQ_CLOUDWATCH_NAMESPACE", "Sidekiq")
  Sidekiq::CloudWatchMetrics.enable!(namespace: namespace)
end

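3. Published metrics

Metrics that sidekiq-cloudwatchmetrics publishes automatically include:

QueueSize - number of jobs in a queue (per queue, with a QueueName dimension)
QueueLatency - time the oldest job has been waiting in a queue, in seconds (per queue, with a QueueName dimension)
ProcessedJobs - number of processed jobs
FailedJobs - number of failed jobs
RetryJobs - number of jobs waiting to be retried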
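To make the two queue metrics concrete, here is a minimal sketch of what they measure. It is not the gem's code: `Job`, `queue_size`, and `queue_latency` are hypothetical stand-ins, and timestamps are assumed to be epoch seconds. Queue size is modelled as the number of waiting jobs, and latency as the age of the oldest waiting job.

```ruby
# Hypothetical stand-in for a job payload waiting in a queue.
Job = Struct.new(:jid, :enqueued_at)

# Number of jobs currently waiting in the queue.
def queue_size(jobs)
  jobs.size
end

# Seconds the oldest waiting job has spent in the queue (0.0 when empty).
def queue_latency(jobs, now: Time.now.to_f)
  oldest = jobs.min_by(&:enqueued_at)
  return 0.0 if oldest.nil?

  now - oldest.enqueued_at
end

now = 1_000_000.0
jobs = [Job.new("a", now - 42.0), Job.new("b", now - 5.0)]
puts queue_size(jobs)
puts queue_latency(jobs, now: now)
```

Latency tends to be the better scaling signal when job durations vary widely, because a short queue of slow jobs can still keep users waiting.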

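4. Environment variables

Set the following environment variables:

# CloudWatch namespace (optional; the initializer falls back to "Sidekiq")
# The alarms must reference the same namespace, so keep the Terraform examples in sync with this value
SIDEKIQ_CLOUDWATCH_NAMESPACE=MyApp/Sidekiq

# AWS credentials (IAM role or environment variables)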
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=ap-northeast-1

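Required AWS permissions

The IAM role used by the ECS tasks (the task role) needs cloudwatch:PutMetricData so that the gem can publish metrics. The application-autoscaling actions are needed by the principal that creates the scalable target and scaling policies (for example, the role that runs Terraform), not by the Sidekiq tasks themselves: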

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["cloudwatch:PutMetricData"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "application-autoscaling:DescribeScalableTargets",
        "application-autoscaling:DescribeScalingPolicies",
        "application-autoscaling:RegisterScalableTarget",
        "application-autoscaling:PutScalingPolicy"
      ],
      "Resource": "*"
    }
  ]
}
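If you want to narrow the task role further, one option is to restrict PutMetricData to a single namespace with the `cloudwatch:namespace` condition key. The sketch below only builds and prints such a policy document. `task_role_policy` is a hypothetical helper, and the namespace is assumed to match the one passed to `enable!`.

```ruby
require "json"

# Builds a task-role policy that only allows publishing metrics
# into the given CloudWatch namespace.
def task_role_policy(namespace)
  {
    "Version" => "2012-10-17",
    "Statement" => [
      {
        "Effect" => "Allow",
        "Action" => ["cloudwatch:PutMetricData"],
        "Resource" => "*",
        "Condition" => {
          "StringEquals" => { "cloudwatch:namespace" => namespace }
        }
      }
    ]
  }
end

puts JSON.pretty_generate(task_role_policy("Sidekiq"))
```

PutMetricData does not support resource-level restrictions, which is why `Resource` stays `"*"` and the namespace is constrained through a condition instead.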

Example Terraform configuration

1. Defining the CloudWatch alarms

# cloudwatch_alarm.tf
resource "aws_cloudwatch_metric_alarm" "sidekiq_queue_size" {
  alarm_name          = "sidekiq-queue-size-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "QueueSize"
  namespace           = "Sidekiq"
  period              = "300"
  statistic           = "Average"
  threshold           = "10"
  alarm_description   = "This metric monitors sidekiq queue size"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    QueueName = "default"
  }

  tags = {
    Environment = var.environment
    Service     = "sidekiq"
  }
}

resource "aws_cloudwatch_metric_alarm" "sidekiq_queue_latency" {
  alarm_name          = "sidekiq-queue-latency-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "QueueLatency"
  namespace           = "Sidekiq"
  period              = "300"
  statistic           = "Average"
  threshold           = "60"
  alarm_description   = "This metric monitors sidekiq queue latency"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    QueueName = "default"
  }

  tags = {
    Environment = var.environment
    Service     = "sidekiq"
  }
}
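The combination of `period` and `evaluation_periods` determines how quickly an alarm can react. As a rough sketch, assuming every period has a datapoint and `datapoints_to_alarm` is left unset, the alarm fires only when the most recent `evaluation_periods` datapoints all breach the threshold. `alarm?` and `minimum_detection_seconds` are hypothetical helper names used for illustration.

```ruby
# True when the last `evaluation_periods` datapoints all exceed the threshold
# (models comparison_operator = "GreaterThanThreshold").
def alarm?(datapoints, threshold:, evaluation_periods:)
  recent = datapoints.last(evaluation_periods)
  return false if recent.size < evaluation_periods

  recent.all? { |value| value > threshold }
end

# Shortest time a sustained breach needs before the alarm can fire.
def minimum_detection_seconds(period:, evaluation_periods:)
  period * evaluation_periods
end

puts alarm?([3, 12, 15], threshold: 10, evaluation_periods: 2)
puts minimum_detection_seconds(period: 300, evaluation_periods: 2)
```

With the values used above (a 300-second period and 2 evaluation periods), a backlog has to persist for about 10 minutes before any action is taken, so shorten the period if your jobs are latency-sensitive.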

2. ECS service autoscaling

# ecs_autoscaling.tf
resource "aws_appautoscaling_target" "ecs_target" {
  max_capacity       = 10
  min_capacity       = 2
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.main.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "ecs_policy_scale_out" {
  name               = "sidekiq-scale-out"
  policy_type        = "StepScaling"
  resource_id        = aws_appautoscaling_target.ecs_target.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs_target.service_namespace

  step_scaling_policy_configuration {
    adjustment_type         = "ChangeInCapacity"
    cooldown               = 300
    metric_aggregation_type = "Average"

    step_adjustment {
      metric_interval_lower_bound = 0
      metric_interval_upper_bound = 10
      scaling_adjustment         = 1
    }

    step_adjustment {
      metric_interval_lower_bound = 10
      # No upper bound: the last step is open-ended so that larger breaches still match a step
      scaling_adjustment         = 2
    }
  }
}

resource "aws_appautoscaling_policy" "ecs_policy_scale_in" {
  name               = "sidekiq-scale-in"
  policy_type        = "StepScaling"
  resource_id        = aws_appautoscaling_target.ecs_target.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs_target.service_namespace

  step_scaling_policy_configuration {
    adjustment_type         = "ChangeInCapacity"
    cooldown               = 300
    metric_aggregation_type = "Average"

    step_adjustment {
      metric_interval_upper_bound = 0
      scaling_adjustment         = -1
    }
  }
}
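Step bounds are offsets from the alarm threshold, not absolute metric values. The following sketch illustrates how a step is selected and how the result is clamped to the capacity range. It is an illustration under assumptions rather than the service's implementation: `Step`, `adjustment_for`, and `next_desired_count` are hypothetical names, and the last scale-out step is assumed to be open-ended.

```ruby
# lower/upper are offsets from the alarm threshold; nil means unbounded.
Step = Struct.new(:lower, :upper, :adjustment)

SCALE_OUT_STEPS = [
  Step.new(0, 10, 1),
  Step.new(10, nil, 2)
].freeze

SCALE_IN_STEPS = [Step.new(nil, 0, -1)].freeze

# Picks the capacity change for a metric value that breached `threshold`.
def adjustment_for(metric_value, threshold, steps)
  diff = metric_value - threshold
  step = steps.find do |s|
    lower = s.lower || -Float::INFINITY
    upper = s.upper || Float::INFINITY
    if diff >= 0
      diff >= lower && diff < upper   # above threshold: [lower, upper)
    else
      diff > lower && diff <= upper   # below threshold: (lower, upper]
    end
  end
  step ? step.adjustment : 0
end

# Applies the change while respecting min_capacity / max_capacity.
def next_desired_count(current, adjustment, min:, max:)
  (current + adjustment).clamp(min, max)
end

puts adjustment_for(15, 10, SCALE_OUT_STEPS)
puts next_desired_count(9, 2, min: 2, max: 10)
```

Because the same scale-out policy can be attached to alarms with different thresholds (queue size and latency), the same offsets translate into different absolute values per alarm, which is worth keeping in mind when choosing step widths.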

3. Wiring alarms to the scaling policies

# cloudwatch_alarm_actions.tf
# Scale out based on queue size
resource "aws_cloudwatch_metric_alarm" "sidekiq_queue_size_scale_out" {
  alarm_name          = "sidekiq-queue-size-scale-out"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "QueueSize"
  namespace           = "Sidekiq"
  period              = "300"
  statistic           = "Average"
  threshold           = "10"
  alarm_description   = "This metric monitors sidekiq queue size for scale out"
  alarm_actions       = [
    aws_appautoscaling_policy.ecs_policy_scale_out.arn,
    aws_sns_topic.alerts.arn
  ]

  dimensions = {
    QueueName = "default"
  }

  tags = {
    Environment = var.environment
    Service     = "sidekiq"
  }
}

# Scale in based on queue size
resource "aws_cloudwatch_metric_alarm" "sidekiq_queue_size_scale_in" {
  alarm_name          = "sidekiq-queue-size-scale-in"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = "3"
  metric_name         = "QueueSize"
  namespace           = "Sidekiq"
  period              = "300"
  statistic           = "Average"
  threshold           = "2"
  alarm_description   = "This metric monitors sidekiq queue size for scale in"
  alarm_actions       = [
    aws_appautoscaling_policy.ecs_policy_scale_in.arn
  ]

  dimensions = {
    QueueName = "default"
  }

  tags = {
    Environment = var.environment
    Service     = "sidekiq"
  }
}

# Scale out based on queue latency
resource "aws_cloudwatch_metric_alarm" "sidekiq_queue_latency_scale_out" {
  alarm_name          = "sidekiq-queue-latency-scale-out"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "QueueLatency"
  namespace           = "Sidekiq"
  period              = "300"
  statistic           = "Average"
  threshold           = "60"
  alarm_description   = "This metric monitors sidekiq queue latency for scale out"
  alarm_actions       = [
    aws_appautoscaling_policy.ecs_policy_scale_out.arn,
    aws_sns_topic.alerts.arn
  ]

  dimensions = {
    QueueName = "default"
  }

  tags = {
    Environment = var.environment
    Service     = "sidekiq"
  }
}

# Scale in based on queue latency
resource "aws_cloudwatch_metric_alarm" "sidekiq_queue_latency_scale_in" {
  alarm_name          = "sidekiq-queue-latency-scale-in"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = "3"
  metric_name         = "QueueLatency"
  namespace           = "Sidekiq"
  period              = "300"
  statistic           = "Average"
  threshold           = "10"
  alarm_description   = "This metric monitors sidekiq queue latency for scale in"
  alarm_actions       = [
    aws_appautoscaling_policy.ecs_policy_scale_in.arn
  ]

  dimensions = {
    QueueName = "default"
  }

  tags = {
    Environment = var.environment
    Service     = "sidekiq"
  }
}

4. SNS topic

# sns.tf
resource "aws_sns_topic" "alerts" {
  name = "${var.environment}-sidekiq-alerts"
}

resource "aws_sns_topic_subscription" "email" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = var.alert_email
}

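Notes

1. Cost

Publishing custom metrics to CloudWatch costs money (roughly $0.30 per metric per month)
Tune the publishing interval to your needs (check the gem's README for the default in your version; recent versions publish every 60 seconds rather than every 5 minutes)
Avoid publishing metrics you do not need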
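As a rough worked example of the cost estimate, the sketch below multiplies the number of custom metrics by the approximate $0.30 figure above. `monthly_metric_cost_cents` is a hypothetical helper. It assumes two per-queue metrics (QueueSize and QueueLatency) and leaves the number of queue-independent metrics as a parameter, since that depends on the gem version and configuration.

```ruby
# Approximate price from the note above: $0.30 per metric per month.
COST_PER_METRIC_CENTS = 30

# Estimated monthly cost in cents for the published custom metrics.
def monthly_metric_cost_cents(queue_count:, per_queue_metrics: 2, global_metrics: 0)
  (queue_count * per_queue_metrics + global_metrics) * COST_PER_METRIC_CENTS
end

# Two queues ("default" and "low"), counting per-queue metrics only.
puts monthly_metric_cost_cents(queue_count: 2)
```

Each distinct combination of metric name and dimensions counts as a separate metric, so the bill grows with the number of queues.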

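2. Tuning the scaling behavior

Tune how often scale-out and scale-in occur
Set a cooldown period (recommended: 300 seconds or more)
Choose thresholds carefully (analyze the load patterns in production before deciding)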

3. Ongoing monitoring

Review metric trends regularly
Revisit alarm settings periodically
Keep optimizing performance

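4. Security

Apply the principle of least privilege to IAM roles
In production, use IAM roles instead of credentials in environment variables
Audit logs of metric publishing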

Summary

Combining sidekiq-cloudwatchmetrics with CloudWatch alarms lets you autoscale an ECS service based on Sidekiq metrics. Resources are then adjusted automatically in response to load, which improves both the stability and the efficiency of the system.
