
M3 Advent Calendar 2016

Day 17

Monitoring batch workers using CloudWatch Metrics


CloudWatch Metrics

In our project, we use the standard CloudWatch Metrics to monitor performance and issues in production. For example, if CPU utilization on a server exceeds 80% for more than 5 minutes, an alarm triggers and sends us an email.
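As a rough illustration of such an alarm, the sketch below builds the parameters for CloudWatch's PutMetricAlarm call. It is an assumption-laden example, not our actual configuration: the helper name `cpu_alarm_params`, the instance ID and the SNS topic ARN are hypothetical placeholders. CloudWatch alarms deliver email through an SNS topic with an email subscription, and a single 5-minute period matches the granularity of basic EC2 monitoring.

```ruby
# Hypothetical helper: parameters for an alarm that fires when the average CPU
# utilization of one EC2 instance is above 80% over a 5-minute period.
# instance_id and sns_topic_arn are placeholders, not values from our setup.
def cpu_alarm_params(instance_id:, sns_topic_arn:)
  {
    alarm_name: "high-cpu-#{instance_id}",
    namespace: 'AWS/EC2',
    metric_name: 'CPUUtilization',
    dimensions: [{ name: 'InstanceId', value: instance_id }],
    statistic: 'Average',
    period: 300,                    # one 5-minute period (basic monitoring granularity)
    evaluation_periods: 1,          # a single breaching period triggers the alarm
    threshold: 80.0,
    comparison_operator: 'GreaterThanThreshold',
    alarm_actions: [sns_topic_arn]  # SNS topic with an email subscription
  }
end

params = cpu_alarm_params(
  instance_id: 'i-0123456789abcdef0',
  sns_topic_arn: 'arn:aws:sns:ap-northeast-1:123456789012:ops-email'
)

# With the aws-sdk-cloudwatch gem installed, the alarm would be created with:
#   Aws::CloudWatch::Client.new(region: 'ap-northeast-1').put_metric_alarm(params)
```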

We've also built a few simple CloudWatch Dashboards for monitoring different parts of our system. They are a great way to get a quick overview.
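Dashboards can also be defined in code through the PutDashboard API, which takes a JSON body describing a list of widgets on a 24-unit-wide grid. The sketch below is a hypothetical example, not one of our actual dashboards: the helper name, the instance IDs, the layout and the dashboard name are all assumptions. It generates one CPU graph per instance, two per row.

```ruby
require 'json'

# Hypothetical sketch: build a dashboard body with one CPUUtilization graph per
# EC2 instance. Instance IDs, layout and region are illustrative assumptions.
def cpu_dashboard_body(instance_ids, region: 'ap-northeast-1')
  widgets = instance_ids.each_with_index.map do |id, i|
    {
      type: 'metric',
      x: (i % 2) * 12, # two 12-unit-wide widgets per row on the 24-unit grid
      y: (i / 2) * 6,  # each row is 6 units high
      width: 12,
      height: 6,
      properties: {
        title: "CPU #{id}",
        region: region,
        stat: 'Average',
        period: 300,
        metrics: [['AWS/EC2', 'CPUUtilization', 'InstanceId', id]]
      }
    }
  end
  JSON.generate({ widgets: widgets })
end

body = cpu_dashboard_body(%w[i-0aaa i-0bbb i-0ccc])

# With the aws-sdk-cloudwatch gem installed, the dashboard would be uploaded with:
#   Aws::CloudWatch::Client.new.put_dashboard(dashboard_name: 'ProjectX-overview',
#                                             dashboard_body: body)
```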

Missing metrics: memory, swap, disk utilization

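In addition to the standard metrics, we've added system-level metrics for memory, swap and disk utilization. EC2 does not report these out of the box, because they are only visible from inside the operating system. AWS provides a Perl script that collects them and can be run as a cron job: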
http://docs.aws.amazon.com/ja_jp/AWSEC2/latest/UserGuide/mon-scripts.html
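To show what "memory utilization" means here, the following Ruby sketch computes memory and swap utilization from the contents of `/proc/meminfo`. It is not the AWS script (which is Perl) and the helper names are hypothetical; we assume, as in the script's default mode, that buffers and page cache count as free memory.

```ruby
# Hypothetical sketch: memory and swap utilization (%) from /proc/meminfo text.
# Assumption: buffers and page cache are treated as free memory.
def parse_meminfo(text)
  text.each_line.each_with_object({}) do |line, info|
    key, value = line.split(':', 2)
    next unless value
    info[key.strip] = value.to_i # values are in kB; to_i stops at the " kB" suffix
  end
end

def memory_utilization(info)
  total = info.fetch('MemTotal')
  used = total - info.fetch('MemFree') - info.fetch('Buffers', 0) - info.fetch('Cached', 0)
  100.0 * used / total
end

def swap_utilization(info)
  total = info.fetch('SwapTotal', 0)
  return 0.0 if total.zero? # no swap configured
  100.0 * (total - info.fetch('SwapFree', 0)) / total
end

# On a Linux host:
#   info = parse_meminfo(File.read('/proc/meminfo'))
#   puts memory_utilization(info), swap_utilization(info)
```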


Custom metrics

We are using batch workers (ActiveJob via delayed_job) to handle long-running requests, such as importing bulk data from files or rendering large PDF files. Customers use these features constantly, so we need a health check for our worker processes.

To do this, we implemented our own simple custom CloudWatch metric: "delayed-job::WorkerProcesses".

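# Count running delayed_job worker processes.
# pgrep -f matches the full command line; this works on OS X and Amazon Linux.
# Note: any other process whose command line contains "delayed_job" is counted too.
pids = `pgrep -f delayed_job`.split
Rails.logger.info("Worker health check. #{pids.size} worker processes found (#{pids.join(',')})")

# log_cloudwatch_metrics comes from the AWSMetrics module, included in this class
log_cloudwatch_metrics(
  namespace: 'delayed-job',
  aws_metrics: [AWSMetrics::Metric.new(name: 'WorkerProcesses', value: pids.size)]
)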

Now we can see the metric in CloudWatch like any other metric, set alarms on it, add it to dashboards, and so on.

The check runs periodically as a recurring job scheduled with delayed_job_recurring. Because the check is itself executed by a worker, each data point that arrives shows not only that worker processes exist, but also that jobs are actually being picked up and processed.
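One consequence of running the check on a worker: if every worker dies, the metric does not drop to 0, it simply stops arriving. A minimal sketch of alarm parameters that covers this case is shown below, assuming the "ProjectX/" namespace prefix and "hostname" dimension used by the AWSMetrics wrapper described in the next section. The helper name, host name, SNS topic ARN and the 10-minute `check_interval` are hypothetical. Setting `treat_missing_data` to `'breaching'` makes a missing data point trigger the alarm as well.

```ruby
# Hypothetical helper: alarm parameters for the custom WorkerProcesses metric.
# Namespace and dimension follow the AWSMetrics wrapper; hostname, topic ARN and
# check_interval (seconds, a multiple of 60) are placeholder assumptions.
def worker_alarm_params(hostname:, sns_topic_arn:, check_interval: 600)
  {
    alarm_name: "delayed-job-workers-#{hostname}",
    namespace: 'ProjectX/delayed-job',
    metric_name: 'WorkerProcesses',
    dimensions: [{ name: 'hostname', value: hostname }],
    statistic: 'Minimum',
    period: check_interval,   # should cover at least one health-check run
    evaluation_periods: 1,
    threshold: 1.0,
    comparison_operator: 'LessThanThreshold', # fewer than 1 worker process
    # The check runs on a worker, so if all workers die no data arrives at all.
    # Treating missing data as breaching makes that case raise the alarm too.
    treat_missing_data: 'breaching',
    alarm_actions: [sns_topic_arn]
  }
end

alarm = worker_alarm_params(
  hostname: 'batch-1',
  sns_topic_arn: 'arn:aws:sns:ap-northeast-1:123456789012:ops-email'
)
# Created with: Aws::CloudWatch::Client.new.put_metric_alarm(alarm)
```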

CloudWatch Metrics wrapper

Sending custom metrics is quite simple with the AWS SDK for Ruby, but the PutMetricData request format is a bit complex, with many fields to fill out. To make it simpler, we implemented a small wrapper with sensible defaults:

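require 'socket'              # for Socket.gethostname
require 'aws-sdk-cloudwatch'  # aws-sdk v3 gem; with aws-sdk v2 this was `require 'aws-sdk'`

# Include this module in a class to get log_cloudwatch_metrics.
module AWSMetrics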

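  # Sends the metrics to CloudWatch under the "ProjectX/<namespace>" namespace.
  def log_cloudwatch_metrics(namespace:, aws_metrics:)
    cloudwatch_client.put_metric_data(
      namespace: "ProjectX/#{namespace}",
      metric_data: aws_metrics.map(&:to_h)
    )
  end

  # Memoized so one client is reused instead of created on every call.
  # String#presence comes from ActiveSupport (Rails).
  def cloudwatch_client
    @cloudwatch_client ||= Aws::CloudWatch::Client.new(region: ENV['AWS_REGION'].presence || 'ap-northeast-1')
  end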

  class Metric
    # Units accepted by the PutMetricData API
    VALID_UNITS = [
      'Seconds', 'Microseconds', 'Milliseconds',
      'Bytes', 'Kilobytes', 'Megabytes', 'Gigabytes', 'Terabytes',
      'Bits', 'Kilobits', 'Megabits', 'Gigabits', 'Terabits',
      'Percent', 'Count',
      'Bytes/Second', 'Kilobytes/Second', 'Megabytes/Second', 'Gigabytes/Second', 'Terabytes/Second',
      'Bits/Second', 'Kilobits/Second', 'Megabits/Second', 'Gigabits/Second', 'Terabits/Second',
      'Count/Second',
      'None'
    ]

    attr_reader :name, :value, :aws_unit_string, :hostname, :timestamp

    def initialize(name:, value:, aws_unit_string: 'Count', hostname: Socket.gethostname, timestamp: Time.now)
      raise ArgumentError, "Invalid unit: #{aws_unit_string}" unless VALID_UNITS.include?(aws_unit_string)
      @name = name
      @value = value
      @aws_unit_string = aws_unit_string
      @hostname = hostname
      @timestamp = timestamp
    end

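    # One entry of put_metric_data's metric_data array. The "hostname" dimension
    # keeps the values reported by each server separate in CloudWatch.
    def to_h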
      {
        metric_name: @name,
        dimensions: [
          {
            name: 'hostname',
            value: @hostname
          }
        ],
        timestamp: @timestamp,
        value: @value,
        unit: @aws_unit_string
      }
    end
  end

end

Thanks for reading!
