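When I set up the CloudWatch Agent for the first time in several years, I found that there were now more ways to do it, so I tried setting it up through SSM.
Environment
The target EC2 instance is configured at the network level so that it can communicate with SSM.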
$ ec2-metadata --ami-id
ami-id: ami-0a390a03d7ac1e284
$ cat /etc/system-release
Amazon Linux release 2023.4.20240513 (Amazon Linux)
$ uname -a
Linux example.ap-northeast-1.compute.internal 6.1.79-99.164.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Feb 27 18:02:23 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
IAM role configuration
Grant the required permissions to the target EC2 instance.
The following AWS managed policies can be used.
- AmazonSSMManagedInstanceCore
- CloudWatchAgentAdminPolicy
- CloudWatchAgentServerPolicy
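In most cases it is enough to attach AmazonSSMManagedInstanceCore (for configuration through SSM) and CloudWatchAgentServerPolicy (required for the CloudWatch Agent to run).
CloudWatchAgentAdminPolicy additionally allows the instance to create parameters in Parameter Store.
The two CloudWatch Agent policies differ in whether they include ec2:DescribeVolumes and ssm:PutParameter. Of the two policy documents below, the first is CloudWatchAgentAdminPolicy and the second is CloudWatchAgentServerPolicy. Lines marked with + appear in only one of the two.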
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "CWACloudWatchPermissions",
"Effect": "Allow",
"Action": [
"cloudwatch:PutMetricData",
"ec2:DescribeTags",
"logs:PutLogEvents",
"logs:PutRetentionPolicy",
"logs:DescribeLogStreams",
"logs:DescribeLogGroups",
"logs:CreateLogStream",
"logs:CreateLogGroup",
"xray:PutTraceSegments",
"xray:PutTelemetryRecords",
"xray:GetSamplingRules",
"xray:GetSamplingTargets",
"xray:GetSamplingStatisticSummaries"
],
"Resource": "*"
},
{
"Sid": "CWASSMPermissions",
"Effect": "Allow",
"Action": [
"ssm:GetParameter",
+ "ssm:PutParameter"
],
"Resource": "arn:aws:ssm:*:*:parameter/AmazonCloudWatch-*"
}
]
}
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "CWACloudWatchServerPermissions",
"Effect": "Allow",
"Action": [
"cloudwatch:PutMetricData",
+ "ec2:DescribeVolumes",
"ec2:DescribeTags",
"logs:PutLogEvents",
"logs:PutRetentionPolicy",
"logs:DescribeLogStreams",
"logs:DescribeLogGroups",
"logs:CreateLogStream",
"logs:CreateLogGroup",
"xray:PutTraceSegments",
"xray:PutTelemetryRecords",
"xray:GetSamplingRules",
"xray:GetSamplingTargets",
"xray:GetSamplingStatisticSummaries"
],
"Resource": "*"
},
{
"Sid": "CWASSMServerPermissions",
"Effect": "Allow",
"Action": [
"ssm:GetParameter"
],
"Resource": "arn:aws:ssm:*:*:parameter/AmazonCloudWatch-*"
}
]
}
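If the instance role already exists, attaching the two policies from the CLI might look like the sketch below. The function name attach_agent_policies and the role name passed to it are placeholders of mine. The ARNs assume the usual arn:aws:iam::aws:policy/<name> form for AWS managed policies.

```shell
# Assumption: attach_agent_policies is a hypothetical helper, and the role
# name in $1 must be an existing IAM role used by the instance profile.
POLICY_ARNS="arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"

attach_agent_policies() {
    role_name="$1"
    # Attach each managed policy in turn; stop at the first failure.
    for arn in $POLICY_ARNS; do
        aws iam attach-role-policy --role-name "$role_name" --policy-arn "$arn" || return 1
    done
}
```

It would be called as attach_agent_policies <role name>. CloudWatchAgentAdminPolicy is deliberately left out so that the instance keeps the narrower permission set by default.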
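Creating the configuration file
Create the CloudWatch Agent configuration file. (Skip this step if you already have one.)
Using the configuration wizard
It somewhat defeats the purpose, but this time I installed the CloudWatch Agent on a separate EC2 instance and used the configuration wizard there to create the CloudWatch Agent configuration file.
The wizard can also upload the result to Parameter Store, but that requires additional permissions, so temporarily attach CloudWatchAgentAdminPolicy as well.
Installing the CloudWatch Agent
Install the CloudWatch Agent manually.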
sudo yum install amazon-cloudwatch-agent
Running the configuration wizard
Run the CloudWatch Agent configuration wizard.
$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
================================================================
= Welcome to the Amazon CloudWatch Agent Configuration Manager =
= =
= CloudWatch Agent allows you to collect metrics and logs from =
= your host and send them to CloudWatch. Additional CloudWatch =
= charges may apply. =
================================================================
On which OS are you planning to use the agent?
1. linux
2. windows
3. darwin
default choice: [1]:
Trying to fetch the default region based on ec2 metadata...
I! imds retry client will retry 1 timesAre you using EC2 or On-Premises hosts?
1. EC2
2. On-Premises
default choice: [1]:
Which user are you planning to run the agent?
1. cwagent
2. root
3. others
default choice: [1]:
2
Do you want to turn on StatsD daemon?
1. yes
2. no
default choice: [1]:
Which port do you want StatsD daemon to listen to?
default choice: [8125]
What is the collect interval for StatsD daemon?
1. 10s
2. 30s
3. 60s
default choice: [1]:
What is the aggregation interval for metrics collected by StatsD daemon?
1. Do not aggregate
2. 10s
3. 30s
4. 60s
default choice: [4]:
Do you want to monitor metrics from CollectD? WARNING: CollectD must be installed or the Agent will fail to start
1. yes
2. no
default choice: [1]:
Do you want to monitor any host metrics? e.g. CPU, memory, etc.
1. yes
2. no
default choice: [1]:
Do you want to monitor cpu metrics per core?
1. yes
2. no
default choice: [1]:
Do you want to add ec2 dimensions (ImageId, InstanceId, InstanceType, AutoScalingGroupName) into all of your metrics if the info is available?
1. yes
2. no
default choice: [1]:
Do you want to aggregate ec2 dimensions (InstanceId)?
1. yes
2. no
default choice: [1]:
Would you like to collect your metrics at high resolution (sub-minute resolution)? This enables sub-minute resolution for all metrics, but you can customize for specific metrics in the output json file.
1. 1s
2. 10s
3. 30s
4. 60s
default choice: [4]:
Which default metrics config do you want?
1. Basic
2. Standard
3. Advanced
4. None
default choice: [1]:
3
Current config as follows:
{
"agent": {
"metrics_collection_interval": 60,
"run_as_user": "root"
},
"metrics": {
"aggregation_dimensions": [
[
"InstanceId"
]
],
"append_dimensions": {
"AutoScalingGroupName": "${aws:AutoScalingGroupName}",
"ImageId": "${aws:ImageId}",
"InstanceId": "${aws:InstanceId}",
"InstanceType": "${aws:InstanceType}"
},
"metrics_collected": {
"collectd": {
"metrics_aggregation_interval": 60
},
"cpu": {
"measurement": [
"cpu_usage_idle",
"cpu_usage_iowait",
"cpu_usage_user",
"cpu_usage_system"
],
"metrics_collection_interval": 60,
"resources": [
"*"
],
"totalcpu": false
},
"disk": {
"measurement": [
"used_percent",
"inodes_free"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"diskio": {
"measurement": [
"io_time",
"write_bytes",
"read_bytes",
"writes",
"reads"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"mem": {
"measurement": [
"mem_used_percent"
],
"metrics_collection_interval": 60
},
"netstat": {
"measurement": [
"tcp_established",
"tcp_time_wait"
],
"metrics_collection_interval": 60
},
"statsd": {
"metrics_aggregation_interval": 60,
"metrics_collection_interval": 10,
"service_address": ":8125"
},
"swap": {
"measurement": [
"swap_used_percent"
],
"metrics_collection_interval": 60
}
}
}
}
Are you satisfied with the above config? Note: it can be manually customized after the wizard completes to add additional items.
1. yes
2. no
default choice: [1]:
Do you have any existing CloudWatch Log Agent (http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AgentReference.html) configuration file to import for migration?
1. yes
2. no
default choice: [2]:
Do you want to monitor any log files?
1. yes
2. no
default choice: [1]:
2
Do you want the CloudWatch agent to also retrieve X-ray traces?
1. yes
2. no
default choice: [1]:
2
Existing config JSON identified and copied to: /opt/aws/amazon-cloudwatch-agent/etc/backup-configs
Saved config file to /opt/aws/amazon-cloudwatch-agent/bin/config.json successfully.
Current config as follows:
{
"agent": {
"metrics_collection_interval": 60,
"run_as_user": "root"
},
"metrics": {
"aggregation_dimensions": [
[
"InstanceId"
]
],
"append_dimensions": {
"AutoScalingGroupName": "${aws:AutoScalingGroupName}",
"ImageId": "${aws:ImageId}",
"InstanceId": "${aws:InstanceId}",
"InstanceType": "${aws:InstanceType}"
},
"metrics_collected": {
"collectd": {
"metrics_aggregation_interval": 60
},
"cpu": {
"measurement": [
"cpu_usage_idle",
"cpu_usage_iowait",
"cpu_usage_user",
"cpu_usage_system"
],
"metrics_collection_interval": 60,
"resources": [
"*"
],
"totalcpu": false
},
"disk": {
"measurement": [
"used_percent",
"inodes_free"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"diskio": {
"measurement": [
"io_time",
"write_bytes",
"read_bytes",
"writes",
"reads"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"mem": {
"measurement": [
"mem_used_percent"
],
"metrics_collection_interval": 60
},
"netstat": {
"measurement": [
"tcp_established",
"tcp_time_wait"
],
"metrics_collection_interval": 60
},
"statsd": {
"metrics_aggregation_interval": 60,
"metrics_collection_interval": 10,
"service_address": ":8125"
},
"swap": {
"measurement": [
"swap_used_percent"
],
"metrics_collection_interval": 60
}
}
}
}
Please check the above content of the config.
The config file is also located at /opt/aws/amazon-cloudwatch-agent/bin/config.json.
Edit it manually if needed.
Do you want to store the config in the SSM parameter store?
1. yes
2. no
default choice: [1]:
What parameter store name do you want to use to store your config? (Use 'AmazonCloudWatch-' prefix if you use our managed AWS policy)
default choice: [AmazonCloudWatch-linux]
AmazonCloudWatch-AgentConfig-Linux
Trying to fetch the default region based on ec2 metadata...
I! imds retry client will retry 1 timesWhich region do you want to store the config in the parameter store?
default choice: [ap-northeast-1]
Which AWS credential should be used to send json config to parameter store?
1. ABCDEFGHIJKLMNOPQRST(From SDK)
2. Other
default choice: [1]:
Successfully put config to parameter store AmazonCloudWatch-AgentConfig-Linux.
Program exits now.
The configuration file generated by the wizard is shown below.
{
"agent": {
"metrics_collection_interval": 60,
"run_as_user": "root"
},
"metrics": {
"aggregation_dimensions": [
[
"InstanceId"
]
],
"append_dimensions": {
"AutoScalingGroupName": "${aws:AutoScalingGroupName}",
"ImageId": "${aws:ImageId}",
"InstanceId": "${aws:InstanceId}",
"InstanceType": "${aws:InstanceType}"
},
"metrics_collected": {
"collectd": {
"metrics_aggregation_interval": 60
},
"cpu": {
"measurement": [
"cpu_usage_idle",
"cpu_usage_iowait",
"cpu_usage_user",
"cpu_usage_system"
],
"metrics_collection_interval": 60,
"resources": [
"*"
],
"totalcpu": false
},
"disk": {
"measurement": [
"used_percent",
"inodes_free"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"diskio": {
"measurement": [
"io_time",
"write_bytes",
"read_bytes",
"writes",
"reads"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"mem": {
"measurement": [
"mem_used_percent"
],
"metrics_collection_interval": 60
},
"netstat": {
"measurement": [
"tcp_established",
"tcp_time_wait"
],
"metrics_collection_interval": 60
},
"statsd": {
"metrics_aggregation_interval": 60,
"metrics_collection_interval": 10,
"service_address": ":8125"
},
"swap": {
"measurement": [
"swap_used_percent"
],
"metrics_collection_interval": 60
}
}
}
}
Verifying that the CloudWatch Agent starts
With the configuration file ready, I tried to start the CloudWatch Agent and got the following error.
$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c ssm:AmazonCloudWatch-AgentConfig-Linux
****** processing amazon-cloudwatch-agent ******
I! Trying to detect region from ec2 D! [EC2] Found active network interface I! imds retry client will retry 1 timesRegion: ap-northeast-1 credsConfig: map[] Successfully fetched the config and saved in /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/ssm_AmazonCloudWatch-AgentConfig-Linux.tmp
Start configuration validation...
2024/06/06 05:51:50 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/ssm_AmazonCloudWatch-AgentConfig-Linux.tmp ...
2024/06/06 05:51:50 I! Valid Json input schema.
2024/06/06 05:51:50 D! ec2tagger processor required because append_dimensions is set
2024/06/06 05:51:50 D! delta processor required because metrics with diskio or net are set
2024/06/06 05:51:50 D! ec2tagger processor required because append_dimensions is set
2024/06/06 05:51:50 Configuration validation first phase succeeded
I! Detecting run_as_user...
I! Trying to detect region from ec2
D! [EC2] Found active network interface
I! imds retry client will retry 1 times
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent -schematest -config /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
Configuration validation second phase failed
======== Error Log ========
2024-06-06T05:51:50Z E! [telegraf] Error running agent: Error loading config file /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml: error parsing socket_listener, open /usr/share/collectd/types.db: no such file or directory
Incidentally, if you apply this configuration file as-is through SSM (described later), the following log is output.
****** processing amazon-cloudwatch-agent ******
I! Trying to detect region from ec2 D! [EC2] Found active network interface I! imds retry client will retry 1 timesRegion: ap-northeast-1 credsConfig: map[] Successfully fetched the config and saved in /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/ssm_AmazonCloudWatch-AgentConfig-Linux.tmp
Start configuration validation...
I! Detecting run_as_user...
I! Trying to detect region from ec2
D! [EC2] Found active network interface
I! imds retry client will retry 1 times
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent -schematest -config /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
Configuration validation second phase failed
======== Error Log ========
2024-06-06T05:43:28Z E! [telegraf] Error running agent: Error loading config file /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml: error parsing socket_listener, open /usr/share/collectd/types.db: no such file or directory
2024/06/06 05:43:28 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/ssm_AmazonCloudWatch-AgentConfig-Linux.tmp ...
2024/06/06 05:43:28 I! Valid Json input schema.
2024/06/06 05:43:28 D! ec2tagger processor required because append_dimensions is set
2024/06/06 05:43:28 D! delta processor required because metrics with diskio or net are set
2024/06/06 05:43:28 D! ec2tagger processor required because append_dimensions is set
2024/06/06 05:43:28 Configuration validation first phase succeeded
failed to run commands: exit status 1
[小ネタ]Amazon Linux 2 で CloudWatch エージェントを起動しようとしてハマった話 | DevelopersIO
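The error apparently occurred because collectd is not installed. Since I do not need collectd metrics, I removed that section and, while I was at it, added more memory metrics. Lines marked with - are removed and lines marked with + are added.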
{
"agent": {
"metrics_collection_interval": 60,
"run_as_user": "root"
},
"metrics": {
"aggregation_dimensions": [
[
"InstanceId"
]
],
"append_dimensions": {
"AutoScalingGroupName": "${aws:AutoScalingGroupName}",
"ImageId": "${aws:ImageId}",
"InstanceId": "${aws:InstanceId}",
"InstanceType": "${aws:InstanceType}"
},
"metrics_collected": {
- "collectd": {
- "metrics_aggregation_interval": 60
- },
"cpu": {
"measurement": [
"cpu_usage_idle",
"cpu_usage_iowait",
"cpu_usage_user",
"cpu_usage_system"
],
"metrics_collection_interval": 60,
"resources": [
"*"
],
"totalcpu": false
},
"disk": {
"measurement": [
"used_percent",
"inodes_free"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"diskio": {
"measurement": [
"io_time",
"write_bytes",
"read_bytes",
"writes",
"reads"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"mem": {
"measurement": [
+ "mem_total",
+ "mem_available",
+ "mem_free",
+ "mem_cached",
+ "mem_used",
+ "mem_buffered",
+ "mem_available_percent",
"mem_used_percent"
],
"metrics_collection_interval": 60
},
"netstat": {
"measurement": [
"tcp_established",
"tcp_time_wait"
],
"metrics_collection_interval": 60
},
"statsd": {
"metrics_aggregation_interval": 60,
"metrics_collection_interval": 10,
"service_address": ":8125"
},
"swap": {
"measurement": [
"swap_used_percent"
],
"metrics_collection_interval": 60
}
}
}
}
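A quick pre-flight check for the collectd problem above could look like the following sketch. The function name check_collectd_prereq is hypothetical. The default path comes from the error message shown earlier, and the plain-text search for "collectd" is a rough heuristic rather than real JSON parsing.

```shell
# Returns 1 (and prints a warning) when the config mentions a collectd
# section but the types.db file the agent tries to open is missing.
check_collectd_prereq() {
    config_file="$1"
    # Path taken from the error message; override with $2 if it differs.
    types_db="${2:-/usr/share/collectd/types.db}"
    if grep -q '"collectd"' "$config_file" && [ ! -f "$types_db" ]; then
        echo "collectd is configured but $types_db is missing" >&2
        return 1
    fi
    return 0
}
```

For example, check_collectd_prereq /opt/aws/amazon-cloudwatch-agent/bin/config.json would flag the wizard-generated file on a host without collectd, and pass once the collectd section is removed.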
Start the CloudWatch Agent again and check its status.
$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c ssm:AmazonCloudWatch-AgentConfig-Linux
****** processing amazon-cloudwatch-agent ******
I! Trying to detect region from ec2 D! [EC2] Found active network interface I! imds retry client will retry 1 timesRegion: ap-northeast-1 credsConfig: map[] Successfully fetched the config and saved in /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/ssm_AmazonCloudWatch-AgentConfig-Linux.tmp
Start configuration validation...
2024/06/06 05:54:44 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/ssm_AmazonCloudWatch-AgentConfig-Linux.tmp ...
2024/06/06 05:54:44 I! Valid Json input schema.
2024/06/06 05:54:44 D! ec2tagger processor required because append_dimensions is set
2024/06/06 05:54:44 D! delta processor required because metrics with diskio or net are set
2024/06/06 05:54:44 D! ec2tagger processor required because append_dimensions is set
2024/06/06 05:54:44 Configuration validation first phase succeeded
I! Detecting run_as_user...
I! Trying to detect region from ec2
D! [EC2] Found active network interface
I! imds retry client will retry 1 times
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent -schematest -config /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
Configuration validation second phase succeeded
Configuration validation succeeded
amazon-cloudwatch-agent has already been stopped
Created symlink /etc/systemd/system/multi-user.target.wants/amazon-cloudwatch-agent.service → /etc/systemd/system/amazon-cloudwatch-agent.service.
$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status
{
"status": "running",
"starttime": "2024-06-06T05:54:44+00:00",
"configstatus": "configured",
"version": "1.300033.0"
}
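This confirms that the CloudWatch Agent starts normally with the configuration file I created.
Storing the configuration in Parameter Store
Store the configuration in Parameter Store using one of the following methods.
Note that if you use the AWS managed policies, the parameter name must start with AmazonCloudWatch- because of the following resource restriction.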
"Resource": "arn:aws:ssm:*:*:parameter/AmazonCloudWatch-*"
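Reusing the parameter created by the configuration wizard
If you followed the previous section, the wizard has already created the parameter AmazonCloudWatch-AgentConfig-Linux in Parameter Store, so overwrite its value with the contents of the corrected configuration file.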
aws ssm put-parameter --name "AmazonCloudWatch-AgentConfig-Linux" --type "String" --value file:///opt/aws/amazon-cloudwatch-agent/bin/config.json --overwrite
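Creating the parameter manually
If you already had a configuration file, or did not let the wizard create the parameter, create the parameter in Parameter Store manually.
Either the management console or the CLI works. This article shows only the CLI command.
Note: because the overwrite option is not specified, an already exists error occurs if a parameter with the same name is already present.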
aws ssm put-parameter --name "<parameter name>" --type "String" --value file://<configuration_file_pathname>
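Because a name without the AmazonCloudWatch- prefix only fails later, when the agent tries to read the parameter under the managed policy, it may help to check the name before uploading. The wrapper below is a minimal sketch. put_agent_config is a hypothetical helper name, and the command it runs is the same put-parameter call shown above.

```shell
# Refuse parameter names that the AWS managed policies would not allow,
# then create the parameter from a local configuration file.
put_agent_config() {
    name="$1"
    config_file="$2"
    case "$name" in
        AmazonCloudWatch-*) ;;
        *)
            echo "parameter name must start with AmazonCloudWatch-: $name" >&2
            return 1
            ;;
    esac
    aws ssm put-parameter --name "$name" --type "String" --value "file://$config_file"
}
```

A call such as put_agent_config AmazonCloudWatch-AgentConfig-Linux /opt/aws/amazon-cloudwatch-agent/bin/config.json behaves like the manual command, so it still fails if the parameter already exists.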
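Installing the CloudWatch Agent with Run Command
Install the CloudWatch Agent using SSM Run Command.
- Open the Systems Manager console
- Select Run Command in the left navigation pane
- Click "Run command"
- Select AWS-ConfigureAWSPackage from the list of command documents
- Specify the following command parameters
  - Action: Install
  - Name: AmazonCloudWatchAgent
  - Version: (if left blank, the latest version is installed)
- Under Targets, select the instances you want to install on
- Set other options and parameters as needed
  For this run, I cleared the checkbox for writing output to S3.
- Click "Run" at the bottom of the page
Wait a moment and refresh. Once the command status shows Success, the CloudWatch Agent installation is complete.
Select the instance and choose "View output" to confirm from the command output that the CloudWatch Agent was installed successfully.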
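The console steps above can probably be reproduced from the CLI along these lines. This is a sketch that I have not run against this environment. The function names are hypothetical, the instance ID is supplied by the caller, and the document parameter keys (action, name) are my reading of the AWS-ConfigureAWSPackage document.

```shell
# Send the install command to one instance and print the command ID.
# Version is omitted, so the latest package version is installed.
install_cw_agent() {
    instance_id="$1"
    aws ssm send-command \
        --document-name "AWS-ConfigureAWSPackage" \
        --instance-ids "$instance_id" \
        --parameters '{"action":["Install"],"name":["AmazonCloudWatchAgent"]}' \
        --query 'Command.CommandId' --output text
}

# Print the status of a previously sent command for one instance.
command_status() {
    aws ssm get-command-invocation \
        --command-id "$1" --instance-id "$2" \
        --query 'Status' --output text
}
```

Typical use would be to capture the ID with id="$(install_cw_agent <instance id>)" and then poll command_status "$id" <instance id> until it reports Success.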
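Starting the CloudWatch Agent with Run Command
Configure and start the CloudWatch Agent using SSM Run Command.
- Open the Systems Manager console
- Select Run Command in the left navigation pane
- Click "Run command"
- Select AmazonCloudWatch-ManageAgent from the list of command documents
- Specify the following command parameters
  - Action: configure
  - Optional Configuration Source: ssm
  - Optional Configuration Location: <name of the parameter created in the previous section>
  - Optional Restart: yes
- Under Targets, select the instances you want to configure
- Set other options and parameters as needed
  For this run, I cleared the checkbox for writing output to S3.
- Click "Run" at the bottom of the page
Wait a moment and refresh. Once the command status shows Success, the CloudWatch Agent has been configured and started.
Select the instance and choose "View output" to confirm from the command output that the CloudWatch Agent started successfully.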
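A CLI counterpart of the configure step might look like the sketch below. configure_cw_agent is a hypothetical helper name. The parameter keys are assumed to be the camel-case forms of the console labels listed above, and any document parameters not listed there are left at their defaults.

```shell
# Apply the configuration stored in Parameter Store and restart the agent.
configure_cw_agent() {
    instance_id="$1"
    parameter_name="$2"
    # Build the document parameters as JSON; only the location varies.
    params="$(printf '{"action":["configure"],"optionalConfigurationSource":["ssm"],"optionalConfigurationLocation":["%s"],"optionalRestart":["yes"]}' "$parameter_name")"
    aws ssm send-command \
        --document-name "AmazonCloudWatch-ManageAgent" \
        --instance-ids "$instance_id" \
        --parameters "$params" \
        --query 'Command.CommandId' --output text
}
```

For the parameter used in this article, the call would be configure_cw_agent <instance id> AmazonCloudWatch-AgentConfig-Linux.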
Verification
The custom metrics I specified, such as memory usage, were being collected as expected.
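The same check can be made from the CLI. The sketch below assumes the metrics land in the agent's default CWAgent namespace, since the configuration above does not set a namespace. list_agent_metrics is a hypothetical helper name.

```shell
# List the CloudWatch metrics published by the agent for one instance.
# $1: instance ID, $2: metric name such as mem_used_percent
list_agent_metrics() {
    aws cloudwatch list-metrics \
        --namespace CWAgent \
        --metric-name "$2" \
        --dimensions "Name=InstanceId,Value=$1"
}
```

An empty Metrics list shortly after startup does not necessarily mean failure, because newly published metrics can take a few minutes to appear.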
References
[小ネタ]Amazon Linux 2 で CloudWatch エージェントを起動しようとしてハマった話 | DevelopersIO