
株式会社シーエー・アドバンス

I installed collectd on EC2 and tried Auto Scaling based on nginx request time.

Posted at 2017-05-17, last updated at 2017-05-18

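■ Background

We aren't making good use of CloudWatch.
- If we used it, we could do things like Auto Scaling...
- While I'm at it, I want to try Auto Scaling when nginx's request time gets large.
Logging in to servers to look at nginx logs is getting painful...
- We ship logs via Fluentd -> Elasticsearch, but those run on t1 and t2 instances, so they start to buckle after running for a while.
- I no longer want to manage the Fluentd log aggregation server or the Elasticsearch log server.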

■ The solution I came up with

While I'm at it, send server CPU usage, memory usage, uptime and so on to CloudWatch.
Also send the nginx error log to CloudWatch.
Make request time available as a metric and try Auto Scaling on it.

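■ What I did

Looking at reference sites, some people use scripts and cron now that CloudWatch accepts custom metrics, but I don't want to do it that way.
I don't want to have to remember that I set up a cron job or where the shell script lives, and I don't want to write code just for this.
So I decided to rely on existing services and tools as much as possible.

Tools used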

collectd
CloudWatch Logs(awslogs)

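I started by installing collectd, but following the article below as written ended in an error...

"collectdのCloudWatchプラグインを試してみた" (Trying out the CloudWatch plugin for collectd) | Developers.IO

Installation steps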

$ sudo yum -y install collectd
$ git clone https://github.com/awslabs/collectd-cloudwatch.git
$ cd collectd-cloudwatch/src
$ sudo ./setup.py
Traceback (most recent call last):
  File "./setup.py", line 26, in <module>
    from subprocess import check_output, CalledProcessError, Popen, PIPE
ImportError: cannot import name check_output

# Hmm... I guessed it might be the Python version or something, so I checked.
$ python --version
Python 2.6.9  # old (lol)

# I didn't want to get stuck here, so for now I upgraded to 2.7 and tried again.
$ sudo yum upgrade
$ sudo yum -y install make automake gcc gcc-c++ kernel-devel git-core python27 python27-devel
$ which python27
$ python -V

# Point /usr/bin/python at Python 2.7.
$ sudo rm /usr/bin/python
$ sudo ln -s /usr/bin/python27 /usr/bin/python
$ python -V

# Keep yum on Python 2.6 by pointing it at python26. Otherwise installing other tools with yum apparently fails.
$ sudo cp /usr/bin/yum /usr/bin/_yum_old
$ sudo sed -i s/python/python26/g /usr/bin/yum

# If this succeeds, yum is supposed to be fine.
$ yum list

# Next, install easy_install and pip.
$ sudo curl -o /tmp/ez_setup.py https://bootstrap.pypa.io/ez_setup.py
$ sudo /usr/bin/python /tmp/ez_setup.py
$ sudo /usr/bin/easy_install-2.7 pip

# The collectd setup seemed to use awscli to auto-detect the destination region and so on, so upgrade aws-cli as well.
$ aws --version
$ sudo /usr/local/bin/pip install --upgrade awscli

# Follow the article's steps again, up to running ./setup.py.
$ sudo yum -y install collectd
$ git clone https://github.com/awslabs/collectd-cloudwatch.git
$ cd collectd-cloudwatch/src/

# Another error...
$ sudo ./setup.py
Installing dependencies ... OK
Installing python dependencies ... NOT OK
Installation cancelled due to an error.
Executed command: '/usr/bin/pip install --quiet --upgrade --force-reinstall requests'.
Error output: 'Traceback (most recent call last):
  File "/usr/bin/pip", line 5, in <module>
    from pkg_resources import load_entry_point
  File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 3019, in <module>
    
  File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 3003, in _call_aside
    
  File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 3032, in _initialize_master_working_set
    
  File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 657, in _build_master
    
  File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 670, in _build_from_requirements
    
  File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 849, in resolve
    
pkg_resources.DistributionNotFound: The 'pip==7.1.0' distribution was not found and is required by the application'.


# Swap in the Python 2.7 pip so it runs instead of the Python 2.6 one. Without this I got the error above.
$ sudo cp /usr/bin/pip /usr/bin/pip26
$ sudo cp /usr/local/bin/pip /usr/bin/pip
$ sudo ./setup.py
Installing dependencies ... OK
Installing python dependencies ... OK
Downloading plugin ... OK
Extracting plugin ... OK
Moving to collectd plugins directory ... OK
Copying CloudWatch plugin include file ... OK

Choose AWS region for published metrics:
  1. Automatic [ap-northeast-1]
  2. Custom
Enter choice [1]: 1
Choose hostname for published metrics:
  1. EC2 instance id [i-XXXXXXXXXXXX]
  2. Custom
Enter choice [1]: 2
Enter hostname [ip-10-30-X-XXX]: 

Choose authentication method:
  1. IAM Role [XXXXXXXX-DefaultRole-XXXXXXX] # Choose the role that has the policy attached.
  2. IAM User
Enter choice [1]: 1

Enter proxy server name:
  1. None
  2. Custom
Enter choice [1]: 1  # Left at the default for now

Enter proxy server port:
  1. None
  2. Custom
Enter choice [1]: 1  # Left at the default for now

Choose how to install CloudWatch plugin in collectd:
  1. Do not modify existing collectd configuration
  2. Add plugin to the existing configuration
  3. Use CloudWatch recommended configuration (4 metrics)
Enter choice [3]: 2
Plugin configuration written successfully.
Stopping collectd process ... NOT OK
Starting collectd process ... NOT OK  # Another error... according to the reference article this should work...
Installation cancelled due to an error.
Executed command: '/usr/sbin/collectd'.
Error output: ''.

$ cat /opt/collectd-plugins/cloudwatch/config/blocked_metrics
# This file is automatically generated - do not modify this file.    
# Use this file to find metrics to be added to the whitelist file instead.
# (The file contained only the comments above and nothing else.)

$ cat /opt/collectd-plugins/cloudwatch/config/whitelist.conf
# (This file was empty.)

# When collectd fails to start, enabling the syslog plugin and starting it again is supposed to
# write the error to `/var/log/messages`, so I looked there and found the following. Apparently it does an "FQDNLookup" by default...
Looking up "ip-10-30-X-XXX" failed. You have set the "FQDNLookup" option, but I cannot resolve my hostname to a fully qualified domain name. Please fix the network configuration.

# So, for now, edit the conf and add the following.
$ sudo vim /etc/collectd.conf
FQDNLookup   false
AutoLoadPlugin true

# Add the following to this file as well.
$ cat /opt/collectd-plugins/cloudwatch/config/whitelist.conf
cpu-0-cpu-.*
interface-eth0-.*
memory--memory-.*

$ sudo service collectd start
Starting collectd:                                         [  OK  ] # Yay!

Then I checked CloudWatch, and the metrics were there. Getting this far took about three hours... orz
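The entries in whitelist.conf above are regular expressions over collectd metric identifiers. As a rough illustration of how such patterns select metrics, here is a minimal Python sketch. It assumes each line is matched against the whole identifier, which may not be exactly how the plugin does it, and the sample metric names are made up for illustration (only the three patterns come from this article).

```python
import re

# Patterns copied from whitelist.conf in this article.
WHITELIST = [
    r"cpu-0-cpu-.*",
    r"interface-eth0-.*",
    r"memory--memory-.*",
]


def is_whitelisted(metric, patterns=WHITELIST):
    """Return True if any pattern matches the entire metric identifier.

    Assumption: whole-string matching. Check the plugin source for the
    exact semantics it applies.
    """
    return any(re.fullmatch(p, metric) for p in patterns)


# Hypothetical identifiers in collectd's
# plugin-plugin_instance-type-type_instance shape.
samples = [
    "cpu-0-cpu-user",
    "cpu-1-cpu-user",
    "interface-eth0-if_octets-rx",
    "memory--memory-used",
    "df-root-df_complex-free",
]

for name in samples:
    print(name, "->", "sent" if is_whitelisted(name) else "blocked")
```

Note that `cpu-0-cpu-.*` only covers the first core, so on a multi-core instance the other cores would stay in blocked_metrics under this reading.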

Moving on.

Next, collect nginx status with the nginx plugin, and also make request time available.

Edit the config

$ sudo vim /etc/nginx/nginx.conf
# Add the following.
location /nginx-status {
    stub_status on;
}
$ sudo service nginx reload
Reloading nginx:                                           [  OK  ]

$ curl http://localhost/nginx-status
Active connections: 4 
server accepts handled requests
 406 406 2658 
Reading: 0 Writing: 1 Waiting: 3
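To make the stub_status output above easier to read, here is a small Python sketch that parses it into named counters. It is only an illustration of what the endpoint exposes; the dictionary keys are my own naming, not the metric names collectd uses.

```python
import re

# Sample taken from the curl output in this article (trailing spaces dropped).
STATUS_TEXT = """Active connections: 4
server accepts handled requests
 406 406 2658
Reading: 0 Writing: 1 Waiting: 3
"""


def parse_stub_status(text):
    """Parse nginx stub_status output into a dict of integer counters."""
    active = re.search(r"Active connections:\s+(\d+)", text)
    # The totals line holds three bare integers: accepts, handled, requests.
    totals = re.search(
        r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE
    )
    states = re.search(
        r"Reading:\s+(\d+)\s+Writing:\s+(\d+)\s+Waiting:\s+(\d+)", text
    )
    if not (active and totals and states):
        raise ValueError("unexpected stub_status format")
    return {
        "active": int(active.group(1)),
        "accepts": int(totals.group(1)),
        "handled": int(totals.group(2)),
        "requests": int(totals.group(3)),
        "reading": int(states.group(1)),
        "writing": int(states.group(2)),
        "waiting": int(states.group(3)),
    }


print(parse_stub_status(STATUS_TEXT))
```

The first three numbers are cumulative since nginx started, while active, reading, writing and waiting are point-in-time values.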

Install the collectd nginx plugin.

$ sudo yum -y install collectd-nginx

# Make the following changes in /etc/collectd.conf.
Uncomment the `LoadPlugin nginx` line.

# Then append the following.
<Plugin nginx>
    URL "http://localhost/nginx-status"
    # CACert seems to be needed only when the URL uses https.
    CACert "/etc/pki/tls/certs/server.crt"
</Plugin>

# Restart
$ sudo service collectd restart
Stopping collectd:                                         [  OK  ]
Starting collectd:                                         [  OK  ]

# Confirm that the following entries have been added to blocked_metrics.
$ cat /opt/collectd-plugins/cloudwatch/config/blocked_metrics
(snip)
nginx--nginx_connections-active
nginx--connections-accepted
nginx--connections-handled
nginx--nginx_connections-reading
nginx--nginx_connections-writing
nginx--nginx_connections-waiting
nginx--nginx_requests-

# Add the entries above to the whitelist so that they are sent to CloudWatch.
$ sudo sh -c 'echo "nginx--.*" >> /opt/collectd-plugins/cloudwatch/config/whitelist.conf'

# Restart
$ sudo service collectd restart

Checking CloudWatch... the metrics are there.

Sending the $request_time value to CloudWatch with collectd

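# Suppose /etc/nginx/nginx.conf defines the log format as follows.
# Note: $request_time is placed at the end of the line so that it is easy to match with a regular expression.
(snip)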
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                '$status $body_bytes_sent "$http_referer" '
                '"$http_user_agent" "$http_x_forwarded_for" '
                '"$request_time"';
(snip)


# Make the following changes in /etc/collectd.conf.
Uncomment the `LoadPlugin tail` line.

# Then append the following.
<Plugin "tail">
  <File "/var/log/nginx/access_443.log">
    Instance "nginx"
    <Match>
      Regex "\" \"([0-9.]+)\"$"
      DSType "GaugeAverage"
      Type "response_time"
      Instance "AvgRespTime"
    </Match>
    <Match>
      Regex "\" \"([0-9.]+)\"$"
      DSType "GaugeMin"
      Type "response_time"
      Instance "MinRespTime"
    </Match>
    <Match>
      Regex "\" \"([0-9.]+)\"$"
      DSType "GaugeMax"
      Type "response_time"
      Instance "MaxRespTime"
    </Match>
  </File>
</Plugin>

# Restart
$ sudo service collectd restart
Stopping collectd:                                         [  OK  ]
Starting collectd:                                         [  OK  ]

# Confirm that the following entries have been added to blocked_metrics.
$ cat /opt/collectd-plugins/cloudwatch/config/blocked_metrics
(snip)
tail-nginx-response_time-AvgRespTime
tail-nginx-response_time-MinRespTime
tail-nginx-response_time-MaxRespTime

# Add the entries above to the whitelist so that they are sent to CloudWatch.
$ sudo sh -c 'echo "tail-nginx-response_time-.*" >> /opt/collectd-plugins/cloudwatch/config/whitelist.conf'

# Restart
$ sudo service collectd restart

Checking CloudWatch... the metrics are there.
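To sanity-check the Regex used in the tail plugin config against the `main` log_format, here is a small Python sketch. It applies the same pattern to a few access-log lines and computes the average, minimum and maximum, which is roughly what I understand GaugeAverage, GaugeMin and GaugeMax to report for each collection interval. The log lines are invented for illustration and only follow the field order of the log_format shown earlier.

```python
import re

# Same pattern as the Regex option in the tail plugin config
# (collectd's \" escapes removed).
REQUEST_TIME_RE = re.compile(r'" "([0-9.]+)"$')

# Hypothetical access-log lines in the `main` log_format from this article.
SAMPLE_LINES = [
    '203.0.113.5 - - [17/May/2017:10:00:00 +0900] "GET / HTTP/1.1" '
    '200 612 "-" "curl/7.51.0" "-" "0.120"',
    '203.0.113.5 - - [17/May/2017:10:00:01 +0900] "GET /api HTTP/1.1" '
    '200 1024 "-" "curl/7.51.0" "-" "0.480"',
    '203.0.113.6 - - [17/May/2017:10:00:02 +0900] "GET /slow HTTP/1.1" '
    '200 88 "-" "curl/7.51.0" "-" "1.500"',
]


def request_times(lines):
    """Extract the trailing quoted $request_time (seconds) from each line."""
    values = []
    for line in lines:
        match = REQUEST_TIME_RE.search(line.rstrip("\n"))
        if match:
            values.append(float(match.group(1)))
    return values


def summarize(values):
    """Return average, minimum and maximum, or None for an empty list."""
    if not values:
        return None
    return {
        "avg": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }


print(summarize(request_times(SAMPLE_LINES)))
```

Because the pattern is anchored with `$`, it only matches while `$request_time` stays the last field of the log line, which is the reason for putting it at the end.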

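Next, install awslogs and ship the error log.

Assign an IAM role to the EC2 instance (Amazon Linux) and give that role an IAM policy like the one below.
After that, make sure the role carrying this policy is attached to the instance where nginx is running.

The IAM policy I created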

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "logs:*"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
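The policy above allows every CloudWatch Logs action on every resource. If you prefer something narrower, the following Python sketch prints a policy limited to the four actions that, as far as I know, AWS documents for the CloudWatch Logs agent. Treat the action list and the resource ARN pattern as assumptions to verify against the current AWS documentation; this is not the policy used in this article.

```python
import json

# Assumption: the actions the CloudWatch Logs agent needs, per AWS's
# agent quick-start documentation. Verify before relying on it.
AGENT_ACTIONS = [
    "logs:CreateLogGroup",
    "logs:CreateLogStream",
    "logs:PutLogEvents",
    "logs:DescribeLogStreams",
]


def build_logs_policy(actions=AGENT_ACTIONS, resource="arn:aws:logs:*:*:*"):
    """Build an IAM policy document as a plain dict."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": resource,
            }
        ],
    }


print(json.dumps(build_logs_policy(), indent=2))
```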

Install awslogs

$ sudo yum install -y awslogs

Edit awslogs.conf

Add the following to /etc/awslogs/awslogs.conf

[/var/log/nginx/error.log]
log_group_name = nginx-error_log
log_stream_name = {instance_id}
file = /var/log/nginx/error.log
datetime_format = %Y/%m/%d %H:%M:%S
initial_position = start_of_file
buffer_duration = 5000
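awslogs uses datetime_format to pull a timestamp out of each log line, so the format has to agree with how nginx writes timestamps in error.log (slash-separated date, then time, e.g. `2017/05/17 12:34:56`). The Python sketch below checks a candidate format against a sample line with `strptime`. It assumes awslogs interprets these directives the same way Python does, so treat it as a rough proxy. The sample line is made up.

```python
from datetime import datetime

# Hypothetical nginx error.log line; only the timestamp prefix matters here.
SAMPLE = (
    '2017/05/17 12:34:56 [error] 1234#0: *1 open() '
    '"/usr/share/nginx/html/favicon.ico" failed (2: No such file or directory)'
)


def matches_format(line, fmt, prefix_len=19):
    """Return True if the first prefix_len characters parse with fmt."""
    try:
        datetime.strptime(line[:prefix_len], fmt)
        return True
    except ValueError:
        return False


print(matches_format(SAMPLE, "%Y/%m/%d %H:%M:%S"))  # slash-separated date
print(matches_format(SAMPLE, "%Y-%m-%d %H:%M:%S"))  # dash-separated date
```

If the format does not match the lines, the events may end up with the wrong timestamp in CloudWatch Logs, so it is worth checking against a real line from your own error.log.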

Edit awscli.conf

/etc/awslogs/awscli.conf

[plugins]
cwlogs = cwlogs
[default]
region = ap-northeast-1

Start awslogs

$ sudo service awslogs start

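I checked CloudWatch and the logs were there. This part was easy.

Finally, create an alarm on request time and use it to trigger Auto Scaling

For this, following the page below from the "アラームの作成" (Creating an alarm) section onward did the job.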

"レスポンスタイムでAuto Scaling" (Auto Scaling on response time) | Skyarch Broadcasting
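I created the alarm by following the linked page, but for reference the same kind of alarm could be defined in code. The Python sketch below only builds the parameters for boto3's `put_metric_alarm`. The namespace, metric name and dimension are guesses at how the collectd plugin publishes the metric, and the threshold, period, host name and policy ARN are placeholders, so check the actual names in the CloudWatch console before using any of it.

```python
def build_alarm_params(host, scale_out_policy_arn, threshold_seconds=1.0):
    """Build keyword arguments for CloudWatch put_metric_alarm.

    Everything marked "assumption" below is a guess and must be checked
    against what actually shows up in the CloudWatch console.
    """
    return {
        "AlarmName": "nginx-request-time-high-" + host,
        "Namespace": "collectd",  # assumption
        "MetricName": "tail.response_time.AvgRespTime",  # assumption
        "Dimensions": [{"Name": "Host", "Value": host}],  # assumption
        "Statistic": "Average",
        "Period": 60,  # seconds, placeholder
        "EvaluationPeriods": 3,  # placeholder
        "Threshold": threshold_seconds,  # $request_time is in seconds
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [scale_out_policy_arn],
    }


def create_alarm(params):
    """Send the alarm definition to CloudWatch (needs boto3 and credentials)."""
    import boto3  # third-party; imported lazily so the builder runs without it

    boto3.client("cloudwatch").put_metric_alarm(**params)


# Placeholder values; replace them with a real host name and the ARN of
# an Auto Scaling scaling policy.
params = build_alarm_params(
    "ip-10-30-X-XXX",
    "arn:aws:autoscaling:REGION:ACCOUNT:scalingPolicy:EXAMPLE",
)
print(params["AlarmName"])
```

Calling `create_alarm(params)` would register the alarm. Since the metric is published per host, an alarm on a single host only reflects that instance, which is worth keeping in mind when wiring it to a scaling policy.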

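■ Summary

The EC2 instance I was using was so old that this was quite a struggle...
I hope Auto Scaling kicks in nicely with this setup.
Now, will it actually work without problems...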
