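To broaden what we can analyze, I wanted to collect access logs from an application running in Rails API mode.
So I decided to build the analytics pipeline with the following design:
- Collect logs with fluentd
- Store them in BigQuery
Prerequisites
- Elastic Beanstalk (EB below)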
- Puma with Ruby 2.5 running on 64bit Amazon Linux/2.8.6
- td-agent-3
- fluent-plugin-bigquery 2.6.1
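Contents
- Formatting logs on the Rails side
- BigQuery credentials setup
- .ebextensions setup
- Deploy
1. Formatting logs on the Rails side
The default Rails log output is hard to work with (multi-line and unstructured), so I use lograge to tidy it up.
Using existing write-ups as a reference, I configured the app to emit its log as clean JSON.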
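For reference, here is a minimal sketch of what such a lograge setup could look like. The helper name `extra_log_fields` is made up for this post, and it assumes the controller has already copied fields such as `host` or `user_agent` into the payload (for example via `append_info_to_payload`); adjust it to whatever your app actually records.

```ruby
require "json"

# Hypothetical helper: picks the extra fields to include in every log line.
# Assumes these keys were added to the payload by the controller.
def extra_log_fields(payload)
  {
    host:       payload[:host],
    remote_ip:  payload[:remote_ip],
    user_id:    payload[:user_id],
    user_agent: payload[:user_agent],
    referer:    payload[:referer]
  }.compact # drop fields that were not set
end

# Only runs inside a Rails app that has the lograge gem loaded.
if defined?(Rails) && defined?(Lograge)
  Rails.application.configure do
    config.lograge.enabled = true
    # Emit one JSON object per request.
    config.lograge.formatter = Lograge::Formatters::Json.new
    config.lograge.custom_options = lambda do |event|
      extra_log_fields(event.payload)
    end
  end
end
```

Outside Rails the configuration block is skipped, so the helper can be tried on its own, e.g. `puts JSON.generate(extra_log_fields(host: "example.com", user_id: 1))`.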
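2. BigQuery credentials setup
To connect EB to BigQuery, you need to prepare a key (a JSON key file).
In GCP, go to "APIs & Services" → "Credentials" and click "Create credentials".
The page: https://console.cloud.google.com/apis/credentials
After creating it, download the JSON file.
This JSON file gets copied onto the EC2 instance and is used there as the credentials.
To make that possible, put the JSON file in an S3 bucket that EB can access.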
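Before uploading the key to S3, it can be worth checking that the file is what you expect. The following is a rough sketch, assuming the downloaded file is a service-account key; the function name and the list of fields checked are my own choice, not something fluentd or GCP requires you to run.

```ruby
require "json"

# Fields I expect in a service-account key file (assumption for this sketch).
REQUIRED_KEYS = %w[type project_id private_key client_email].freeze

# Returns a list of problems; an empty list means the key looks usable.
def key_file_problems(json_text)
  key = JSON.parse(json_text)
  return ["top level is not a JSON object"] unless key.is_a?(Hash)

  problems = REQUIRED_KEYS
             .reject { |k| key[k].is_a?(String) && !key[k].empty? }
             .map { |k| "missing field: #{k}" }
  if key["type"] && key["type"] != "service_account"
    problems << "type is not service_account"
  end
  problems
rescue JSON::ParserError => e
  ["not valid JSON: #{e.message}"]
end

# Usage: ruby check_key.rb path/to/key.json
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  problems = key_file_problems(File.read(ARGV[0]))
  puts(problems.empty? ? "looks OK" : problems)
end
```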
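3. .ebextensions setup
Now for the main topic.
Because this is Elastic Beanstalk, all configuration and commands have to be written in .ebextensions.
There are four things to do:
- Connect EB to BigQuery using the credentials from the previous step
- Configure the td-agent.conf file
- Install td-agent
- Change the defaults in "/etc/init.d/td-agent"
Starting with the first item.
Enter the bucket name and the path of the JSON file you placed in S3 in the corresponding places below.
Everything else should work as a straight copy-paste.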
Resources:
  AWSEBAutoScalingGroup:
    Metadata:
      AWS::CloudFormation::Authentication:
        S3Auth:
          type: "s3"
          buckets: ["bucket name"]
          roleName:
            "Fn::GetOptionSetting":
              Namespace: "aws:autoscaling:launchconfiguration"
              OptionName: "IamInstanceProfile"
              DefaultValue: "aws-elasticbeanstalk-ec2-role"
files:
  /etc/td-agent/big_query.json:
    mode: "000644"
    owner: root
    group: root
    authentication: "S3Auth"
    source: "path to the JSON file"
Next, the contents of the "/etc/td-agent/td-agent.conf" configuration file.
files:
  "/etc/td-agent/td-agent.conf":
    owner: root
    group: root
    content: |
      <source>
        @type tail
        format json
        path /var/app/current/log/"log file name"
        tag "tag name"
        pos_file /var/app/current/log/"pos_file name"
      </source>
      <filter "tag name">
        @type grep
        <exclude>
          key user_agent
          pattern ^ELB-HealthChecker/2.0$
        </exclude>
      </filter>
      <match "tag name">
        @type bigquery_insert
        <buffer>
          @type file
          path /var/log/td-agent/buffer
          timekey 3600
          chunk_limit_size 256m
          queue_limit_length 128
          total_limit_size 10g
          flush_interval 30s
          flush_thread_interval 1.0
          flush_thread_count 15
          retry_max_times 9999999999999
          retry_wait 1s
        </buffer>
        auth_method json_key
        json_key /etc/td-agent/big_query.json
        project "project name"
        dataset "dataset name"
        auto_create_table true
        <inject>
          time_key timestamp
          time_type string
          time_format %Y-%m-%d %H:%M:%S
        </inject>
        schema [
          {"name": "timestamp", "type": "TIMESTAMP"},
          {"name": "host", "type": "STRING"},
          {"name": "remote_ip", "type": "STRING"},
          {"name": "user_id", "type": "INTEGER"},
          {"name": "path", "type": "STRING"},
          {"name": "method", "type": "STRING"},
          {"name": "status", "type": "INTEGER"},
          {"name": "format", "type": "STRING"},
          {"name": "controller", "type": "STRING"},
          {"name": "action", "type": "STRING"},
          {"name": "duration", "type": "FLOAT"},
          {"name": "view", "type": "FLOAT"},
          {"name": "db", "type": "STRING"},
          {"name": "user_agent", "type": "STRING"},
          {"name": "referer", "type": "STRING"},
          {"name": "os", "type": "STRING"},
          {"name": "os_version", "type": "STRING"},
          {"name": "browser", "type": "STRING"},
          {"name": "browser_version", "type": "STRING"},
          {"name": "exception_class", "type": "STRING"},
          {"name": "exception_message", "type": "STRING"},
          {"name": "exception_backtrace", "type": "STRING"}
        ]
      </match>
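Since the schema has to line up with what the Rails log actually contains, a quick local check can save a round of failed inserts. This is only a sketch under my own assumptions: it covers a handful of fields, and the mapping from BigQuery types to Ruby classes is a simplification I picked for illustration, not the plugin's actual validation logic.

```ruby
# A few fields from the schema, as name => BigQuery type.
SCHEMA = {
  "timestamp" => "TIMESTAMP",
  "path"      => "STRING",
  "status"    => "INTEGER",
  "duration"  => "FLOAT"
}.freeze

# Ruby classes accepted for each type in this sketch.
# TIMESTAMP is a String because <inject> writes it with time_type string.
ACCEPTED = {
  "STRING"    => [String],
  "INTEGER"   => [Integer],
  "FLOAT"     => [Float, Integer],
  "TIMESTAMP" => [String]
}.freeze

# Returns a description of every field that does not fit the schema.
def schema_mismatches(record)
  record.each_with_object([]) do |(name, value), out|
    type = SCHEMA[name]
    if type.nil?
      out << "#{name}: not in schema"
    elsif !value.nil? && ACCEPTED[type].none? { |klass| value.is_a?(klass) }
      out << "#{name}: expected #{type}, got #{value.class}"
    end
  end
end
```

Feeding it one parsed line from the log file (for example via `JSON.parse`) shows at a glance which fields would not fit.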
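A brief note on what each directive does:
<source>: Specifies the log file generated with lograge. The pos_file gets the same name as the log file, but with a .pos extension.
<filter>: Health checks should be kept out of the logs, so they are specified with <exclude>.
<match>: Processes the matched logs. This part is mostly boilerplate.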
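To make the <filter> and <inject> behavior concrete, here is a small Ruby sketch of what those two steps do to a record. It is an imitation written for this post, not fluentd's own code; the function names are made up, and the time is formatted in whatever zone the `Time` object passed in carries.

```ruby
# Same pattern as in the <exclude> block.
HEALTH_CHECK = %r{^ELB-HealthChecker/2.0$}

# Mirrors <filter> with <exclude>: keep a record unless user_agent matches.
# A record without user_agent does not match, so it is kept.
def keep_record?(record)
  !HEALTH_CHECK.match?(record["user_agent"].to_s)
end

# Mirrors <inject>: add the event time as a string under "timestamp".
def inject_timestamp(record, time)
  record.merge("timestamp" => time.strftime("%Y-%m-%d %H:%M:%S"))
end

records = [
  { "path" => "/health", "user_agent" => "ELB-HealthChecker/2.0" },
  { "path" => "/users",  "user_agent" => "Mozilla/5.0" }
]
kept = records.select { |r| keep_record?(r) }
              .map { |r| inject_timestamp(r, Time.now) }
puts kept.map { |r| r["path"] }.inspect
```

Only the `/users` record survives the filter, and it gains a `timestamp` string in the format that the TIMESTAMP column is given in the config.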
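As an aside, the syntax for these directives differs between versions, and most of the information out there is for 2.x, so this took me a lot of time. The official documentation is also unreliable on syntax, so if someone wrote up the proper syntax for 3.x I would cry tears of joy...
Back to the topic. The td-agent install configuration is as follows.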
commands:
  01-command:
    command: echo 'Defaults:root !requiretty' >> /etc/sudoers
  02-command:
    command: curl -s -L https://toolbelt.treasuredata.com/sh/install-amazon1-td-agent3.sh | sh
  03-command:
    command: td-agent-gem install fluent-plugin-bigquery
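This installs td-agent 3 and fluent-plugin-bigquery.
Finally, tweak "/etc/init.d/td-agent" a little.
The file itself is extremely long, but the only thing I changed is the user and group that td-agent runs as.
By default both TD_AGENT_USER and TD_AGENT_GROUP are td-agent; I changed them to root.
[error]: #0 Permission denied @ rb_sysopen - /var/app/current/log/log file name
I added this file because, as shown above, a permissions error occurred and td-agent could not access the log file.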
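If you want to confirm the cause before changing the init script, a small check like the one below shows who owns the log file and whether the current user can read it. This is a sketch of my own; the function name is made up, and the idea is to run it as the user in question (for example as td-agent, using the Ruby bundled at /opt/td-agent/embedded/bin/ruby).

```ruby
# Reports ownership, permission bits, and readability for the current user.
def describe_access(path)
  stat = File.stat(path)
  {
    owner_uid:   stat.uid,
    current_uid: Process.euid,
    mode:        format("%o", stat.mode & 0o777), # e.g. "640"
    readable:    File.readable?(path)
  }
end

# Usage: ruby check_access.rb /var/app/current/log/<log file>
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  puts describe_access(ARGV[0]).inspect
end
```

If `readable` comes back false for the td-agent user, that matches the Permission denied error above.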
files:
  "/etc/init.d/td-agent":
    owner: root
    group: root
    content: |
      #!/bin/sh
      ### BEGIN INIT INFO
      # Provides: td-agent
      # Required-Start: $network $local_fs
      # Required-Stop: $network $local_fs
      # Default-Start: 2 3 4 5
      # Default-Stop: 0 1 6
      # Short-Description: data collector for Treasure Data
      # Description: td-agent is a data collector
      ### END INIT INFO
      # pidfile: /var/run/td-agent/td-agent.pid
      export PATH=/sbin:/usr/sbin:/bin:/usr/bin
      TD_AGENT_NAME=td-agent
      TD_AGENT_HOME=/opt/td-agent
      TD_AGENT_DEFAULT=/etc/sysconfig/td-agent
      TD_AGENT_USER=root
      TD_AGENT_GROUP=root
      TD_AGENT_RUBY=/opt/td-agent/embedded/bin/ruby
      TD_AGENT_BIN_FILE=/usr/sbin/td-agent
      TD_AGENT_LOG_FILE=/var/log/td-agent/td-agent.log
      TD_AGENT_PID_FILE=/var/run/td-agent/td-agent.pid
      TD_AGENT_LOCK_FILE=/var/lock/subsys/td-agent
      TD_AGENT_OPTIONS="--use-v1-config"
      # timeout can be overridden from /etc/sysconfig/td-agent
      STOPTIMEOUT=120
      # Read configuration variable file if it is present
      if [ -f "${TD_AGENT_DEFAULT}" ]; then
        . "${TD_AGENT_DEFAULT}"
      fi
      if [ -n "${name}" ]; then
        # backward compatibility with omnibus-td-agent <= 2.2.0. will be deleted from future release.
        echo "Warning: Declaring \$name in ${TD_AGENT_DEFAULT} has been deprecated. Use \$TD_AGENT_NAME instead." 1>&2
        TD_AGENT_NAME="${name}"
      fi
      if [ -n "${prog}" ]; then
        # backward compatibility with omnibus-td-agent <= 2.2.0. will be deleted from future release.
        echo "Warning: Declaring \$prog in ${TD_AGENT_DEFAULT} for customizing \$PIDFILE has been deprecated. Use \$TD_AGENT_PID_FILE instead." 1>&2
        if [ -z "${PIDFILE}" ]; then
          TD_AGENT_PID_FILE="//var/run/td-agent/${prog}.pid"
        fi
        TD_AGENT_LOCK_FILE="//var/lock/subsys/${prog}"
        TD_AGENT_PROG_NAME="${prog}"
      else
        unset TD_AGENT_PROG_NAME
      fi
      if [ -n "${process_bin}" ]; then
        # backward compatibility with omnibus-td-agent <= 2.2.0. will be deleted from future release.
        echo "Warning: Declaring \$process_bin in ${TD_AGENT_DEFAULT} has been deprecated. Use \$TD_AGENT_RUBY instead." 1>&2
        TD_AGENT_RUBY="${process_bin}"
      fi
      if [ -n "${PIDFILE}" ]; then
        echo "Warning: Declaring \$PIDFILE in ${TD_AGENT_DEFAULT} has been deprecated. Use \$TD_AGENT_PIDFILE instead." 1>&2
        TD_AGENT_PID_FILE="${PIDFILE}"
      fi
      if [ -n "${DAEMON_ARGS}" ]; then
        # TODO: Show warning on use of `DAEMON_ARGS`
        # echo "Warning: Declaring \$DAEMON_ARGS in ${TD_AGENT_DEFAULT} has been deprecated. Use \$TD_AGENT_OPTIONS instead." 1>&2
        START_STOP_DAEMON_ARGS=""
        parse_daemon_args() {
          while [ -n "$1" ]; do
            case "$1" in
              "--user="?* )
                echo "Warning: Declaring --user in \$DAEMON_ARGS has been deprecated. Use \$TD_AGENT_USER instead." 1>&2
                TD_AGENT_USER="${1#*=}"
                ;;
              "--user" )
                echo "Warning: Declaring --user in \$DAEMON_ARGS has been deprecated. Use \$TD_AGENT_USER instead." 1>&2
                shift 1
                TD_AGENT_USER="$1"
                ;;
              * )
                START_STOP_DAEMON_ARGS="${START_STOP_DAEMON_ARGS} $1"
                ;;
            esac
            shift 1
          done
        }
        parse_daemon_args ${DAEMON_ARGS}
      fi
      if [ -n "${TD_AGENT_ARGS}" ]; then
        ORIG_TD_AGENT_ARGS="${TD_AGENT_ARGS}"
        TD_AGENT_ARGS=""
        parse_td_agent_args() {
          while [ -n "$1" ]; do
            case "$1" in
              "--group="?* )
                echo "Warning: Declaring --group in \$TD_AGENT_ARGS has been deprecated. Use \$TD_AGENT_GROUP instead." 1>&2
                TD_AGENT_GROUP="${1#*=}"
                ;;
              "--group" )
                echo "Warning: Declaring --group in \$TD_AGENT_ARGS has been deprecated. Use \$TD_AGENT_GROUP instead." 1>&2
                shift 1
                TD_AGENT_GROUP="$1"
                ;;
              "--user="?* )
                echo "Warning: Declaring --user in \$TD_AGENT_ARGS has been deprecated. Use \$TD_AGENT_USER instead." 1>&2
                TD_AGENT_USER="${1#*=}"
                ;;
              "--user" )
                echo "Warning: Declaring --user in \$TD_AGENT_ARGS has been deprecated. Use \$TD_AGENT_USER instead." 1>&2
                shift 1
                TD_AGENT_USER="$1"
                ;;
              * )
                TD_AGENT_ARGS="${TD_AGENT_ARGS} $1"
                ;;
            esac
            shift 1
          done
        }
        parse_td_agent_args ${ORIG_TD_AGENT_ARGS}
      fi
      # Arguments to run the daemon with
      TD_AGENT_ARGS="${TD_AGENT_ARGS:-${TD_AGENT_BIN_FILE} --log ${TD_AGENT_LOG_FILE} ${TD_AGENT_OPTIONS}}"
      START_STOP_DAEMON_ARGS="${START_STOP_DAEMON_ARGS}"
      # Exit if the package is not installed
      [ -x "${TD_AGENT_RUBY}" ] || exit 0
      # Source function library.
      . /etc/init.d/functions
      # Define LSB log_* functions.
      # Depend on lsb-base (>= 3.0-6) to ensure that this file is present.
      . /lib/lsb/init-functions
      # Check the user
      if [ -n "${TD_AGENT_USER}" ]; then
        if ! getent passwd | grep -q "^${TD_AGENT_USER}:"; then
          echo "$0: user for running ${TD_AGENT_NAME} doesn't exist: ${TD_AGENT_USER}" >&2
          exit 1
        fi
        mkdir -p "$(dirname "${TD_AGENT_PID_FILE}")"
        chown -R "${TD_AGENT_USER}" "$(dirname "${TD_AGENT_PID_FILE}")"
        START_STOP_DAEMON_ARGS="${START_STOP_DAEMON_ARGS} --user ${TD_AGENT_USER}"
      fi
      if [ -n "${TD_AGENT_GROUP}" ]; then
        if ! getent group -s files | grep -q "^${TD_AGENT_GROUP}:"; then
          echo "$0: group for running ${TD_AGENT_NAME} doesn't exist: ${TD_AGENT_GROUP}" >&2
          exit 1
        fi
        TD_AGENT_ARGS="${TD_AGENT_ARGS} --group ${TD_AGENT_GROUP}"
      fi
      if [ -n "${TD_AGENT_PID_FILE}" ]; then
        mkdir -p "$(dirname "${TD_AGENT_PID_FILE}")"
        chown -R "${TD_AGENT_USER}" "$(dirname "${TD_AGENT_PID_FILE}")"
        TD_AGENT_ARGS="${TD_AGENT_ARGS} --daemon ${TD_AGENT_PID_FILE}"
      fi
      # 2012/04/17 Kazuki Ohta <k@treasure-data.com>
      # Use jemalloc to avoid memory fragmentation
      if [ -f "${TD_AGENT_HOME}/embedded/lib/libjemalloc.so" ]; then
        export LD_PRELOAD="${TD_AGENT_HOME}/embedded/lib/libjemalloc.so"
      fi
      kill_by_file() {
        local sig="$1"
        shift 1
        local pid="$(cat "$@" 2>/dev/null || true)"
        if [ -n "${pid}" ]; then
          if /bin/kill "${sig}" "${pid}" 1>/dev/null 2>&1; then
            return 0
          else
            return 2
          fi
        else
          return 1
        fi
      }
      #
      # Function that starts the daemon/service
      #
      do_start() {
        # Set Max number of file descriptors for the safety sake
        # see http://docs.fluentd.org/en/articles/before-install
        ulimit -n 65536 1>/dev/null 2>&1 || true
        local RETVAL=0
        daemon --pidfile="${TD_AGENT_PID_FILE}" ${START_STOP_DAEMON_ARGS} "${TD_AGENT_RUBY}" ${TD_AGENT_ARGS} || RETVAL="$?"
        [ $RETVAL -eq 0 ] && touch "${TD_AGENT_LOCK_FILE}"
        return $RETVAL
      }
      #
      # Function that stops the daemon/service
      #
      do_stop() {
        # Return
        # 0 if daemon has been stopped
        # 1 if daemon was already stopped
        # 2 if daemon could not be stopped
        # other if a failure occurred
        if [ -e "${TD_AGENT_PID_FILE}" ]; then
          # Use own process termination instead of killproc because killproc can't wait SIGTERM
          if kill_by_file -TERM "${TD_AGENT_PID_FILE}"; then
            local i
            for i in $(seq "${STOPTIMEOUT}"); do
              if kill_by_file -0 "${TD_AGENT_PID_FILE}"; then
                sleep 1
              else
                break
              fi
            done
            if kill_by_file -0 "${TD_AGENT_PID_FILE}"; then
              echo -n "Timeout error occurred trying to stop ${TD_AGENT_NAME}..."
              return 2
            else
              rm -f "${TD_AGENT_PID_FILE}"
              rm -f "${TD_AGENT_LOCK_FILE}"
            fi
          else
            return 1
          fi
        else
          if killproc "${TD_AGENT_PROG_NAME:-${TD_AGENT_NAME}}"; then
            rm -f "${TD_AGENT_PID_FILE}"
            rm -f "${TD_AGENT_LOCK_FILE}"
          else
            return 2
          fi
        fi
      }
      #
      # Function that sends a SIGHUP to the daemon/service
      #
      do_reload() {
        kill_by_file -HUP "${TD_AGENT_PID_FILE}"
      }
      do_restart() {
        if ! do_configtest; then
          return 1
        fi
        local val=0
        do_stop || val="$?"
        case "${val}" in
          0 | 1 )
            if ! do_start; then
              return 1
            fi
            ;;
          * ) # Failed to stop
            return 1
            ;;
        esac
      }
      do_configtest() {
        eval "${TD_AGENT_ARGS} ${START_STOP_DAEMON_ARGS} --dry-run -q"
      }
      RETVAL=0
      case "$1" in
        "start" )
          echo -n "Starting ${TD_AGENT_NAME}: "
          do_start || RETVAL="$?"
          case "$RETVAL" in
            0 )
              log_success_msg "${TD_AGENT_NAME}"
              ;;
            * )
              log_failure_msg "${TD_AGENT_NAME}"
              exit 1
              ;;
          esac
          ;;
        "stop" )
          echo -n "Stopping ${TD_AGENT_NAME}: "
          do_stop || RETVAL="$?"
          case "$RETVAL" in
            0 )
              log_success_msg "${TD_AGENT_NAME}"
              ;;
            * )
              log_failure_msg "${TD_AGENT_NAME}"
              exit 1
              ;;
          esac
          ;;
        "reload" )
          echo -n "Reloading ${TD_AGENT_NAME}: "
          if ! do_configtest; then
            log_failure_msg "${TD_AGENT_NAME}"
            exit 1
          fi
          if do_reload; then
            log_success_msg "${TD_AGENT_NAME}"
          else
            log_failure_msg "${TD_AGENT_NAME}"
            exit 1
          fi
          ;;
        "restart" )
          echo -n "Restarting ${TD_AGENT_NAME}: "
          if do_restart; then
            log_success_msg "${TD_AGENT_NAME}"
          else
            log_failure_msg "${TD_AGENT_NAME}"
            exit 1
          fi
          ;;
        "status" )
          if kill_by_file -0 "${TD_AGENT_PID_FILE}"; then
            log_success_msg "${TD_AGENT_NAME} is running"
          else
            log_failure_msg "${TD_AGENT_NAME} is not running"
            exit 1
          fi
          ;;
        "condrestart" )
          if [ -f "${TD_AGENT_LOCK_FILE}" ]; then
            echo -n "Restarting ${TD_AGENT_NAME}: "
            if do_restart; then
              log_success_msg "${TD_AGENT_NAME}"
            else
              log_failure_msg "${TD_AGENT_NAME}"
              exit 1
            fi
          fi
          ;;
        "configtest" )
          if do_configtest; then
            log_success_msg "${TD_AGENT_NAME}"
          else
            log_failure_msg "${TD_AGENT_NAME}"
            exit 1
          fi
          ;;
        * )
          echo "Usage: $0 {start|stop|reload|restart|condrestart|status|configtest}" >&2
          exit 1
          ;;
      esac
commands:
  01-restart-td-agent:
    command: sudo /etc/init.d/td-agent restart
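Once all of this is in place, run eb deploy and you're done.
The initial td-agent install is heavy, and in some cases it triggers Auto Scaling.
There's not much to be done about that, so it may be worth increasing the instance size just for the first deploy.
I also hit some errors during setup, which I plan to add here when I get around to it.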