
Building Logs That Are Easy to Investigate in a Microservice

Last updated at 2019-02-27, posted at 2016-06-29

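This is an account of building a microservice from scratch, single-handedly.
The service requirements, the overall system configuration, and a rough outline of the development flow are summarized in the following article:
"Building a microservice development flow from scratch" (一からマイクロサービスの開発フローを作った話)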

This article covers the specifics.
First up: logging in production.

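Log importance levels and where each is stored

First, decide where to store each log based on its importance and how it is used.
This time I settled on the following.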

Logs for status monitoring

Logs from every related server should be
viewable on the log aggregation server.
These logs only need to be viewable in the moment.

Logs that may need to be checked later

Logs that may be consulted in an emergency.
The retention period is decided according to importance.
This time they are gzip-compressed and uploaded to S3.

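Logs used by people other than the service administrators

For example, logs used for data aggregation.
Also, logs needed when a service that uses this service investigates bugs and similar issues.
This time they are stored in Google BigQuery.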

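Types and roles of the logs viewed on production servers

The retention period and storage location vary by role,
but every log can be checked on the log aggregation server.
A new log file is created every hour, and the old one is saved in the form xxx.log-ymdh (for example, api_access.log-2016062917).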

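Access log
- Basically, it is enough to see which servers are receiving requests
- Monitoring + S3 backup (3 months)
Error log
- To know which errors are occurring on which server
- Monitoring + S3 backup (3 months)
Request/response log
- To let service users (the services calling the API) investigate on their own
- To make data restoration possible
- Monitoring + S3 backup + BigQuery
User action log
- To analyze user behavior
- To make data restoration possible
- Monitoring + S3 backup + BigQuery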

Folder structure

var/log/fluentd
|--accounts (microservice project name)
|  |--nginx (access log / error log)
|  |  |--api_access.log (the current file is written without a date suffix)
|  |  |--api_access.log-2016062917 (the file from one hour ago is kept as plain text)
|  |  |--api_access.log-XXXXXXXXXX.gz (files two or more hours old are gzip-compressed; the server keeps up to the past 3 days)
|  |  |--api_error.log
|  |--response (response log)
|  |  |--v2_accounts.log
|  |  |--v2_accounts_param.log
|  |  |--v2_accounts_param_items.log
|  |  |--v2_accounts_param_items_param.log
|  |--act (user action log)
|  |  |--act.log
|--items (microservice project name)
...the same structure repeats for each microservice project

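Concrete log formats

Request/response log

Overview

The request data received by the service is saved together with the response data.

Purpose

Because this service is a platform, when an error occurs on the consuming side, its developers can investigate on their own once they are told how.
Also, in case the early action logs turn out to be insufficient, these logs reveal at minimum who did what, and when.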

File naming rules

Replace each [/] in the endpoint with [_]
Replace each segment that receives a parameter with [param]

Example) For an API built RESTfully
Access to /v1/accounts/1/items => v1_accounts_param_items.log
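
The naming rules above can be sketched in a few lines of Python. This is only an illustration: the function name `endpoint_to_log_name` is hypothetical, and treating every all-digit segment as a parameter is an assumption. A real application would take the parameter positions from its router instead.

```python
import re

# Assumption: a path segment made only of digits is a parameter (e.g. an ID).
# A real router knows its parameter positions; this heuristic is illustrative.
PARAM_SEGMENT = re.compile(r"^\d+$")


def endpoint_to_log_name(endpoint: str) -> str:
    """Convert an endpoint path into a log file name using the two rules above."""
    # Drop empty segments so the leading "/" does not become a leading "_".
    segments = [s for s in endpoint.split("/") if s]
    normalized = ["param" if PARAM_SEGMENT.match(s) else s for s in segments]
    return "_".join(normalized) + ".log"


print(endpoint_to_log_name("/v1/accounts/1/items"))  # v1_accounts_param_items.log
```

Because every ID collapses to `param`, all requests to the same route land in the same file, which is what makes searching by endpoint practical.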

Sample log

Because searches are basically done by file name (endpoint) and time, the log is saved without splitting it into columns.

20160629 18:19:20 {"request_uri":"/v2/accounts",
 "request_method":"POST",
 "request_body":"{\"nickname\":\"samplenickname\",\"app_code\":\"test\"}",
 "http_status_code":"200 OK",
 "response_body":"{\"id\":1881038,\"nickname\":\"samplenickname\",\"status\":\"valid\",\"session\":\"sample\"}"
}
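
When investigating, a line in this format can be read back with a small script. The sketch below assumes each record sits on a single line (the sample above is wrapped for readability) and that the request and response bodies are JSON documents stored as strings. The function name `parse_response_log_line` and the `logged_at` key are hypothetical.

```python
import json
from datetime import datetime


def parse_response_log_line(line: str) -> dict:
    """Split a 'YYYYMMDD HH:MM:SS {json}' line into a timestamp and decoded fields."""
    date_part, time_part, payload = line.strip().split(" ", 2)
    record = json.loads(payload)
    # request_body and response_body are JSON stored as strings,
    # so they need a second decoding step. Empty bodies are left as-is.
    for key in ("request_body", "response_body"):
        if record.get(key):
            record[key] = json.loads(record[key])
    record["logged_at"] = datetime.strptime(
        f"{date_part} {time_part}", "%Y%m%d %H:%M:%S"
    )
    return record


# Single-line version of the sample, with every inner quote escaped.
sample = (
    '20160629 18:19:20 {"request_uri":"/v2/accounts",'
    '"request_method":"POST",'
    '"request_body":"{\\"nickname\\":\\"samplenickname\\",\\"app_code\\":\\"test\\"}",'
    '"http_status_code":"200 OK",'
    '"response_body":"{\\"id\\":1881038,\\"nickname\\":\\"samplenickname\\",'
    '\\"status\\":\\"valid\\",\\"session\\":\\"sample\\"}"}'
)
entry = parse_response_log_line(sample)
print(entry["logged_at"], entry["request_method"], entry["response_body"]["id"])
```

A body that is not valid JSON would raise `json.JSONDecodeError` here, so a production version would need to handle that case.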

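User action log

Overview

When a user performs a specific action, the accompanying data is saved with it.

Purpose

Saving the data as of the moment of the action makes it possible to relate which actions users take to the state they were in at the time.
The logs can also be used to restore data to its earlier state.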
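
As a rough illustration of what such a record could look like, the sketch below builds a tag and a record for one action. The tag follows the `<project>.actlog.<table>` convention and the epoch-seconds `timestamp` field matches the `time_field timestamp` / `time_format %s` settings in the fluentd configuration of this article. Everything else is an assumption: the function name `build_action_log`, the `user_id` field, and the snapshot contents are hypothetical, and actually sending the record through a fluentd client library is left out.

```python
import time


def build_action_log(project, table, user_id, snapshot, now=None):
    """Return the fluentd tag and record for one user action.

    snapshot holds the state to keep alongside the action (hypothetical fields).
    """
    tag = f"{project}.actlog.{table}"
    record = dict(snapshot)  # copy so the caller's dict is not modified
    record["user_id"] = user_id
    # Epoch seconds, so the aggregation server can read it with time_format %s.
    record["timestamp"] = int(now if now is not None else time.time())
    return tag, record


tag, record = build_action_log(
    "accounts", "purchase", user_id=1, snapshot={"status": "valid"}, now=1467191960
)
print(tag, record)
```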

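fluentd configuration

The log aggregation server and
the servers that do the actual processing each use a separate configuration file.

Install the fluentd plugins used in the configuration (tail_ex, forest, file_with_fix_path, and bigquery) beforehand.

Processing servers

As a rule, the processing servers do not write out log files themselves; they send the logs to the aggregation server.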

# Consolidate the various nginx logs
<source>
  type tail_ex
  path /home/ec2-user/var/log/nginx/**.log
  pos_file /var/tmp/fluentd.pos
  format /^(?<message>.*)$/
  # The emitted tag is nginx.(path to the log file)
  # e.g. nginx.home.ec2-user.var.log.nginx.api.accounts.production.access.log
  tag nginx.*
  refresh_interval 5
</source>

## Send the nginx logs configured above to the aggregate server
<match nginx.**>
  @type forward
  buffer_type memory
  buffer_chunk_limit 16m
  buffer_queue_limit 128
  flush_interval 1s
  <server>
    host xxx.xxx.xxx # specify the host of the log aggregation server
    port xxxxxx
  </server>
  <secondary>
    @type file
    path /var/log/td-agent/nginx_forward-failed
  </secondary>
</match>

### For user action logs
# Tag format: <project name>.actlog.<table name (logtype)>
<match **.actlog.**>
  @type forward
  buffer_type memory
  buffer_chunk_limit 16m
  buffer_queue_limit 128
  flush_interval 1s
  <server>
    host xxxxxxxxx
    port xxxxx
  </server>
  <secondary>
    @type file
    path /var/log/td-agent/actlog-forward-failed
  </secondary>
</match>

# request/response logs
<match td.response.**>
  @type forward
  buffer_type memory
  buffer_chunk_limit 16m
  buffer_queue_limit 128
  flush_interval 1s
  <server>
    host xxxxxxxxx
    port xxxxx
  </server>
  <secondary>
    @type file
    path /var/log/td-agent/response-forward-failed
  </secondary>
</match>

Log aggregation server

# Receive logs from the clone servers
<source>
  @type forward
  port 24224
</source>

# Save nginx-related logs under the fluentd folder
<match nginx.**>
  type forest
  subtype file_with_fix_path
  <template>
    path /home/ec2-user/var/log/fluentd/${tag_parts[7]}/${tag_parts[8]}/${tag_parts[6]}_${tag_parts[9]}.log
    time_slice_format %Y%m%d
    time_format %Y%m%dT%H%M%S%z
    flush_interval 1s
  </template>
</match>
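
To make the `${tag_parts[N]}` placeholders in the `path` above easier to follow, here is a small Python sketch of the same substitution. It only mimics the string handling; the function name `nginx_log_path` is hypothetical and this is not how the forest plugin is implemented.

```python
def nginx_log_path(tag: str) -> str:
    """Rebuild the output path from a tag, mirroring the tag_parts indices."""
    parts = tag.split(".")  # tag_parts[0] is "nginx", tag_parts[1] is "home", ...
    return (
        "/home/ec2-user/var/log/fluentd/"
        f"{parts[7]}/{parts[8]}/{parts[6]}_{parts[9]}.log"
    )


# The example tag from the processing-server configuration.
tag = "nginx.home.ec2-user.var.log.nginx.api.accounts.production.access.log"
print(nginx_log_path(tag))
# /home/ec2-user/var/log/fluentd/accounts/production/api_access.log
```

Because the indices count from the start of the tag, moving the nginx log directory on the processing servers changes which part lands where, so the indices need to be kept in step with the `path` of the tail source.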

# Tag format: <dataset name>.actlog.<table name>
# * Required preparation
# 1. Create a dataset in BigQuery named after the project
<match *.actlog.**>
  type forest
  subtype copy
  <template>
    # Write to a log file
    <store>
      type file_with_fix_path
      path /home/ec2-user/var/log/fluentd/${tag_parts[0]}/act/act.log
      time_slice_format %Y%m%d
      time_format %Y%m%dT%H%M%S%z
      flush_interval 1s
    </store>
    # Also store in BigQuery
    <store>
      type bigquery
      method insert
      # Key file obtained from Google Developers
      auth_method json_key
      json_key /home/ec2-user/conf/td-agent/key/bigquery.json

      # Specify the project
      project bigqueryproject
      # Set the dataset name
      dataset ${tag_parts[0]}_action_log
      # Create the table automatically from the specified file
      auto_create_table true
      table ${tag_parts[2]}_%Y%m%d

      time_format %s
      time_field timestamp
      
      # Specify the schema of the automatically created table
      schema_path /home/ec2-user/conf/td-agent/json/${tag_parts[0]}_actlog_format.json
    </store>
  </template>
</match>

# request/response logs
# * Required preparation
# 1. Create the response table in BigQuery
# td.response.(accounts,items,*****).
<match td.response.**>
  type forest
  subtype copy
  <template>
    <store>
      type file_with_fix_path
      path /home/ec2-user/var/log/fluentd/${tag_parts[2]}/response/${tag_parts[3]}.log
      time_slice_format %Y%m%d
      time_format %Y%m%dT%H%M%S%z
      flush_interval 1s
    </store>
    <store>
      type bigquery
      method insert
      auth_method json_key
      json_key /home/ec2-user/conf/td-agent/key/bigquery.json
      project bigquery
      dataset ${tag_parts[2]}_response_log
      auto_create_table true
      table ${tag_parts[3]}_%Y%m%d

      time_format %s
      time_field timestamp
      schema_path /home/ec2-user/conf/td-agent/json/response_log_format.json
    </store>
  </template>
</match>
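
The sketch below shows how one request/response tag fans out into a file path, a BigQuery dataset, and a daily table under the settings above. The tag `td.response.accounts.v2_accounts` is a hypothetical example that follows the `td.response.(accounts,items,...)` pattern, and the function name is made up for illustration. Which clock and time zone the plugin uses to expand `%Y%m%d` is not stated here, so the timestamp is passed in explicitly.

```python
from datetime import datetime


def response_log_destinations(tag: str, event_time: datetime) -> dict:
    """Mirror the tag_parts substitutions of the td.response match block."""
    parts = tag.split(".")
    project, endpoint = parts[2], parts[3]  # tag_parts[2], tag_parts[3]
    return {
        "file": f"/home/ec2-user/var/log/fluentd/{project}/response/{endpoint}.log",
        "dataset": f"{project}_response_log",
        # One table per day, e.g. v2_accounts_20160629
        "table": f"{endpoint}_{event_time.strftime('%Y%m%d')}",
    }


dest = response_log_destinations(
    "td.response.accounts.v2_accounts", datetime(2016, 6, 29, 18, 19, 20)
)
print(dest)
```

Daily tables keep each table small and let a query target only the dates it needs.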

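File for automatic BigQuery table creation

fluent-plugin-bigquery creates the table automatically when schema_path points to a file that describes the structure of the table to create.
For the request/response log described earlier, for example, the file looks like this.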

[
  {
    "name": "request_uri",
    "type": "STRING"
  },
  {
    "name": "request_method",
    "type": "STRING"
  },
  {
    "name": "request_body",
    "type": "STRING"
  },
  {
    "name": "http_status_code",
    "type": "INTEGER"
  },
  {
    "name": "response_body",
    "type": "STRING"
  },
  {
    "name": "timestamp",
    "type": "TIMESTAMP"
  }
]
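
Before sending records, it can help to check them against this schema locally. The following is a minimal sketch of such a check, not BigQuery's actual validation: the function name `check_record` is hypothetical, missing fields are assumed to be allowed, and TIMESTAMP is treated as epoch seconds because the configuration uses `time_format %s`.

```python
SCHEMA = [
    {"name": "request_uri", "type": "STRING"},
    {"name": "request_method", "type": "STRING"},
    {"name": "request_body", "type": "STRING"},
    {"name": "http_status_code", "type": "INTEGER"},
    {"name": "response_body", "type": "STRING"},
    {"name": "timestamp", "type": "TIMESTAMP"},
]


def check_record(record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for field in schema:
        name, kind = field["name"], field["type"]
        if name not in record:
            continue  # assumption: missing fields are allowed
        value = record[name]
        if kind in ("INTEGER", "TIMESTAMP"):
            try:
                int(value)
            except (TypeError, ValueError):
                problems.append(f"{name}: {value!r} is not a valid {kind}")
        elif kind == "STRING" and not isinstance(value, str):
            problems.append(f"{name}: {value!r} is not a string")
    known = {field["name"] for field in schema}
    for name in sorted(set(record) - known):
        problems.append(f"{name}: not defined in the schema")
    return problems


print(check_record({"request_uri": "/v2/accounts", "http_status_code": 200}))
print(check_record({"http_status_code": "200 OK"}))
```

The second call reports a problem: a status written as "200 OK", as in the sample log line, does not parse as an INTEGER, so the value sent to BigQuery presumably has to be the numeric code alone.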

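logrotate configuration

# Rotate the access/error logs, response_log, and action_log received from each server via fluentd
/home/ec2-user/var/log/fluentd/*/*/*.log {
  hourly
  # Delete after 5 days: 24 * 5
  rotate 120
  # Enable gzip compression
  compress
  # Leave the most recent rotated file uncompressed; gzip files two or more generations old
  delaycompress
  # It is fine if the file does not exist
  missingok
  # notifempty # do not rotate if empty
  # Append the date to the file name
  dateext
  create 644 td-agent td-agent
  # Run postrotate only once for all files
  sharedscripts
  postrotate
    /home/ec2-user/bin/upload_s3.sh $@ # script that uploads the log files to S3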
    pid=/var/run/td-agent/td-agent.pid
    test -s $pid && kill -USR1 "$(cat $pid)"
  endscript
}

upload_s3.sh, used above

#!/bin/sh
BUCKET=<BUCKET_NAME>
for LOGFILE in "$@"; do
    # Upload the log that is three generations old.
    # With the logrotate settings, files two or more generations old are gzip-compressed,
    # and this script is invoked from postrotate, so the file from 3 hours ago is already gzipped.
    LOGFILE="$LOGFILE-$(date '+%Y%m%d%H' -d '3hours ago').gz"
    BUCKETPATH=$(echo "$LOGFILE" | sed 's:/home/ec2-user/var/log/fluentd/::g')

    # Leaving the mime-type unset lets the file be downloaded as gzip
    /usr/bin/s3cmd -M put "$LOGFILE" "s3://${BUCKET}/${BUCKETPATH}"
done
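
The file-name arithmetic in the script is easy to get wrong by an hour, so here is the same calculation as a Python sketch that can be checked in isolation. It only computes names and does not upload anything; the function name `s3_upload_target` is hypothetical.

```python
from datetime import datetime, timedelta

LOG_ROOT = "/home/ec2-user/var/log/fluentd/"


def s3_upload_target(logfile: str, now: datetime) -> tuple:
    """Return (local gzip path, S3 key) for the file rotated three hours before now."""
    stamp = (now - timedelta(hours=3)).strftime("%Y%m%d%H")
    rotated = f"{logfile}-{stamp}.gz"
    # Same effect as the sed call: strip the local root so the key starts at the project name.
    key = rotated.replace(LOG_ROOT, "")
    return rotated, key


rotated, key = s3_upload_target(
    LOG_ROOT + "accounts/nginx/api_access.log", datetime(2016, 6, 29, 18, 0)
)
print(rotated)
print(key)
```

The resulting key keeps the project/type/file layout of the aggregation server, so the S3 bucket mirrors the folder structure shown earlier.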
