
Reading Apache logs with Logstash (2)

Last updated at 2018-08-23 / Posted at 2018-08-23

Introduction

A continuation of "Reading Apache logs with Logstash (1)".
This time I load my own access logs.
I first test by feeding the logs directly to Logstash, then switch to watching them with Filebeat.

Summary

Fed the logs directly to Logstash and tuned the filters.
Set up Filebeat so that whenever the logs are synced, they get ingested into Elasticsearch.

Starting services in the background

Elasticsearch and Kibana are kept running in the background.

$ brew services start elasticsearch
==> Successfully started `elasticsearch` (label: homebrew.mxcl.elasticsearch)

$ brew services start kibana
==> Successfully started `kibana` (label: homebrew.mxcl.kibana)

Testing by feeding logs directly to `Logstash`

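I created import-apache.conf with the following contents (copied first-pipeline.conf and edited it):
- input : standard input
- filter : Apache combined log format, GeoIP, plus a few extras (date, useragent)
- output : standard output + Elasticsearch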

$ cp first-pipeline.conf import-apache.conf
$ emacs import-apache.conf
## (see below for the edits)
$ logstash -f import-apache.conf < logstash-tutorial.log

Contents of import-apache.conf

# The # character at the beginning of a line indicates a comment.
# Use comments to describe your configuration.
input {
#    beats {
#        port => "5044"
#    }
    stdin { }
}
# The filter part of this file is commented out to indicate that
# it is optional.
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
        break_on_match => false
        tag_on_failure => ["_message_parse_failure"]
    }
    geoip {
        source => "clientip"
    }
    date {
        match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
        locale => "en"
        target => "@timestamp"
    }
    useragent {
        source => "agent"
        target => "useragent"
    }
}
output {
    stdout { codec => rubydebug }
    elasticsearch {
        hosts => [ "localhost:9200" ]
    }
}

About the filters

Logstash - Filter plugins

`grok` filter

Grok filter plugin
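`break_on_match => false` : by default (`true`) grok stops at the first pattern in `match` that succeeds; `false` makes it try all of them. With a single pattern, as here, it makes no practical difference.
tag_on_failure : the tag added when the match fails. The default is _grokparsefailure.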
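To make the grok step concrete, here is a minimal Python sketch of what `%{COMBINEDAPACHELOG}` does: split one combined-format line into named fields, or tag the event when it does not match. The regex is a simplified stand-in written for illustration, not the real grok pattern, and the sample line is hypothetical (only the client IP, timestamp, and user agent are taken from the output shown later). Note that `referrer` and `agent` keep their surrounding double quotes, which is why the `agent` value later appears with escaped quotes.

```python
import re

# Simplified, hypothetical approximation of grok's %{COMBINEDAPACHELOG};
# field names follow the (pre-ECS) names seen in this article's output.
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>[A-Z]+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'(?P<referrer>"[^"]*") (?P<agent>"[^"]*")$'
)


def grok_combined(message, tag_on_failure="_message_parse_failure"):
    """Return an event dict; on failure keep the message and add a tag."""
    event = {"message": message}
    match = COMBINED.match(message)
    if match is None:
        event.setdefault("tags", []).append(tag_on_failure)
        return event
    event.update(match.groupdict())
    return event


# Hypothetical log line assembled for illustration.
line = ('86.1.76.62 - - [04/Jan/2015:05:30:37 +0000] "GET /index.html HTTP/1.1" '
        '200 1234 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20140205 '
        'Firefox/24.0 Iceweasel/24.3.0"')
event = grok_combined(line)
print(event["clientip"], event["timestamp"], event["response"])
```

A line that does not fit the format still produces an event, just with `message` only and the `_message_parse_failure` tag, mirroring how the `tag_on_failure` setting in import-apache.conf behaves.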

`geoip` filter

GeoIP filter plugin
tag_on_failure : default value _geoip_lookup_failure

GeoIP filter

    geoip {
        source => "clientip"
    }

Output of the GeoIP filter

...(omitted)...
        "clientip" => "86.1.76.62",
        "geoip" => {
              "latitude" => 51.4434,
           "region_name" => "Lambeth",
              "timezone" => "Europe/London",
          "country_name" => "United Kingdom",
                    "ip" => "86.1.76.62",
        "continent_code" => "EU",
         "country_code3" => "GB",
             "city_name" => "Balham",
         "country_code2" => "GB",
              "location" => {
            "lon" => -0.1468,
            "lat" => 51.4434
        },
...(omitted)...

`date` filter

Date filter plugin
tag_on_failure : default value _dateparsefailure

date filter

    date {
        match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
        locale => "en"
        target => "@timestamp"
    }

Before applying the date filter


"timestamp" => "04/Jan/2015:05:30:37 +0000",  ## format dd/MMM/yyyy:HH:mm:ss Z (three Ms for the abbreviated month name)
"@timestamp" => 2018-08-23T02:15:58.786Z,     ## this is the time the data was read in

After applying the date filter

"timestamp" => "04/Jan/2015:05:30:37 +0000",
"@timestamp" => 2015-01-04T05:30:37.000Z,    ## replaced with the time the log entry was written

Example of a failed date filter

...(omitted)...
    "tags" => [
        [0] "_dateparsefailure"
    ],
...(omitted)...
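The date filter's job can be sketched in Python as well: parse the `timestamp` string with a fixed format and overwrite `@timestamp`, or add `_dateparsefailure` when the format does not match. This is an illustrative analogue built on `datetime.strptime` (its `%d/%b/%Y:%H:%M:%S %z` corresponds to the Joda-style `dd/MMM/yyyy:HH:mm:ss Z`), not Logstash's implementation. The second call shows that a single stray colon in the pattern is enough to make every event fail.

```python
from datetime import datetime, timezone

# strptime counterpart of the Joda-style pattern "dd/MMM/yyyy:HH:mm:ss Z".
# %b matches English month abbreviations under the default C locale,
# which plays the role of locale => "en".
APACHE_TS = "%d/%b/%Y:%H:%M:%S %z"


def date_filter(event, fmt=APACHE_TS):
    """Copy the parsed 'timestamp' into '@timestamp' (UTC), or tag the event."""
    try:
        parsed = datetime.strptime(event["timestamp"], fmt)
    except ValueError:
        event.setdefault("tags", []).append("_dateparsefailure")
        return event
    utc = parsed.astimezone(timezone.utc)
    # Render like Logstash's output: millisecond precision with a trailing Z.
    event["@timestamp"] = (utc.strftime("%Y-%m-%dT%H:%M:%S.")
                           + f"{utc.microsecond // 1000:03d}Z")
    return event


ok = date_filter({"timestamp": "04/Jan/2015:05:30:37 +0000",
                  "@timestamp": "2018-08-23T02:15:58.786Z"})
bad = date_filter({"timestamp": "04/Jan/2015:05:30:37 +0000",
                   "@timestamp": "2018-08-23T02:15:58.786Z"},
                  fmt="%d/%b/%Y:%H:%M::%S %z")  # extra colon, like "HH:mm::ss"
print(ok["@timestamp"])   # 2015-01-04T05:30:37.000Z
print(bad["tags"])        # ['_dateparsefailure']
```

As in Logstash, the failed event keeps its original `@timestamp` (the read-in time) and only gains the tag, so a format typo shows up as a burst of `_dateparsefailure` tags rather than an error.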

`useragent` filter

Useragent filter plugin
source : the field containing the UA string. Required.
target : the field name under which the parsed UA information is stored

Useragent filter

    useragent {
        source => "agent"
        target => "useragent"
    }

Output of the useragent filter

## source (this is read in)
    "agent" => "\"Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20140205 Firefox/24.0 Iceweasel/24.3.0\"",
## target (and this information comes out)
    "useragent" => {
          "major" => "24",
           "name" => "Iceweasel",
          "minor" => "3",
             "os" => "Linux",
          "patch" => "0",
         "device" => "Other",
          "build" => "",
        "os_name" => "Linux"
    },
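For intuition about what `source` and `target` mean, here is a toy Python sketch that reads the quoted `agent` field and writes a small dictionary under `useragent`. It is deliberately naive (it takes the last `Name/x.y[.z]` token as the browser and does a simple OS keyword check, both hypothetical heuristics); the real plugin is based on the ua-parser regex database and produces more fields, such as `device` and `build`.

```python
import re

# Product tokens such as "Firefox/24.0" or "Iceweasel/24.3.0".
PRODUCT = re.compile(r"([A-Za-z]+)/(\d+)\.(\d+)(?:\.(\d+))?")
# Naive OS keywords, checked in order (hypothetical, far from complete).
OS_KEYWORDS = ["Windows", "Android", "Mac OS X", "Linux"]


def useragent_filter(event, source="agent", target="useragent"):
    """Toy stand-in: parse event[source] and store the result in event[target]."""
    raw = event[source].strip('"')  # grok's quoted-string capture keeps the quotes
    products = PRODUCT.findall(raw)
    info = {"name": "Other", "os": "Other"}
    if products:
        # Take the last versioned product token as the browser.
        name, major, minor, patch = products[-1]
        info.update(name=name, major=major, minor=minor, patch=patch)
    for keyword in OS_KEYWORDS:
        if keyword in raw:
            info["os"] = keyword
            break
    event[target] = info
    return event


event = {"agent": '"Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20140205 '
                  'Firefox/24.0 Iceweasel/24.3.0"'}
print(useragent_filter(event)["useragent"])
```

For the sample agent this yields name Iceweasel, version 24.3.0 and OS Linux, the same core values as the plugin output above; anything beyond that (device detection, OS versions, odd UA strings) is where the real regex database earns its keep.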

Comparison with the production files

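I timed loading the sample log (100 lines) and the log I actually want to load (about 410,000 lines).
Because there are so many lines, I skipped standard output and sent the events only to Elasticsearch.
It loaded in under 4 minutes (133 s of wall-clock time in the run below). Fast! Impressive!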

$ wc -l logstash-tutorial.log
100 logstash-tutorial.log
$ time logstash -f import-apache.conf < logstash-tutorial.log
38.46 real       102.74 user         3.12 sys

$ wc -l weblogs/ccprod01/access_log
408488 weblogs/ccprod01/access_log
## Many lines, so run without standard output
$ time logstash -f import-apache.conf < $MY_APACHE_LOG
133.23 real       216.88 user         4.73 sys

Deleting the loaded data

I had loaded various samples and the Elasticsearch indices had become cluttered, so I wanted to clear them out once.
Delete index
Run the command from that page in the Console under Kibana → Dev Tools.
I changed the index name to logstash-*.

DELETE logstash-*
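The same deletion can be issued without Kibana by sending an HTTP `DELETE` to Elasticsearch directly. Below is a minimal Python sketch using only the standard library, assuming Elasticsearch listens on `http://localhost:9200` without authentication as in this setup; the helper names are hypothetical. Depending on the Elasticsearch version and the `action.destructive_requires_name` setting, a wildcard delete such as `logstash-*` may be rejected.

```python
import urllib.request

ES_HOST = "http://localhost:9200"  # assumption: local, unauthenticated cluster


def build_delete_request(index_pattern="logstash-*", host=ES_HOST):
    """Build (but do not send) the equivalent of the Console's DELETE logstash-*."""
    return urllib.request.Request(f"{host}/{index_pattern}", method="DELETE")


def delete_indices(index_pattern="logstash-*", host=ES_HOST):
    """Send the request; urlopen raises urllib.error.HTTPError on a non-2xx reply."""
    with urllib.request.urlopen(build_delete_request(index_pattern, host)) as resp:
        return resp.status, resp.read().decode("utf-8")


req = build_delete_request()
print(req.get_method(), req.full_url)
# delete_indices()  # uncomment to actually delete the indices
```

Keeping the request construction separate from sending it makes it easy to print and double-check the target before running something destructive.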

Reloading the logs

Checking the line counts

$ wc -l weblogs/prod0*/*access_log

  408488 weblogs/prod01/access_log
  414190 weblogs/prod02/access_log
  409775 weblogs/prod03/access_log
  436533 weblogs/prod04/access_log

 1062117 weblogs/prod01/ssl_access_log
 1011087 weblogs/prod02/ssl_access_log
  856647 weblogs/prod03/ssl_access_log
  884293 weblogs/prod04/ssl_access_log

Load times

$ time logstash -f import-apache.conf < weblogs/prod01/access_log
113.15 real       225.74 user         4.73 sys

$ time logstash -f import-apache.conf < weblogs/prod02/access_log
126.99 real       219.36 user         4.59 sys

$ time logstash -f import-apache.conf < weblogs/prod03/access_log
128.44 real       232.02 user         4.87 sys

$ time logstash -f import-apache.conf < weblogs/prod04/access_log
129.59 real       235.75 user         5.04 sys

$ time logstash -f import-apache.conf < weblogs/prod01/ssl_access_log
240.79 real       385.14 user         7.03 sys

$ time logstash -f import-apache.conf < weblogs/prod02/ssl_access_log
228.33 real       362.32 user         6.48 sys

$ time logstash -f import-apache.conf < weblogs/prod03/ssl_access_log
207.41 real       333.99 user         6.19 sys

$ time logstash -f import-apache.conf < weblogs/prod04/ssl_access_log
200.58 real       344.14 user         6.13 sys
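To compare the runs, the line counts from `wc -l` and the `real` times above can be turned into throughput. This Python snippet is just arithmetic on the numbers reported in this article; each figure includes the time every fresh `logstash` process spends starting up.

```python
# Line counts (wc -l) and wall-clock "real" seconds, copied from the runs above.
RUNS = {
    "prod01/access_log": (408488, 113.15),
    "prod02/access_log": (414190, 126.99),
    "prod03/access_log": (409775, 128.44),
    "prod04/access_log": (436533, 129.59),
    "prod01/ssl_access_log": (1062117, 240.79),
    "prod02/ssl_access_log": (1011087, 228.33),
    "prod03/ssl_access_log": (856647, 207.41),
    "prod04/ssl_access_log": (884293, 200.58),
}


def throughput(lines, seconds):
    """Average events per second over the whole run, startup time included."""
    return lines / seconds


for name, (lines, seconds) in RUNS.items():
    print(f"{name:24s} {throughput(lines, seconds):8.0f} lines/s")

total_lines = sum(lines for lines, _ in RUNS.values())
total_seconds = sum(seconds for _, seconds in RUNS.values())
print(f"total: {total_lines} lines in {total_seconds:.2f} s "
      f"({throughput(total_lines, total_seconds):.0f} lines/s)")
```

With these numbers the access_log files come out at roughly 3,190 to 3,610 lines/s and the ssl_access_log files at roughly 4,130 to 4,430 lines/s; all eight files together are about 5.48 million lines in about 23 minutes.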

Combining with `Filebeat`

Edit the input file paths in /usr/local/etc/filebeat/filebeat.yml.
Change the input of import-apache.conf to beats.

filebeat.yml

- type: log

  # Change to true to enable this prospector configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    # - /Users/shotakaha/repos/kuma/es-tutorial/logstash-tutorial.log
    - /Users/shotakaha/repos/kuma/es-tutorial/prod01/access_log
    - /Users/shotakaha/repos/kuma/es-tutorial/prod02/access_log
    - /Users/shotakaha/repos/kuma/es-tutorial/prod03/access_log
    - /Users/shotakaha/repos/kuma/es-tutorial/prod04/access_log
    - /Users/shotakaha/repos/kuma/es-tutorial/prod01/ssl_access_log
    - /Users/shotakaha/repos/kuma/es-tutorial/prod02/ssl_access_log
    - /Users/shotakaha/repos/kuma/es-tutorial/prod03/ssl_access_log
    - /Users/shotakaha/repos/kuma/es-tutorial/prod04/ssl_access_log
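Since Filebeat's `paths` are glob based (as the comment in the file says), the eight entries could probably be collapsed into a single pattern, the same `prod0*/*access_log` glob used with `wc -l` earlier. The Python sketch below only checks that this hypothetical pattern covers exactly the listed paths; it uses `PurePosixPath.match`, where `*` does not cross `/`, which I assume is close enough to Filebeat's own glob handling for this case.

```python
from pathlib import PurePosixPath

BASE = "/Users/shotakaha/repos/kuma/es-tutorial"
# Hypothetical single glob that could replace the eight explicit entries.
PATTERN = f"{BASE}/prod0*/*access_log"

# The eight paths currently listed in filebeat.yml.
LISTED = [f"{BASE}/prod0{n}/{name}"
          for name in ("access_log", "ssl_access_log")
          for n in range(1, 5)]


def covered(path, pattern=PATTERN):
    """True if the path matches the glob; '*' never crosses a '/'."""
    return PurePosixPath(path).match(pattern)


print(all(covered(p) for p in LISTED))                 # True
print(covered(f"{BASE}/logstash-tutorial.log"))        # False
print(covered(f"{BASE}/prod01/error_log"))             # False
```

A single glob also picks up a future prod05 directory automatically, which may or may not be what you want.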

Modified import-apache.conf

# The # character at the beginning of a line indicates a comment.
# Use comments to describe your configuration.
input {
    beats {
        port => "5044"
    }
}
# The filter part of this file is commented out to indicate that
# it is optional.
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    date {
        match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
        locale => "en"
        target => "@timestamp"
    }
    geoip {
        source => "clientip"
    }
    useragent {
        source => "agent"
        target => "useragent"
    }
}
output {
    #stdout { codec => rubydebug }
    elasticsearch {
        hosts => [ "localhost:9200" ]
    }
}

$ logstash -f import-apache.conf
$ filebeat -c filebeat.yml -d "publish"
$ tail -f /usr/local/var/log/filebeat/filebeat
## tail it now and then to check that it has finished

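Confirmed that the initial load finished.
While watching the Filebeat log (tail -f filebeat), I updated the Apache logs (synced with rsync).
Confirmed that new entries appeared in the Filebeat log.
Next, I want to look into how to run Filebeat and Logstash in the background (as daemons?).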
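Besides watching the Filebeat log, one way to check that a load is complete is to compare Elasticsearch's document count with the `wc -l` totals. Below is a minimal Python sketch, assuming the same unauthenticated `http://localhost:9200` and the `logstash-*` index pattern; the helper names are hypothetical. `_count` is Elasticsearch's count API, whose JSON reply carries the total in a `count` field. Lines that fail grok are still indexed (just tagged), so the two numbers should end up close once everything has been ingested.

```python
import json
import urllib.request

ES_HOST = "http://localhost:9200"  # assumption: local, unauthenticated cluster


def parse_count(body):
    """Extract the document count from a _count API JSON response."""
    return json.loads(body)["count"]


def indexed_documents(index_pattern="logstash-*", host=ES_HOST):
    """Ask Elasticsearch how many documents the matching indices hold."""
    with urllib.request.urlopen(f"{host}/{index_pattern}/_count") as resp:
        return parse_count(resp.read().decode("utf-8"))


EXPECTED_LINES = 5483130  # sum of the wc -l counts for the eight files above
# print(indexed_documents(), EXPECTED_LINES)  # run against a live cluster
```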
