Help us understand the problem. What is going on with this article?

S3にあるELBのログをAmazon Athenaでクエリする

毎回調べているような気がするので備忘録として。
個人的にはログ系は1つのバケットに全部いれてしまうのが良いと思っているので、ログ専用のバケットを切ります。
同じような設定を色んな所にしなければならなかったりするので。特に権限周り。
ここでは
s3:////AWSLogs//elasticloadbalancing/ap-northeast-1/2018というディレクトリ構成にします
hive形式で作ればadd partitionする必要がないらしいのですが、ELBが作るデフォルトのディレクトリ構成に従います。

DDLを用意する

CREATE EXTERNAL TABLE IF NOT EXISTS <dbname>.<elbname> (
  request_timestamp string,
  elb_name string,
  request_ip string,
  request_port int,
  backend_ip string,
  backend_port int,
  request_processing_time double,
  backend_processing_time double,
  client_response_time double,
  elb_response_code string,
  backend_response_code string,
  received_bytes bigint,
  sent_bytes bigint,
  request_verb string,
  url string,
  protocol string,
  user_agent string,
  ssl_cipher string,
  ssl_protocol string )
PARTITIONED BY (year STRING, month STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
         'serialization.format' = '1','input.regex' = '([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*) ([-0-9]*) \\\"([^ ]*) ([^ ]*) (- |[^ ]*)\\\" (\"[^\"]*\") ([A-Z0-9-]+) ([A-Za-z0-9.-]*)$' )
LOCATION 's3://<BUCKET_NAME>/<ELB_NAME>/AWSLogs/<ACCOUNT_ID>/elasticloadbalancing/ap-northeast-1/';

パーティションは年、月で切れればいいので、year,monthを設定しました。

パーティションを切る

ALTER TABLE <dbname>.<elbname> ADD PARTITION (year='2018',month='01') location 's3://<BUCKET_NAME>/<ELB_NAME>/AWSLogs/<ACCOUNT_ID>/elasticloadbalancing/ap-northeast-1/2018/01/';
ALTER TABLE <dbname>.<elbname> ADD PARTITION (year='2018',month='02') location 's3://<BUCKET_NAME>/<ELB_NAME>/AWSLogs/<ACCOUNT_ID>/elasticloadbalancing/ap-northeast-1/2018/02/';
...
略

クエリしてみる

レスポンスタイムのパーセンタイルをクエリしてみます。
これはこれで別記事にするべきかもしれませんがめんどくさいので纏めます。

select 
    approx_percentile(client_response_time, 0.99) as percentile_99,
    approx_percentile(client_response_time, 0.95) as percentile_95,
    approx_percentile(client_response_time, 0.90) as percentile_90,
    approx_percentile(client_response_time, 0.5) as percentile_50,
    approx_percentile(client_response_time, 0.3) as percentile_30
from
    table_name
where
    year='2018' and month='12'

これでできるようになりました。S3にログのファイルがあるので好きなだけテーブルを作り変えれますね。

Why do not you register as a user and use Qiita more conveniently?
  1. We will deliver articles that match you
    By following users and tags, you can catch up information on technical fields that you are interested in as a whole
  2. you can read useful information later efficiently
    By "stocking" the articles you like, you can search right away
Comments
Sign up for free and join this conversation.
If you already have a Qiita account
Why do not you register as a user and use Qiita more conveniently?
You need to log in to use this function. Qiita can be used more conveniently after logging in.
You seem to be reading articles frequently this month. Qiita can be used more conveniently after logging in.
  1. We will deliver articles that match you
    By following users and tags, you can catch up information on technical fields that you are interested in as a whole
  2. you can read useful information later efficiently
    By "stocking" the articles you like, you can search right away