elb
ALB
Athena

ALB の Default の Log を Athena に Import する方法

(1) まず ALB の Default の Log の仕様

http://docs.aws.amazon.com/ja_jp/elasticloadbalancing/latest/application/load-balancer-access-logs.html

(2) Athena にテーブルを作る

下記の CREATE TABLE 文にて、

  • YOUR_ATHENA_TABLE_NAME
  • YOUR_ALB_LOG_BUCKET

を適用する環境にあわせて変えて実行する

CREATE EXTERNAL TABLE IF NOT EXISTS YOUR_ATHENA_TABLE_NAME (
 `type` string,
 `timestamp` string,
 `elb` string,
 `client_port` string,
 `target_port` string,
 `request_processing_time` double,
 `target_processing_time` double,
 `response_processing_time` double,
 `elb_response_code` int,
 `backend_response_code` int,
 `received_bytes` bigint,
 `sent_bytes` bigint,
 `request` string,
 `user_agent` string,
 `ssl_cipher` string,
 `ssl_protocol` string,
 `target_group_arn` string,
 `trace_id` string
 )
 PARTITIONED BY (year int, month int, day int)
 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
 WITH SERDEPROPERTIES (
 'serialization.format' = '1',
 'input.regex' = '^([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) ([0-9-\.]+) ([0-9-\.]+) ([0-9-\.]+) ([0-9-]+) ([0-9-]+) ([0-9-]+) ([0-9-]+) ["](.*)["] ["](.*)["] ([^ ]+) ([^ ]+) ([^ ]+) (.*)$'
 ) LOCATION 's3://YOUR_ALB_LOG_BUCKET/elb-websocket/AWSLogs/XXXXX/elasticloadbalancing/YOUR_REGION/'
 TBLPROPERTIES ('has_encrypted_data'='false');

(3) データをロードする

ALTER TABLE YOUR_ATHENA_TABLE_NAME
ADD IF NOT EXISTS PARTITION (year=2017,month=7,day=31)
location 's3://YOUR_ALB_LOG_BUCKET/elb-websocket/AWSLogs/XXXXX/elasticloadbalancing/YOUR_REGION//2017/07/31/'

(4) データを確認する

↓とかで各 Status 毎の Count を見たり。

SELECT substr(timestamp, 1, 13) as time,
COUNT(IF(elb_response_code / 100 = 1, 1, NULL)) as count_1XX,
COUNT(IF(elb_response_code / 100 = 2, 1, NULL)) as count_2XX,
COUNT(IF(elb_response_code / 100 = 3, 1, NULL)) as count_3XX,
COUNT(IF(elb_response_code / 100 = 4, 1, NULL)) as count_4XX,
COUNT(IF(elb_response_code / 100 = 5, 1, NULL)) as count_5XX,
AVG(target_processing_time) as target_processing_time_average
FROM YOUR_ATHENA_TABLE_NAME 
WHERE year = 2017 AND month = 7 AND day = 31
GROUP BY substr(timestamp, 1, 13)
ORDER BY time