AWS ALB can optionally write access logs like the following to S3.
h2 2020-03-08T23:50:58.701251Z app/xxxxxx-prod-alb/xxxxxxxxx 222.222.222.222:64202 - -1 -1 -1 302 - 1254 224 "GET https://example.com:443/action_store HTTP/2.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.18362" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 - "Root=1-xxxxxxx-xxxxxxxxxxxx" "at.m3.com" "arn:aws:acm:ap-northeast-1:xxxxxxxxxx:certificate/xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx" 300 2020-03-08T23:50:58.701000Z "redirect" "https://example.com:443/at/action_store" "-" "-" "-"
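For serious access log analysis, you should use something like Athena.
But sometimes you just want a quick look at a particular access log entry, and in that case the space-delimited format is hard on the eyes. So let's convert it into easy-to-read JSON.
A space-delimited line with double-quoted fields is a kind of CSV, so it can be parsed with the csv module.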
#!/usr/local/bin/python
# alb_access_log_to_json.py
import fileinput
import json
import csv
# https://docs.aws.amazon.com/ja_jp/elasticloadbalancing/latest/application/load-balancer-access-logs.html#access-log-entry-format
FIELD_KEYS = """
type
timestamp
elb
client:port
target:port
request_processing_time
target_processing_time
response_processing_time
elb_status_code
target_status_code
received_bytes
sent_bytes
request
user_agent
ssl_cipher
ssl_protocol
target_group_arn
trace_id
domain_name
chosen_cert_arn
matched_rule_priority
request_creation_time
actions_executed
redirect_url
error_reason
target:port_list
target_status_code_list
""".split()
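# Split each line on spaces, treating double-quoted strings as single fields.
reader = csv.reader(fileinput.input(), delimiter=' ', quotechar='"', escapechar='\\')
for fields in reader:
    # Pair each field with its key; zip() stops at the shorter of the two lists.
    j = dict(zip(FIELD_KEYS, fields))
    print(json.dumps(j))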
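To see why the csv module is enough here, the following is a minimal sketch that parses one line from an in-memory string instead of a file. The `sample` line is a shortened, made-up entry in the same style as an ALB log line, not a real one, and it reuses the same `csv.reader` options as the script above.

```python
import csv
import io

# Shortened, made-up line in the style of an ALB log entry (illustrative only).
sample = (
    'h2 2020-03-08T23:50:58.701251Z 302 '
    '"GET https://example.com:443/ HTTP/2.0" "Mozilla/5.0 (X11; Linux)" "-"\n'
)

# Same options as the script: space delimiter, double quotes group a field.
reader = csv.reader(io.StringIO(sample), delimiter=' ', quotechar='"', escapechar='\\')
fields = next(reader)

for field in fields:
    print(repr(field))
```

The quoted request and user agent each come back as one field even though they contain spaces, which is what lets `zip(FIELD_KEYS, fields)` line up keys and values.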
Example run:
$ head -1 access_log.txt
h2 2020-03-08T23:50:58.701251Z app/xxxxxx-prod-alb/xxxxxxxxx 222.222.222.222:64202 - -1 -1 -1 302 - 1254 224 "GET https://example.com:443/action_store HTTP/2.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.18362" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 - "Root=1-xxxxxxx-xxxxxxxxxxxx" "at.m3.com" "arn:aws:acm:ap-northeast-1:xxxxxxxxxx:certificate/xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx" 300 2020-03-08T23:50:58.701000Z "redirect" "https://example.com:443/at/action_store" "-" "-" "-"
$ cat access_log.txt | python3 alb_access_log_to_json.py | jq .
{
  "type": "h2",
  "timestamp": "2020-03-08T23:50:58.701251Z",
  "elb": "app/xxxxxx-prod-alb/xxxxxxxxx",
  "client:port": "222.222.222.222:64202",
  "target:port": "-",
  "request_processing_time": "-1",
  "target_processing_time": "-1",
  "response_processing_time": "-1",
  "elb_status_code": "302",
  "target_status_code": "-",
  "received_bytes": "1254",
  "sent_bytes": "224",
  "request": "GET https://example.com:443/action_store HTTP/2.0",
  "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.18362",
  "ssl_cipher": "ECDHE-RSA-AES128-GCM-SHA256",
  "ssl_protocol": "TLSv1.2",
  "target_group_arn": "-",
  "trace_id": "Root=1-xxxxxxx-xxxxxxxxxxxx",
  "domain_name": "at.m3.com",
  "chosen_cert_arn": "arn:aws:acm:ap-northeast-1:xxxxxxxxxx:certificate/xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx",
  "matched_rule_priority": "300",
  "request_creation_time": "2020-03-08T23:50:58.701000Z",
  "actions_executed": "redirect",
  "redirect_url": "https://example.com:443/at/action_store",
  "error_reason": "-",
  "target:port_list": "-",
  "target_status_code_list": "-"
}
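As the output shows, every value comes out as a string, including status codes, byte counts, and the `-` placeholders. If typed values are easier to work with in jq, one possible follow-up is sketched below. The `normalize` helper and the `INT_FIELDS` / `FLOAT_FIELDS` sets are hypothetical names introduced for illustration, not part of the original script, and the choice of which fields are numeric is an assumption based on the sample entry above.

```python
# Hypothetical field groupings (assumption, based on the sample entry).
INT_FIELDS = {
    "elb_status_code", "target_status_code",
    "received_bytes", "sent_bytes", "matched_rule_priority",
}
FLOAT_FIELDS = {
    "request_processing_time", "target_processing_time",
    "response_processing_time",
}


def normalize(record):
    """Return a copy of record with "-" as None and numeric fields converted."""
    out = {}
    for key, value in record.items():
        if value == "-":
            out[key] = None  # "-" marks a missing value in the log
        elif key in INT_FIELDS or key in FLOAT_FIELDS:
            convert = int if key in INT_FIELDS else float
            try:
                out[key] = convert(value)
            except ValueError:
                out[key] = value  # keep the raw string if it is not numeric
        else:
            out[key] = value
    return out


# A few fields taken from the sample entry above.
record = {
    "type": "h2",
    "elb_status_code": "302",
    "target_status_code": "-",
    "received_bytes": "1254",
    "request_processing_time": "-1",
}
result = normalize(record)
print(result)
```

In the script, this would amount to calling `normalize(j)` before `json.dumps`, so that missing values become JSON `null` and numbers are emitted unquoted.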