Prerequisites
First, create a Firehose delivery stream backed by S3.
FirehoseApplicationLogRole:
  Type: AWS::IAM::Role
  Properties:
    RoleName: my-sweet-firehose-application-log-role
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Action: sts:AssumeRole
          Effect: Allow
          Principal:
            Service: firehose.amazonaws.com
    Policies:
      - PolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Action:
                - glue:GetTable
                - glue:GetTableVersion
                - glue:GetTableVersions
              Effect: Allow
              Resource: '*'
            - Action:
                - s3:AbortMultipartUpload
                - s3:GetBucketLocation
                - s3:GetObject
                - s3:ListBucket
                - s3:ListBucketMultipartUploads
                - s3:PutObject
              Effect: Allow
              Resource:
                - arn:aws:s3:::my-sweet-s3-bucket
                - arn:aws:s3:::my-sweet-s3-bucket/*
        PolicyName: my-sweet-firehose-application-log-policy
FirehoseApplicationLog:
  Type: AWS::KinesisFirehose::DeliveryStream
  Properties:
    DeliveryStreamName: "my-sweet-deliver-stream" # max 64 chars
    DeliveryStreamType: "DirectPut"
    S3DestinationConfiguration:
      BucketARN: "arn:aws:s3:::my-sweet-s3-bucket"
      BufferingHints:
        IntervalInSeconds: 60 # Default: 300
      CompressionFormat: 'GZIP'
      EncryptionConfiguration:
        NoEncryptionConfig: NoEncryption # Encryption is done by S3
      RoleARN: !GetAtt FirehoseApplicationLogRole.Arn # !Ref would return the role name, not the ARN
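Once the stack is deployed, you can sanity-check the delivery stream by putting a record into it with boto3. A minimal sketch; the helper name, region, and the commented-out call are illustrative, and the stream name comes from the template above:

```python
import json

def build_firehose_record(log_line: str) -> dict:
    """Wrap a raw container log line for Firehose PutRecord.

    Firehose concatenates records as-is into S3 objects, so the trailing
    newline keeps one JSON object per line, which the Athena table below
    relies on.
    """
    payload = json.dumps({"log": log_line}) + "\n"
    return {"Data": payload.encode("utf-8")}

# Sending it (requires AWS credentials; region is an assumption):
# import boto3
# firehose = boto3.client("firehose", region_name="ap-northeast-1")
# firehose.put_record(
#     DeliveryStreamName="my-sweet-deliver-stream",
#     Record=build_firehose_record('{"severity":"INFO","message":"hello"}'),
# )
```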
Shipping EKS logs to Firehose with Fluentd
Deploy the following fluentd-firehose.yaml to the k8s cluster. It is based on the Fluentd config used by CloudWatch Container Insights.
For any deployment whose logs you want collected by Firehose, set the annotation fluentd_firehose_delivery_stream_name to the Firehose delivery stream name, and its container logs will be forwarded.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-sweet-api
spec:
  template:
    metadata:
      annotations:
        fluentd_firehose_delivery_stream_name: "my-sweet-firehose-stream-name"
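The routing itself happens on the Fluentd side. A sketch of what the relevant match section could look like with fluent-plugin-kinesis, not the full fluentd-firehose.yaml; it assumes the kubernetes metadata filter has injected the pod annotations into each record, and that the plugin version in use supports record-based placeholders (check the plugin's README):

```
<match kubernetes.**>
  @type kinesis_firehose
  region ap-northeast-1
  # Take the stream name from the pod annotation embedded in each record,
  # so each deployment's annotation decides where its logs go.
  delivery_stream_name "${$.kubernetes.annotations.fluentd_firehose_delivery_stream_name}"
  <buffer $.kubernetes.annotations.fluentd_firehose_delivery_stream_name>
    flush_interval 5s
  </buffer>
</match>
```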
Creating the Athena table
CREATE DATABASE application_log;

CREATE EXTERNAL TABLE application_log.my_sweet_api (
  log string,
  stream string,
  docker string,
  kubernetes string
)
PARTITIONED BY (`date` string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://my-sweet-s3-bucket/'
TBLPROPERTIES (
  'projection.enabled' = 'true',
  'projection.date.type' = 'date',
  'projection.date.range' = '2020/01/01,NOW',
  'projection.date.format' = 'yyyy/MM/dd',
  'projection.date.interval' = '1',
  'projection.date.interval.unit' = 'DAYS',
  'storage.location.template' = 's3://my-sweet-s3-bucket/${date}'
);
Querying with Athena
The container log ends up in the log field. If the log is JSON, individual fields can be pulled out with json_extract_scalar.
Because the table is partitioned, filtering on the partition key date keeps queries cheap and fast.
SELECT
  json_extract_scalar(log, '$.severity') AS severity,
  json_extract_scalar(log, '$.timestamp') AS timestamp,
  json_extract_scalar(kubernetes, '$.container_name') AS container_name,
  json_extract_scalar(kubernetes, '$.namespace_name') AS namespace_name
FROM application_log.my_sweet_api
WHERE
  date = '2021/01/21'
  AND from_iso8601_timestamp(json_extract_scalar(log, '$.timestamp'))
    BETWEEN timestamp '2021-01-21 18:42:00 Asia/Tokyo'
        AND timestamp '2021-01-21 18:44:00 Asia/Tokyo'
LIMIT 50;
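The same query can be issued programmatically. A hedged boto3 sketch; the helper names, region, and results bucket are assumptions, while the database, table, and partition format come from the DDL above:

```python
from datetime import date

def partition_value(d: date) -> str:
    """Format a day the way the table's partition projection expects (yyyy/MM/dd)."""
    return d.strftime("%Y/%m/%d")

def build_query(day: date, limit: int = 50) -> str:
    """Build a log query for a single day.

    Filtering on the partition key keeps Athena from scanning
    the whole bucket.
    """
    return (
        "SELECT json_extract_scalar(log, '$.severity') AS severity, log "
        "FROM application_log.my_sweet_api "
        f"WHERE date = '{partition_value(day)}' "
        f"LIMIT {limit}"
    )

# Running it (requires AWS credentials; the results bucket is an assumption):
# import boto3
# athena = boto3.client("athena", region_name="ap-northeast-1")
# athena.start_query_execution(
#     QueryString=build_query(date(2021, 1, 21)),
#     QueryExecutionContext={"Database": "application_log"},
#     ResultConfiguration={"OutputLocation": "s3://my-sweet-athena-results/"},
# )
```

start_query_execution only kicks off the query; results are polled with get_query_execution and fetched with get_query_results.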