概要
- OrganizationでCloudTrailを有効化すると、S3バケットのパスに組織IDが入る
- Partitionを切りたいがOrganizationのいろいろなユーザにALTER TABLEでつどPatitionをロードを強いるのは辛い
- なのでPartition Projectionをつかっていい感じにPartitionをきりたい
参考:https://dev.classmethod.jp/articles/cloudtrail-athena-partition-projection-table/
ちなみにほぼこれの内容を踏襲した記事です
S3バケットのパス
単一のAWSアカウントでCloudtrail有効化したときと違い組織IDが入るようになります
s3://{バケット名}/AWSLogs/{組織ID}/{アカウントID}/CloudTrail/
テーブル作成
CREATE EXTERNAL TABLE cloudtrail_logs (
eventVersion STRING,
userIdentity STRUCT<
type: STRING,
principalId: STRING,
arn: STRING,
accountId: STRING,
invokedBy: STRING,
accessKeyId: STRING,
userName: STRING,
sessionContext: STRUCT<
attributes: STRUCT<
mfaAuthenticated: STRING,
creationDate: STRING>,
sessionIssuer: STRUCT<
type: STRING,
principalId: STRING,
arn: STRING,
accountId: STRING,
userName: STRING>>>,
eventTime STRING,
eventSource STRING,
eventName STRING,
awsRegion STRING,
sourceIpAddress STRING,
userAgent STRING,
errorCode STRING,
errorMessage STRING,
requestParameters STRING,
responseElements STRING,
additionalEventData STRING,
requestId STRING,
eventId STRING,
resources ARRAY<STRUCT<
arn: STRING,
accountId: STRING,
type: STRING>>,
eventType STRING,
apiVersion STRING,
readOnly STRING,
recipientAccountId STRING,
serviceEventDetails STRING,
sharedEventID STRING,
vpcEndpointId STRING
)
COMMENT 'CloudTrail table for ${BucketName} bucket'
PARTITIONED BY (
accountid string,
region string,
date string
)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://{バケット名}/AWSLogs/'
TBLPROPERTIES (
'projection.enabled' = 'true',
'projection.accountid.type' = 'enum',
'projection.accountid.values' = '111111111111,222222222222,333333333333,4444444444444,555555555555,666666666666,777777777777,8888888888888,999999999999',
'projection.date.type' = 'date',
'projection.date.range' = 'NOW-1YEARS,NOW',
'projection.date.format' = 'yyyy/MM/dd',
'projection.date.interval' = '1',
'projection.date.interval.unit' = 'DAYS',
'projection.region.type' = 'enum',
'projection.region.values'='us-east-1,us-east-2,us-west-1,us-west-2,af-south-1,ap-east-1,ap-south-1,ap-northeast-2,ap-southeast-1,ap-southeast-2,ap-northeast-1,ca-central-1,eu-central-1,eu-west-1,eu-west-2,eu-south-1,eu-west-3,eu-north-1,me-south-1,sa-east-1',
'storage.location.template' = 's3://{バケット名}/AWSLogs/{組織ID}/${accountid}/CloudTrail/${region}/${date}',
'compressionType'='gzip',
'typeOfData'='file',
'classification'='cloudtrail'
);
accountidのprojection つらすぎ たすけて
TBLPROPERTIES (
'projection.enabled' = 'true',
'projection.accountid.type' = 'enum',
'projection.accountid.values' = '111111111111,222222222222,333333333333,444444444444,555555555555,666666666666,777777777777,888888888888,999999999999',
本当にenumで列挙しなきゃいけないの???
$ aws organizations list-accounts | jq -r '.Accounts[].Id'
111111111111
222222222222
333333333333
444444444444
555555555555
666666666666
777777777777
888888888888
999999999999
これでとって一個一個いれた
こんごAWSアカウント増えるたびにTableは作り直しです
'projection.accountid.type' = 'integer'
は連番じゃないとだめ???
おわりです