OrganizationでCloudTrail有効化したときのPartition ProjectionをつかったAthenaテーブルをつくる

Posted at 2021-03-25

概要

OrganizationでCloudTrailを有効化すると、S3バケットのパスに組織IDが入る
Partitionを切りたいがOrganizationのいろいろなユーザにALTER TABLEでつどPatitionをロードを強いるのは辛い
なのでPartition Projectionをつかっていい感じにPartitionをきりたい

参考：https://dev.classmethod.jp/articles/cloudtrail-athena-partition-projection-table/
ちなみにほぼこれの内容を踏襲した記事です

S3バケットのパス

単一のAWSアカウントでCloudtrail有効化したときと違い組織IDが入るようになります

s3://{バケット名}/AWSLogs/{組織ID}/{アカウントID}/CloudTrail/

テーブル作成

CREATE EXTERNAL TABLE cloudtrail_logs (
    eventVersion STRING,
    userIdentity STRUCT<
        type: STRING,
        principalId: STRING,
        arn: STRING,
        accountId: STRING,
        invokedBy: STRING,
        accessKeyId: STRING,
        userName: STRING,
        sessionContext: STRUCT<
            attributes: STRUCT<
                mfaAuthenticated: STRING,
                creationDate: STRING>,
            sessionIssuer: STRUCT<
                type: STRING,
                principalId: STRING,
                arn: STRING,
                accountId: STRING,
                userName: STRING>>>,
    eventTime STRING,
    eventSource STRING,
    eventName STRING,
    awsRegion STRING,
    sourceIpAddress STRING,
    userAgent STRING,
    errorCode STRING,
    errorMessage STRING,
    requestParameters STRING,
    responseElements STRING,
    additionalEventData STRING,
    requestId STRING,
    eventId STRING,
    resources ARRAY<STRUCT<
        arn: STRING,
        accountId: STRING,
        type: STRING>>,
    eventType STRING,
    apiVersion STRING,
    readOnly STRING,
    recipientAccountId STRING,
    serviceEventDetails STRING,
    sharedEventID STRING,
    vpcEndpointId STRING
)
COMMENT 'CloudTrail table for ${BucketName} bucket'
PARTITIONED BY (
    accountid string,
    region string,
    date string
)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://{バケット名}/AWSLogs/'
TBLPROPERTIES (
    'projection.enabled' = 'true',
    'projection.accountid.type' = 'enum',
    'projection.accountid.values' = '111111111111,222222222222,333333333333,4444444444444,555555555555,666666666666,777777777777,8888888888888,999999999999',
    'projection.date.type' = 'date',
    'projection.date.range' = 'NOW-1YEARS,NOW',
    'projection.date.format' = 'yyyy/MM/dd',
    'projection.date.interval' = '1',
    'projection.date.interval.unit' = 'DAYS',
    'projection.region.type' = 'enum',
    'projection.region.values'='us-east-1,us-east-2,us-west-1,us-west-2,af-south-1,ap-east-1,ap-south-1,ap-northeast-2,ap-southeast-1,ap-southeast-2,ap-northeast-1,ca-central-1,eu-central-1,eu-west-1,eu-west-2,eu-south-1,eu-west-3,eu-north-1,me-south-1,sa-east-1',
    'storage.location.template' = 's3://{バケット名}/AWSLogs/{組織ID}/${accountid}/CloudTrail/${region}/${date}',
    'compressionType'='gzip',
    'typeOfData'='file',
    'classification'='cloudtrail'
);

accountidのprojection つらすぎたすけて

TBLPROPERTIES (
    'projection.enabled' = 'true',
    'projection.accountid.type' = 'enum',
    'projection.accountid.values' = '111111111111,222222222222,333333333333,444444444444,555555555555,666666666666,777777777777,888888888888,999999999999',

本当にenumで列挙しなきゃいけないの？？？

$ aws organizations list-accounts | jq -r '.Accounts[].Id'
111111111111
222222222222
333333333333
444444444444
555555555555
666666666666
777777777777
888888888888
999999999999

これでとって一個一個いれた

こんごAWSアカウント増えるたびにTableは作り直しです

'projection.accountid.type' = 'integer'

は連番じゃないとだめ？？？

おわりです

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up

OrganizationでCloudTrail有効化したときのPartition ProjectionをつかったAthenaテーブルをつくる

概要

S3バケットのパス

テーブル作成

accountidのprojection つらすぎ たすけて

accountidのprojection つらすぎたすけて