LoginSignup
0
0

More than 3 years have passed since last update.

OrganizationでCloudTrail有効化したときのPartition ProjectionをつかったAthenaテーブルをつくる

Posted at

概要

  • OrganizationでCloudTrailを有効化すると、S3バケットのパスに組織IDが入る
  • Partitionを切りたいがOrganizationのいろいろなユーザにALTER TABLEでつどPatitionをロードを強いるのは辛い
  • なのでPartition Projectionをつかっていい感じにPartitionをきりたい

参考:https://dev.classmethod.jp/articles/cloudtrail-athena-partition-projection-table/
ちなみにほぼこれの内容を踏襲した記事です

S3バケットのパス

単一のAWSアカウントでCloudtrail有効化したときと違い組織IDが入るようになります

s3://{バケット名}/AWSLogs/{組織ID}/{アカウントID}/CloudTrail/

テーブル作成

CREATE EXTERNAL TABLE cloudtrail_logs (
    eventVersion STRING,
    userIdentity STRUCT<
        type: STRING,
        principalId: STRING,
        arn: STRING,
        accountId: STRING,
        invokedBy: STRING,
        accessKeyId: STRING,
        userName: STRING,
        sessionContext: STRUCT<
            attributes: STRUCT<
                mfaAuthenticated: STRING,
                creationDate: STRING>,
            sessionIssuer: STRUCT<
                type: STRING,
                principalId: STRING,
                arn: STRING,
                accountId: STRING,
                userName: STRING>>>,
    eventTime STRING,
    eventSource STRING,
    eventName STRING,
    awsRegion STRING,
    sourceIpAddress STRING,
    userAgent STRING,
    errorCode STRING,
    errorMessage STRING,
    requestParameters STRING,
    responseElements STRING,
    additionalEventData STRING,
    requestId STRING,
    eventId STRING,
    resources ARRAY<STRUCT<
        arn: STRING,
        accountId: STRING,
        type: STRING>>,
    eventType STRING,
    apiVersion STRING,
    readOnly STRING,
    recipientAccountId STRING,
    serviceEventDetails STRING,
    sharedEventID STRING,
    vpcEndpointId STRING
)
COMMENT 'CloudTrail table for ${BucketName} bucket'
PARTITIONED BY (
    accountid string,
    region string,
    date string
)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://{バケット名}/AWSLogs/'
TBLPROPERTIES (
    'projection.enabled' = 'true',
    'projection.accountid.type' = 'enum',
    'projection.accountid.values' = '111111111111,222222222222,333333333333,4444444444444,555555555555,666666666666,777777777777,8888888888888,999999999999',
    'projection.date.type' = 'date',
    'projection.date.range' = 'NOW-1YEARS,NOW',
    'projection.date.format' = 'yyyy/MM/dd',
    'projection.date.interval' = '1',
    'projection.date.interval.unit' = 'DAYS',
    'projection.region.type' = 'enum',
    'projection.region.values'='us-east-1,us-east-2,us-west-1,us-west-2,af-south-1,ap-east-1,ap-south-1,ap-northeast-2,ap-southeast-1,ap-southeast-2,ap-northeast-1,ca-central-1,eu-central-1,eu-west-1,eu-west-2,eu-south-1,eu-west-3,eu-north-1,me-south-1,sa-east-1',
    'storage.location.template' = 's3://{バケット名}/AWSLogs/{組織ID}/${accountid}/CloudTrail/${region}/${date}',
    'compressionType'='gzip',
    'typeOfData'='file',
    'classification'='cloudtrail'
);

accountidのprojection つらすぎ たすけて

TBLPROPERTIES (
    'projection.enabled' = 'true',
    'projection.accountid.type' = 'enum',
    'projection.accountid.values' = '111111111111,222222222222,333333333333,444444444444,555555555555,666666666666,777777777777,888888888888,999999999999',

本当にenumで列挙しなきゃいけないの???

$ aws organizations list-accounts | jq -r '.Accounts[].Id'
111111111111
222222222222
333333333333
444444444444
555555555555
666666666666
777777777777
888888888888
999999999999

これでとって一個一個いれた

こんごAWSアカウント増えるたびにTableは作り直しです

'projection.accountid.type' = 'integer'

は連番じゃないとだめ???

おわりです

0
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
0
0