使われてなさそうなアクセスキーを削除したいけど、ほんとに使われてないのか自信ないので、CloudTrail のログを集計しようかなと思って調べたら、Hive の神 SerDe を見つけたのでメモ。このブログに書いてあることをやってるだけ。
hive> CREATE EXTERNAL TABLE IF NOT EXISTS cloud_trail_20150120
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailLogDeserializer'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://cloudtrail/AWSLogs/xxxxxxxxxxxx/CloudTrail/ap-northeast-1/2015/01/20/'
;
hive> DESC cloud_trail_20150120;
OK
eventversion string from deserializer
useridentity struct<type:string,principalid:string,arn:string,accountid:string,invokedby:string,accesskeyid:string,sessioncontext:struct<attributes:struct<mfaauthenticated:string,creationdate:string>>> from deserializer
eventtime string from deserializer
eventsource string from deserializer
eventname string from deserializer
awsregion string from deserializer
sourceipaddress string from deserializer
useragent string from deserializer
requestid string from deserializer
eventid string from deserializer
Time taken: 0.047 seconds, Fetched: 10 row(s)
hive> SELECT useridentity.accesskeyid, count(*)
FROM cloud_trail_20150120
GROUP BY useridentity.accesskeyid
ORDER BY useridentity.accesskeyid
;
XXXXXXXXXXXXXXXXXXXX 2
XXXXXXXXXXXXXXXXXXXX 1777
hive> SELECT useragent, count(*)
FROM cloud_trail_20150120
GROUP BY useragent
ORDER BY useragent
;
Amazon CLI/ElasticLoadBalancing 1.0.17.0 API 2012-06-01 2
Boto/2.25.0 Python/2.6.9 Linux/3.4.76-65.111.amzn1.x86_64 1777
...