Parsing CloudTrail logs in S3 with Boto

Last updated at 2015-07-15, posted at 2015-07-14

Introduction

I wanted to use Boto (the AWS SDK for Python) to parse CloudTrail logs stored in S3, but I got stuck in a few places and could not find sample code that worked as a direct starting point, so I am jotting my code down here for reference.

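Pitfalls

Boto3 did not seem to support access through an HTTP proxy (as far as I could tell), so I ended up using Boto2
CloudTrail logs are gz-compressed by default, so they must be decompressed before the contents can be parsed
To target only the files under the path aa/bb/cc in an S3 bucket named xx, use bucket.list(prefix='aa/bb/cc')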
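
One detail of the last point is easy to miss: the prefix is compared against key names as a plain string, not as a directory path. The sketch below imitates that matching locally, without any AWS access. The helper `keys_under_prefix` and the sample key names are hypothetical and exist only for illustration.

```python
def keys_under_prefix(key_names, prefix):
    """Return the key names that start with prefix (plain string match)."""
    return [name for name in key_names if name.startswith(prefix)]


# Hypothetical key names, for illustration only.
sample_keys = [
    'aa/bb/cc/log1.json.gz',
    'aa/bb/ccc/log2.json.gz',
    'aa/bb/dd/log3.json.gz',
]

# 'aa/bb/cc' also matches 'aa/bb/ccc/...'; a trailing slash avoids that.
loose = keys_under_prefix(sample_keys, 'aa/bb/cc')
strict = keys_under_prefix(sample_keys, 'aa/bb/cc/')
```

If sibling paths share a leading string, ending the prefix with a slash keeps them out of the listing.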

Incidentally, analyzing CloudTrail logs reportedly goes more smoothly with Loggly.

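About the Python code

Settings you need to change

aws_access_key_id, aws_secret_access_key, target_bucket, target_path, proxy, and proxy_port must be set for your own environment.

Set aws_access_key_id and aws_secret_access_key to the key of an IAM user created for access from outside AWS (do not forget to grant that IAM user read access to S3 with AmazonS3ReadOnlyAccess!)
proxy and proxy_port are the settings for your HTTP proxy server
target_bucket is the name of the target bucket and target_path is the S3 path you want to analyze; here I targeted only the us-west-2 logs for 2015/07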
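
If you want to switch regions or months often, target_path can be assembled from its parts. This is a minimal sketch that assumes the key layout seen in the target_path used in this article (an optional prefix, then AWSLogs, account ID, CloudTrail, region, year, month). The function name `build_target_path` is hypothetical, and the account ID is the same dummy value as in the sample code.

```python
def build_target_path(prefix, account_id, region, year, month):
    """Build the S3 key prefix for one month of CloudTrail logs in one region."""
    # The leading prefix is optional; skip it when it is empty.
    parts = [prefix] if prefix else []
    parts += ['AWSLogs', account_id, 'CloudTrail', region,
              '%04d' % year, '%02d' % month]
    return '/'.join(parts)


# Dummy account ID and prefix, matching the sample values in this article.
target_path = build_target_path(
    'CroudTrail', '1234567899999888', 'us-west-2', 2015, 7)
```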

Notes

The access key and similar values in the sample code below are made-up strings
I am using Python 2.7 and Boto v2.32.1

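Overall flow of the code

The processing flow is as follows.

List the files (keys) under target_path in the target_bucket bucket and download them
Decompress each downloaded file (gz format) and load it as JSON
If a record is RDS-related (['eventSource'] == 'rds.amazonaws.com'), print its contents to standard output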
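
The second and third steps can be tried without touching S3. The sketch below is an illustration using only the standard library, not the article's Boto code: it builds a tiny gzipped log in memory from made-up records, then decompresses it, parses it, and filters it. The name `iter_events` and the sample records are assumptions for this example.

```python
import gzip
import io
import json


def iter_events(raw_gz, event_source):
    """Yield the records in one gzipped CloudTrail log that match event_source."""
    # Decompress in memory, then parse the JSON document.
    with gzip.GzipFile(fileobj=io.BytesIO(raw_gz)) as gz_file:
        log = json.loads(gz_file.read().decode('utf-8'))
    for record in log.get('Records', []):
        if record.get('eventSource') == event_source:
            yield record


# Made-up records standing in for a downloaded log file.
sample_log = {'Records': [
    {'eventSource': 'rds.amazonaws.com', 'eventName': 'DescribeDBInstances'},
    {'eventSource': 's3.amazonaws.com', 'eventName': 'ListBuckets'},
]}
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode='wb') as gz_file:
    gz_file.write(json.dumps(sample_log).encode('utf-8'))

rds_event_names = [record['eventName']
                   for record in iter_events(buf.getvalue(), 'rds.amazonaws.com')]
```

Keeping the decompress-and-filter step separate from the download makes it easy to check against a local file before pointing it at a real bucket.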


import gzip, io, json
import boto.s3.connection

# Dummy values: replace these with your own credentials, bucket, and path.
aws_access_key_id = 'AKKBUGOIU4434DDTT'
aws_secret_access_key = '78oiupoiuh7++REugoiusGSEE'
target_bucket = 'your-bucket-name'
target_path = 'CroudTrail/AWSLogs/1234567899999888/CloudTrail/us-west-2/2015/07'

def main():
  # Connect to S3 through the HTTP proxy.
  s3Instance = boto.s3.connection.S3Connection(
    aws_access_key_id, aws_secret_access_key,
    proxy='your.proxy.server.com', proxy_port=8080)
  s3Bucket = s3Instance.get_bucket(target_bucket)

  # List every key under target_path; each listed item is already a Key,
  # so it can be downloaded directly without a separate get_key() call.
  for s3BucketKey in s3Bucket.list(prefix=target_path):
    buffer_gz = s3BucketKey.get_contents_as_string()

    # Decompress the gzipped log in memory.
    stringBuffer = io.BytesIO(buffer_gz)
    buffer_text = gzip.GzipFile(fileobj=stringBuffer)

    try:
      responseJSON = json.loads(buffer_text.read())
    except Exception as e:
      # Skip anything that is not a valid gzipped JSON log.
      print(e)
    else:
      # Print only the RDS-related records.
      for record in responseJSON['Records']:
        if record['eventSource'] == 'rds.amazonaws.com':
          print(json.dumps(record, separators=(',', ':'), indent=2))
          print('Event name = %s' % (record['eventName']))
          print('=================================')
    finally:
      buffer_text.close()
      stringBuffer.close()

if __name__ == '__main__':
  main()
