Adding S3 Backup to a Fluentd + BigQuery Tweet Data Collection System

Posted at 2015-06-28

Introduction

This post adds log backup to S3 to the Fluentd + BigQuery tweet data collection system
built in the earlier article "Fluentd + BigQuery による Tweet データ収集" (Collecting Tweet Data with Fluentd + BigQuery).

With this in place, even if a failure on the BigQuery side prevents data from being recorded,
the data can be recovered from the logs stored in S3.
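As a rough idea of what such a recovery could involve, the Python sketch below parses one backup object after it has been downloaded from S3. It assumes fluent-plugin-s3's defaults of gzip compression (store_as) and the out_file text format, where each line holds a timestamp, the tag, and the JSON record separated by tabs. The function name read_backup_chunk is hypothetical, and re-inserting the records into BigQuery is not shown.

```python
import gzip
import json


def read_backup_chunk(data: bytes):
    """Parse one gzip-compressed backup object written by fluent-plugin-s3.

    Assumption (not confirmed by the article): default store_as gzip and the
    out_file format, i.e. each line is: time TAB tag TAB JSON record.
    """
    records = []
    # Split on "\n" only; str.splitlines() would also break on characters such
    # as U+2028 that may appear unescaped inside tweet text.
    for line in gzip.decompress(data).decode("utf-8").split("\n"):
        if not line:
            continue  # skip the empty string after the final newline
        # JSON escapes tabs inside strings, so only the first two tabs separate fields
        time_str, tag, payload = line.split("\t", 2)
        records.append((time_str, tag, json.loads(payload)))
    return records
```

Each returned tuple keeps the original timestamp and tag, so the records can be filtered to the period that failed before being loaded back into BigQuery.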

Creating an IAM User

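After logging in to the AWS console, click
Identity & Access Management -> Users -> Create New Users.
Under Enter User Names, type fluentd in box 1. and click Create.
Click Show User Security Credentials to display the Access Key Id and Secret Access Key.
Clicking Download Credentials downloads a file containing both. The Secret Access Key cannot be displayed again later, so save it at this point.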
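If you prefer not to copy the keys by hand, a small script can turn the downloaded file into the two credential lines used in the Fluentd configuration. This is a minimal sketch that assumes the file is a CSV whose header row includes Access Key Id and Secret Access Key columns; the helper name td_agent_credential_lines is hypothetical.

```python
import csv
import io


def td_agent_credential_lines(csv_text: str) -> str:
    """Build the aws_key_id / aws_sec_key lines from a downloaded credentials CSV.

    Assumption: the header row contains "Access Key Id" and "Secret Access Key"
    columns and the first data row belongs to the fluentd user.
    """
    # Drop a UTF-8 byte order mark if the file starts with one
    reader = csv.DictReader(io.StringIO(csv_text.lstrip("\ufeff")))
    row = next(reader, None)
    if row is None:
        raise ValueError("no credential rows found")
    return (f"aws_key_id {row['Access Key Id']}\n"
            f"aws_sec_key {row['Secret Access Key']}")
```

For example, read the downloaded file (assumed here to be named credentials.csv) with open(..., newline="") and print the result; the two lines can then replace the placeholder key settings in td-agent.conf.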

Next, select the newly created user fluentd and click
Permissions -> Managed Policies -> Attach Policies.
Select AmazonS3FullAccess and click Attach Policy.

You now have a user that other programs, such as Fluentd, can use to read from and write to S3.

Creating an S3 Bucket

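From the AWS Management Console, click S3 -> Create Bucket and
create an S3 bucket with a name of your choice. Bucket names must be unique across all of S3, and creating the bucket in the Tokyo region keeps it consistent with the s3_region ap-northeast-1 setting used below.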
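Not every name is accepted, though. The following sketch checks a candidate name against a subset of the S3 naming rules for DNS-compatible bucket names (3 to 63 characters; lowercase letters, digits, periods and hyphens; a letter or digit at both ends; no adjacent periods; not formatted like an IPv4 address). It is an illustration only: the AWS documentation is the authoritative list, the console rejects invalid names anyway, and is_valid_bucket_name is a hypothetical helper.

```python
import ipaddress
import re

# Subset of the S3 bucket naming rules: 3-63 characters, lowercase letters,
# digits, '.' and '-', with a letter or digit at both ends
_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if name passes this (incomplete) set of naming checks."""
    if _NAME_RE.fullmatch(name) is None:
        return False
    if ".." in name:
        return False  # adjacent periods are not allowed
    try:
        ipaddress.IPv4Address(name)
    except ValueError:
        return True  # not an IP address, so the name passes these checks
    return False  # names formatted like an IP address are rejected
```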

Configuring Fluentd

If you use td-agent, the S3 output plugin fluent-plugin-s3 is
installed by default, so nothing extra needs to be installed.

First, create a directory for the buffered logs with the following command:

$ sudo mkdir /var/log/td-agent/s3

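Next, in /etc/td-agent/td-agent.conf, add the following <store> ~ </store> section
inside the <match input.twitter.sampling> ~ </match> block. Because type copy passes every event to each <store>, the existing BigQuery <store> (omitted as ... below) keeps working alongside the new S3 output.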

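<match input.twitter.sampling>
  type copy

  <store>
    type s3

    # Credentials of the IAM user fluentd created above
    aws_key_id YOUR_AWS_KEY_ID
    aws_sec_key YOUR_AWS_SECRET_KEY
    # Target bucket and its region
    s3_bucket YOUR_S3_BUCKET_NAME
    s3_region ap-northeast-1
    # Object key prefix in the bucket and the local buffer file path
    path streaming_api_logs/
    buffer_path /var/log/td-agent/s3

    # Group events into one-minute time slices, formatted in UTC
    time_slice_format %Y%m%d%H%M
    s3_object_key_format %{path}%{time_slice}_%{index}_%{hostname}.%{file_extension}
    # Wait 5 minutes after a slice ends for late events before uploading it
    time_slice_wait 5m
    utc

    # Maximum size of one buffer chunk
    buffer_chunk_limit 256m
  </store>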

   ...
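To make the effect of these settings concrete, the sketch below reproduces the object key and the earliest upload time for a given event. It assumes fluent-plugin-s3's default gzip compression (so %{file_extension} becomes gz), an %{index} that starts at 0 and increases only when an object with the same key already exists (for example, a second chunk for the same minute), and the behaviour of Fluentd v0.12 time-sliced outputs, where a slice is queued once the current time minus time_slice_wait falls into a later slice. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def s3_object_key(event_time: datetime, hostname: str, index: int = 0,
                  path: str = "streaming_api_logs/",
                  time_slice_format: str = "%Y%m%d%H%M",
                  file_extension: str = "gz") -> str:
    """Mimic %{path}%{time_slice}_%{index}_%{hostname}.%{file_extension}.

    event_time must be timezone-aware; file_extension "gz" assumes store_as gzip.
    """
    # `utc` in the config: the time slice is formatted in UTC
    time_slice = event_time.astimezone(timezone.utc).strftime(time_slice_format)
    return f"{path}{time_slice}_{index}_{hostname}.{file_extension}"


def earliest_flush(event_time: datetime,
                   time_slice_wait: timedelta = timedelta(minutes=5)) -> datetime:
    """Earliest time the one-minute slice containing event_time is queued for upload.

    Assumption: a slice is queued at (slice end + time_slice_wait); the one-minute
    slice length matches time_slice_format %Y%m%d%H%M.
    """
    slice_start = event_time.astimezone(timezone.utc).replace(second=0, microsecond=0)
    slice_end = slice_start + timedelta(minutes=1)
    return slice_end + time_slice_wait
```

For an event at 21:30:15 JST on 2015-06-28 on a host named myhost, this gives the key streaming_api_logs/201506281230_0_myhost.gz, and the slice becomes eligible for upload no earlier than 12:36 UTC.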

Restarting Fluentd

$ sudo service td-agent stop
$ sudo service td-agent start

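Once you can see data being written to the specified S3 bucket, the setup is complete. With time_slice_wait 5m, each one-minute slice is uploaded only after a further five-minute wait, so the first objects may take several minutes to appear.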
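To check the result without scanning the listing by eye, a short script can pick out the backup objects from a list of keys and split each into time slice, index and hostname. This sketch assumes the path and s3_object_key_format from the configuration above plus gzip compression; how you obtain the key list (the S3 console, the AWS CLI, or an SDK) is up to you, and parse_backup_keys is a hypothetical helper.

```python
import re


def parse_backup_keys(keys, path="streaming_api_logs/"):
    """Return sorted (time_slice, index, hostname) tuples for backup object keys.

    Assumption: keys follow %{path}%{time_slice}_%{index}_%{hostname}.gz with
    time_slice_format %Y%m%d%H%M (12 digits).
    """
    pattern = re.compile(re.escape(path) + r"(\d{12})_(\d+)_(.+)\.gz")
    parsed = []
    for key in keys:
        match = pattern.fullmatch(key)
        if match:  # ignore unrelated objects in the bucket
            parsed.append((match.group(1), int(match.group(2)), match.group(3)))
    return sorted(parsed)
```

Consecutive time slices in the output indicate that uploads are arriving every minute; an index above 0 means more than one object was written for that minute.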
