Writing Avro data with fastavro

Posted at 2018-02-13

I want to create Avro files for loading data into BigQuery and similar systems.
With Avro, automatic schema detection can create tables with nested fields.

avro (avro-python3) has a Java-like interface, so I would rather use fastavro, which is written in Cython.

Sample code

import fastavro

schema = {
    'name': 'Root',
    'type': 'record',
    'fields': [
        {'name': 'a', 'type': 'string'},
        {
            'name': 'b',
            'type': {
                'type': 'array',
                'items': {
                    'type': 'record',
                    'name': 'B',
                    'namespace': 'root',
                    'fields': [
                        {'name': 'c', 'type': 'string'},
                        {'name': 'd', 'type': ['double', 'null']},
                        {'name': 'e', 'type': ['null', {'type': 'array', 'items': 'int'}]},
                    ],
                },
            },
        },
    ],
}

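with open('foobar.avro', 'wb') as out:  # to append to an existing file, fastavro expects mode 'a+b'
    # the keyword is codec (singular), not codecs
    writer = fastavro.write.Writer(out, schema, sync_interval=1000, codec='deflate')

    # loop over your records here
    record = {'a': 'aaa', 'b': [{'c': 'ccc', 'd': 123, 'e': [1, 2, 3]}]}
    writer.write(record)

    # Without a final flush, the last records never reach the file
    # (a block is written out automatically only once the buffer reaches sync_interval bytes)
    writer.flush()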

After that, you can keep the schema definition in an avsc file, config.py, or similar.

References

Avro formats supported by BQ
