List of objects in avro format

I couldn't quite figure out how to put an array inside an Avro schema, so here is a note.

This Stack Overflow question was helpful:

java - How to create schema containing list of objects using Avro? - Stack Overflow

Suppose you want to define a parent-child relationship like this in Avro:

class Child {
    int id;
    String name;
}

class Parent {
    String name;
    List<Child> children;
}

Define it like this. The "children" field has type "array", and its "items" is the nested Child record.

schema.avsc

{
  "name": "Parent",
  "type": "record",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "children",
      "type": {
        "type": "array",
        "items": {
          "name": "Child",
          "type": "record",
          "fields": [
            {"name": "id", "type": "int"},
            {"name": "name", "type": "string"}
          ]
        }
      }
    }
  ]
}
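
To make the shape of the schema concrete, here is a minimal sketch that uses only the standard library to check whether a Python value matches it. The `matches` helper and the `PRIMITIVES` table are hypothetical names invented for this illustration. They cover only the types used in this schema (record, array, string, int) and are not how the avro library validates data.

```python
import json

# The schema from schema.avsc, inlined so the sketch is self-contained.
SCHEMA_JSON = """
{
  "name": "Parent",
  "type": "record",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "children",
     "type": {
       "type": "array",
       "items": {
         "name": "Child",
         "type": "record",
         "fields": [
           {"name": "id", "type": "int"},
           {"name": "name", "type": "string"}
         ]
       }
     }}
  ]
}
"""

# Hypothetical mapping from the Avro primitive types used here to Python types.
PRIMITIVES = {"string": str, "int": int}


def matches(schema, datum):
    """Return True if datum has the shape described by schema (illustrative helper)."""
    if isinstance(schema, str):
        # A primitive type name such as "string" or "int".
        # bool is a subclass of int in Python, so reject it explicitly.
        return isinstance(datum, PRIMITIVES[schema]) and not isinstance(datum, bool)
    if schema["type"] == "array":
        # Every element of the list must match the "items" schema.
        return isinstance(datum, list) and all(
            matches(schema["items"], item) for item in datum
        )
    if schema["type"] == "record":
        # A record is a dict with exactly the declared field names.
        fields = schema["fields"]
        return (
            isinstance(datum, dict)
            and set(datum) == {f["name"] for f in fields}
            and all(matches(f["type"], datum[f["name"]]) for f in fields)
        )
    raise ValueError(f"unsupported schema in this sketch: {schema!r}")


schema = json.loads(SCHEMA_JSON)
print(matches(schema, {"name": "p", "children": [{"id": 1, "name": "c"}]}))
```

The point is that an Avro array of records corresponds to a plain Python list of dicts, with one dict per Child.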

Generating the data looks like this:

import avro.schema
from avro.datafile import DataFileWriter
from avro.io import DatumWriter

# avro-python3 spells this Parse; the unified avro package (1.10+) spells it parse.
with open("schema.avsc", "rt") as avsc:
    schema = avro.schema.Parse(avsc.read())

# The array field is written as a plain list of dicts, one dict per Child.
with DataFileWriter(open("output.avro", "wb"), DatumWriter(), schema) as writer:
    writer.append(
        {
            "name": "parent-name",
            "children": [
                {"id": 1, "name": "child-name-1"},
                {"id": 2, "name": "child-name-2"},
                {"id": 3, "name": "child-name-3"},
            ],
        }
    )
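
If the data starts out as Python objects rather than dict literals, one possible approach is to model Parent and Child as dataclasses and convert them with `dataclasses.asdict`. This is only a sketch: the dataclasses below are an assumption made for illustration and are not part of the original memo. The resulting `record` has the same shape as the dict passed to `writer.append` above.

```python
from dataclasses import asdict, dataclass, field
from typing import List


# Hypothetical Python counterparts of the Child and Parent classes.
@dataclass
class Child:
    id: int
    name: str


@dataclass
class Parent:
    name: str
    children: List[Child] = field(default_factory=list)


parent = Parent(
    name="parent-name",
    children=[Child(id=i, name=f"child-name-{i}") for i in range(1, 4)],
)

# asdict() recurses into nested dataclasses and lists, producing plain
# dicts and lists that line up with the record and array types in the schema.
record = asdict(parent)
print(record)
```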

Now try reading the file back.

avro-read.py


from avro.datafile import DataFileReader
from avro.io import DatumReader
import sys

file = sys.argv[1]

with DataFileReader(open(file, "rb"), DatumReader()) as reader:
    for resp in reader:
        print(resp)

$ python avro-read.py output.avro

{'name': 'parent-name', 'children': [{'id': 1, 'name': 'child-name-1'}, {'id': 2, 'name': 'child-name-2'}, {'id': 3, 'name': 'child-name-3'}]}