A quick note, since I wasn't sure how to put an array inside an Avro schema.
This was helpful:
java - How to create schema containing list of objects using Avro? - Stack Overflow
Say you want to define a parent-child relationship like this in Avro:
class Child {
    int id;
    String name;
}
class Parent {
    String name;
    List<Child> children;
}
Define it like this.
schema.avsc
{
  "name": "Parent",
  "type": "record",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "children",
      "type": {
        "type": "array",
        "items": {
          "name": "Child",
          "type": "record",
          "fields": [
            {"name": "id", "type": "int"},
            {"name": "name", "type": "string"}
          ]
        }
      }
    }
  ]
}
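The part that is easy to miss is that "array" is not used as a bare type name: the field's "type" is itself an object with "type": "array", and its "items" entry holds the whole Child record definition. A minimal sketch that walks that nesting with only the standard json module (not the avro library); the inline string is a copy of schema.avsc, and the variable names are just for illustration:

```python
import json

# Inline copy of schema.avsc so the sketch runs without the file on disk.
SCHEMA_JSON = """
{
  "name": "Parent",
  "type": "record",
  "fields": [
    {"name": "name", "type": "string"},
    {
      "name": "children",
      "type": {
        "type": "array",
        "items": {
          "name": "Child",
          "type": "record",
          "fields": [
            {"name": "id", "type": "int"},
            {"name": "name", "type": "string"}
          ]
        }
      }
    }
  ]
}
"""

schema = json.loads(SCHEMA_JSON)

# Index the top-level fields by name.
fields = {f["name"]: f["type"] for f in schema["fields"]}

children_type = fields["children"]     # the {"type": "array", "items": ...} object
child_record = children_type["items"]  # the nested Child record definition
child_fields = {f["name"]: f["type"] for f in child_record["fields"]}

print(children_type["type"])  # array
print(child_record["name"])   # Child
print(child_fields)           # {'id': 'int', 'name': 'string'}
```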
Generating the data looks like this:
import avro.schema
from avro.datafile import DataFileWriter
from avro.io import DatumWriter

# Parse() is the avro-python3 spelling; the newer "avro" package likely needs parse().
with open("schema.avsc", "rt") as avsc:
    schema = avro.schema.Parse(avsc.read())

# The array field takes a plain Python list of dicts, one dict per Child record.
with DataFileWriter(open("output.avro", "wb"), DatumWriter(), schema) as writer:
    writer.append(
        {
            "name": "parent-name",
            "children": [
                {"id": 1, "name": "child-name-1"},
                {"id": 2, "name": "child-name-2"},
                {"id": 3, "name": "child-name-3"},
            ],
        }
    )
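If the data starts out as Python objects rather than a literal dict, the standard library's dataclasses.asdict converts nested dataclasses, including ones held in a list, into this same nested-dict shape. A small sketch, assuming hypothetical Child and Parent dataclasses that mirror the schema; they are my own illustration and not part of the avro library:

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Child:
    id: int
    name: str


@dataclass
class Parent:
    name: str
    children: List[Child] = field(default_factory=list)


parent = Parent(
    name="parent-name",
    children=[Child(id=i, name=f"child-name-{i}") for i in range(1, 4)],
)

# asdict() recurses into the list, so each Child becomes a plain dict.
record = asdict(parent)
print(record)
```

The resulting record should be usable as the argument to writer.append in place of the hand-written dict.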
Now read the file back:
avro-read.py
from avro.datafile import DataFileReader
from avro.io import DatumReader
import sys

file = sys.argv[1]
with DataFileReader(open(file, "rb"), DatumReader()) as reader:
    for resp in reader:
        print(resp)
$ python avro-read.py output.avro
{'name': 'parent-name', 'children': [{'id': 1, 'name': 'child-name-1'}, {'id': 2, 'name': 'child-name-2'}, {'id': 3, 'name': 'child-name-3'}]}
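As the output shows, each record comes back as a plain dict with the array field as a Python list of dicts, so the children can be handled with ordinary list code. A small sketch that flattens one record into (parent name, child id, child name) rows; the record literal is copied from the output above, and flatten is a hypothetical helper of my own:

```python
def flatten(record):
    """Yield one (parent name, child id, child name) tuple per child."""
    for child in record["children"]:
        yield (record["name"], child["id"], child["name"])


# Same shape as the record printed by avro-read.py.
record = {
    "name": "parent-name",
    "children": [
        {"id": 1, "name": "child-name-1"},
        {"id": 2, "name": "child-name-2"},
        {"id": 3, "name": "child-name-3"},
    ],
}

rows = list(flatten(record))
for row in rows:
    print(row)
```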