
Processing JSON Lines in Go

Posted at 2021-07-01, last updated at 2022-07-26

A personal memo.

The data written by AWS Kinesis Firehose and AWS Comprehend puts one independent JSON object on each line.

For reference, the format looks like this:

```
{"File": "reject_reason_new.csv", "KeyPhrases": [{"BeginOffset": 0, "EndOffset": 9, "Score": 0.9999968409885279, "Text": "\\n\u4e0a\u4e0b\u306e\u9ed2\u8272\u90e8\u5206"}, {"BeginOffset": 11, "EndOffset": 17, "Score": 1.0, "Text": "\u4ee5\u4e0b\u306e\u6642\u9593\u5e2f"}], "Line": 32}
{"File": "reject_reason_new.csv", "KeyPhrases": [], "Line": 33}
{"File": "reject_reason_new.csv", "KeyPhrases": [{"BeginOffset": 1, "EndOffset": 11, "Score": 0.9573030141390197, "Text": "n53\u79d2\u301c2\u520647\u79d2"}], "Line": 34}
```
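Because every line is a complete JSON document on its own, any single line can be handed straight to json.Unmarshal, and the \uXXXX escapes come back as ordinary Japanese text. A minimal sketch using the third sample line; the types phrase and record and the helper parseLine are hypothetical names chosen for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical minimal types whose field names follow the sample output.
type phrase struct {
	BeginOffset int
	EndOffset   int
	Score       float64
	Text        string
}

type record struct {
	File       string
	KeyPhrases []phrase
	Line       int
}

// parseLine decodes one JSON Lines record. Each line is standalone JSON,
// so json.Unmarshal can parse it without looking at any other line.
func parseLine(line string) (record, error) {
	var r record
	err := json.Unmarshal([]byte(line), &r)
	return r, err
}

func main() {
	// Raw string literal: the \u escapes are decoded by encoding/json, not by Go.
	line := `{"File": "reject_reason_new.csv", "KeyPhrases": [{"BeginOffset": 1, "EndOffset": 11, "Score": 0.9573030141390197, "Text": "n53\u79d2\u301c2\u520647\u79d2"}], "Line": 34}`
	r, err := parseLine(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(r.Line, r.KeyPhrases[0].Text) // 34 n53秒〜2分47秒
}
```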

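Can this actually be called a JSON stream? The format is usually known as JSON Lines (or NDJSON). I don't read data like this day to day, but it comes up every now and then, so I'm jotting it down here.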

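Use json.NewDecoder. It decodes one value at a time from an io.Reader, so data from a local file or an S3 object can be read without loading all of it into memory at once.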

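```go
package main

import (
	"encoding/json"
	"errors"
	"io"
	"log"
	"os"
)

func parseJsonStream(stream io.Reader) error {
	dec := json.NewDecoder(stream)
	for {
		logDataOneLine := KeyPhraseOneLine{}
		err := dec.Decode(&logDataOneLine)
		if err == io.EOF {
			break // reached the end of the input
		}
		var typeErr *json.UnmarshalTypeError
		if errors.As(err, &typeErr) {
			// Valid JSON that does not fit the struct: the decoder can go on.
			log.Println(err)
			continue
		}
		if err != nil {
			// Syntax errors stick to the decoder (every later Decode returns
			// the same error), so `continue` here would loop forever.
			return err
		}
		// Do something with logDataOneLine
	}
	return nil
}

// Struct for one line of JSON
type KeyPhraseOneLine struct {
	File       string      `json:"File"`
	KeyPhrases []KeyPhrase `json:"KeyPhrases"`
	Line       int         `json:"Line"`
}

type KeyPhrase struct {
	Score float64 `json:"Score"`
	Text  string  `json:"Text"`
}

func main() {
	// Read JSON Lines from standard input, e.g. `go run . < output.jsonl`
	if err := parseJsonStream(os.Stdin); err != nil {
		log.Fatal(err)
	}
}
```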
