More than 3 years have passed since last update.

Using jq to convert a JSON array into a format that can be loaded via the Elasticsearch Bulk API

Suppose you want to load a dataset like the following into Elasticsearch.

dataset.json
[
  {
    "isbn": "978-4-00-023745-1",
    "title": "それで君の声はどこにあるんだ?",
    "sub_title": "黒人神学から学んだこと",
    "author": "榎本空 著",
    "publisher": "岩波書店",
    "published_at": "2022.5",
    "series": "",
    "price": 2000
  },
  {
    "isbn": "978-4-00-061532-7",
    "title": "気候民主主義",
    "sub_title": "次世代の政治の動かし方",
    "author": "三上直之 著",
    "publisher": "岩波書店",
    "published_at": "2022.5",
    "series": "",
    "price": 2100
  },
  ...
]

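Loading multiple documents at once can be done with the Elasticsearch Bulk API, but the format of the data you POST is somewhat unusual, so the data has to be transformed first.

Specifically, it needs to look like the following: one JSON object per line, with each document preceded by an action line that names the target index and the document ID.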

bulk_dataset.json
{"index":{"_index":"books","_id":"978-4-00-023745-1"}}
{"isbn":"978-4-00-023745-1","title":"それで君の声はどこにあるんだ?","sub_title":"黒人神学から学んだこと","author":"榎本空 著","publisher":"岩波書店","published_at":"2022.5","series":"","price":2000}
{"index":{"_index":"books","_id":"978-4-00-061532-7"}}
{"isbn":"978-4-00-061532-7","title":"気候民主主義","sub_title":"次世代の政治の動かし方","author":"三上直之 著","publisher":"岩波書店","published_at":"2022.5","series":"","price":2100}
...
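To make the target format concrete, here is a minimal Python sketch of the same transformation. It is only an illustration, not part of the original workflow: the function names `to_bulk_lines` and `to_bulk_body` are hypothetical, the index name `books` and the `isbn` ID field are taken from the example above, and the sample records are trimmed to a few fields.

```python
import json


def to_bulk_lines(records, index_name="books", id_field="isbn"):
    """Yield bulk lines: an action line, then the document itself, per record."""
    for record in records:
        action = {"index": {"_index": index_name, "_id": record[id_field]}}
        # Compact separators give one-object-per-line output;
        # ensure_ascii=False keeps the Japanese text readable instead of \u escapes.
        yield json.dumps(action, ensure_ascii=False, separators=(",", ":"))
        yield json.dumps(record, ensure_ascii=False, separators=(",", ":"))


def to_bulk_body(records, index_name="books", id_field="isbn"):
    """Join the lines into a request body; every line, including the last, ends with a newline."""
    return "".join(line + "\n" for line in to_bulk_lines(records, index_name, id_field))


# Sample records trimmed from dataset.json above (only some fields kept).
sample = [
    {"isbn": "978-4-00-023745-1", "title": "それで君の声はどこにあるんだ?", "price": 2000},
    {"isbn": "978-4-00-061532-7", "title": "気候民主主義", "price": 2100},
]

print(to_bulk_body(sample), end="")
```

Each input record becomes exactly two output lines, which is why the bulk file has twice as many lines as the array has elements.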

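With jq, the following does the job. `.[]` iterates over the array elements, the parenthesized expression emits an action object followed by the element itself (`.`), and `-c` prints each object compactly on a single line.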

$ cat dataset.json | jq -c '.[] | ({"index":{"_index": "books", "_id": .isbn}}, .)' > bulk_dataset.json

# Load with curl
$ curl -X PUT "http://localhost:9200/books/_bulk?pretty&refresh"  -H "Content-Type: application/json" --data-binary "@bulk_dataset.json"
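If the request is rejected, it can help to sanity-check the generated file before sending it again. The following is a rough sketch of such a check, assuming an index-only bulk file like the one above (alternating action and document lines); the function name `check_bulk_body` is hypothetical and the checks are far from exhaustive.

```python
import json


def check_bulk_body(text):
    """Return a list of problems found in an index-only bulk body (empty if none)."""
    problems = []
    # The Bulk API expects the final line to be terminated by a newline.
    if not text.endswith("\n"):
        problems.append("body does not end with a newline")
    lines = text.splitlines()
    if len(lines) % 2 != 0:
        problems.append("odd number of lines; expected action/document pairs")
    # Walk the complete pairs: an action line followed by a document line.
    for i in range(0, len(lines) - 1, 2):
        try:
            action = json.loads(lines[i])
            doc = json.loads(lines[i + 1])
        except json.JSONDecodeError as exc:
            problems.append(f"lines {i + 1}-{i + 2}: invalid JSON ({exc.msg})")
            continue
        if not (isinstance(action, dict) and "index" in action):
            problems.append(f"line {i + 1}: expected an index action")
        if not isinstance(doc, dict):
            problems.append(f"line {i + 2}: expected a JSON object")
    return problems


good = (
    '{"index":{"_index":"books","_id":"978-4-00-023745-1"}}\n'
    '{"isbn":"978-4-00-023745-1","price":2000}\n'
)
print(check_bulk_body(good))
```

To check the real file, read it with `open("bulk_dataset.json", encoding="utf-8").read()` and pass the result to the function.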