
Extracting only the data you need from JSON with the jq command

I wrote about the curl command earlier, and since I often use jq together with curl, I have summarized jq here as well.


What is the jq command?

The manual describes a jq program as a filter.

It takes JSON as input, applies various kinds of processing, and outputs the result.


A jq program is a “filter”: it takes an input, and produces an output.


The manual is in English, but there is also an article that translates it into Japanese. (It is old, but I don't think the information has changed much.)

軽量JSONパーサー『jq』のドキュメント:『jq Manual』をざっくり日本語訳してみました

jq can be installed with Homebrew.

brew install jq


Formatting data with the jq command

Let's process data retrieved from a postal code API with jq. To avoid hitting the API every time, I print the JSON with echo instead.

The data can be retrieved from http://zipcloud.ibsnet.co.jp/api/search?zipcode=7600000.

$ echo '{"message": null,"results": [{"address1": "香川県","address2": "高松市","address3": "","kana1": "カガワケン","kana2": "タカマツシ","kana3": "","prefcode": "37","zipcode": "7600000"}],"status": 200}'

{"message": null,"results": [{"address1": "香川県","address2": "高松市","address3": "","kana1": "カガワケン","kana2": "タカマツシ","kana3": "","prefcode": "37","zipcode": "7600000"}],"status": 200}

# Pretty-prints the JSON so it is easier to read
$ echo '{"message": null,"results": [{"address1": "香川県","address2": "高松市","address3": "","kana1": "カガワケン","kana2": "タカマツシ","kana3": "","prefcode": "37","zipcode": "7600000"}],"status": 200}' \
| jq .

{
  "message": null,
  "results": [
    {
      "address1": "香川県",
      "address2": "高松市",
      "address3": "",
      "kana1": "カガワケン",
      "kana2": "タカマツシ",
      "kana3": "",
      "prefcode": "37",
      "zipcode": "7600000"
    }
  ],
  "status": 200
}
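As a complement to pretty-printing, here is a minimal sketch of the reverse direction: jq's -c (--compact-output) option prints each result on a single line. The shell variable names json and compact are made up for this example.

```shell
# Sample JSON from the article, kept in a shell variable (the name `json` is only for this sketch)
json='{"message": null,"results": [{"address1": "香川県","address2": "高松市","address3": "","kana1": "カガワケン","kana2": "タカマツシ","kana3": "","prefcode": "37","zipcode": "7600000"}],"status": 200}'

# -c prints the result on one line instead of pretty-printing it
compact=$(printf '%s' "$json" | jq -c .)
echo "$compact"
```

This can be handy when the output is going to be passed to another program rather than read by a person.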


Retrieving only the data you need


Narrowing the output to a specific key

To get only results from the previous output, specify it as .key to narrow the output down.

$ echo '{"message": null,"results": [{"address1": "香川県","address2": "高松市","address3": "","kana1": "カガワケン","kana2": "タカマツシ","kana3": "","prefcode": "37","zipcode": "7600000"}],"status": 200}' \
| jq .results

[
  {
    "address1": "香川県",
    "address2": "高松市",
    "address3": "",
    "kana1": "カガワケン",
    "kana2": "タカマツシ",
    "kana3": "",
    "prefcode": "37",
    "zipcode": "7600000"
  }
]
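Since results is an array, it can also be indexed or counted. The following is a small sketch that uses a trimmed-down version of the sample data. The variable names json, first, and count are assumptions for illustration only.

```shell
# Trimmed-down sample data (fewer fields than the real API response)
json='{"message": null,"results": [{"address1": "香川県","address2": "高松市","zipcode": "7600000"}],"status": 200}'

# .results[0] picks the first element of the results array
first=$(printf '%s' "$json" | jq -c '.results[0]')
echo "$first"

# length returns the number of elements when its input is an array
count=$(printf '%s' "$json" | jq '.results | length')
echo "$count"
```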


Getting only the keys

To get a list of the keys:

$ echo '{"message": null,"results": [{"address1": "香川県","address2": "高松市","address3": "","kana1": "カガワケン","kana2": "タカマツシ","kana3": "","prefcode": "37","zipcode": "7600000"}],"status": 200}' \
| jq '.|keys'

[
  "message",
  "results",
  "status"
]
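The same keys filter should also work on a nested object once you have navigated to it. Here is a rough sketch against a trimmed-down version of the sample data, where json and inner_keys are hypothetical variable names.

```shell
# Trimmed-down sample data (fewer fields than the real API response)
json='{"message": null,"results": [{"address1": "香川県","address2": "高松市","zipcode": "7600000"}],"status": 200}'

# Navigate to the first element of results, then list its keys (keys returns them sorted)
inner_keys=$(printf '%s' "$json" | jq -c '.results[0] | keys')
echo "$inner_keys"
```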


Getting a value

To get only the value 香川県:

$ echo '{"message": null,"results": [{"address1": "香川県","address2": "高松市","address3": "","kana1": "カガワケン","kana2": "タカマツシ","kana3": "","prefcode": "37","zipcode": "7600000"}],"status": 200}' \
| jq '.results[] | .address1'
"香川県"

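With .results[], each element of the results array is passed down the pipe one at a time. Each element here is an object, so .address1 is applied to that object and outputs the value of its address1 key.

Without the [], .results is still an array, and an array cannot be indexed with a string key, so the following is an error.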

$ echo '{"message": null,"results": [{"address1": "香川県","address2": "高松市","address3": "","kana1": "カガワケン","kana2": "タカマツシ","kana3": "","prefcode": "37","zipcode": "7600000"}],"status": 200}' \
| jq '.results | .address1'
jq: error (at <stdin>:1): Cannot index array with string "address1"
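For comparison, here is a sketch of two forms that avoid this error, one indexing a single element and one iterating over all elements. It also uses the -r (--raw-output) option, which prints strings without the surrounding quotes. The data is trimmed down, and json, by_index, and by_iter are made-up names.

```shell
# Trimmed-down sample data (fewer fields than the real API response)
json='{"message": null,"results": [{"address1": "香川県","address2": "高松市","zipcode": "7600000"}],"status": 200}'

# Index one element of the array, then read its address1 key
by_index=$(printf '%s' "$json" | jq -r '.results[0].address1')
echo "$by_index"

# Iterate over every element of the array and read address1 from each
by_iter=$(printf '%s' "$json" | jq -r '.results[].address1')
echo "$by_iter"
```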

To get multiple values, separate the filters with commas.

jq '.results[] | .address1, .address2'
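If you would rather have the values on one line than one per line, one possible approach is to collect them into an array and join them. This is only a sketch on trimmed-down data, and json and pair are hypothetical names.

```shell
# Trimmed-down sample data (fewer fields than the real API response)
json='{"message": null,"results": [{"address1": "香川県","address2": "高松市","zipcode": "7600000"}],"status": 200}'

# Collect both values into an array, then join them with a space; -r drops the quotes
pair=$(printf '%s' "$json" | jq -r '.results[] | [.address1, .address2] | join(" ")')
echo "$pair"
```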


Searching by value

The previous JSON example makes this hard to see, so let's add a little more data.

$ echo '{"message": null,"results": [{"address1": "香川県","address2": "高松市","address3": "","kana1": "カガワケン","kana2": "タカマツシ","kana3": "","prefcode": "37","zipcode": "7600000"}, {"address1": "東京都","address2": "足立区","address3": "","kana1": "トウキョウト","kana2": "アダチク","kana3": "","prefcode": "13","zipcode": "1200000"}, {"address1": "千葉県","address2": "千葉市中央区","address3": "","kana1": "チバケン","kana2": "チバシチュウオウク","kana3": "","prefcode": "12","zipcode": "2600000"}],"status": 200}' \
| jq .
{
  "message": null,
  "results": [
    {
      "address1": "香川県",
      "address2": "高松市",
      "address3": "",
      "kana1": "カガワケン",
      "kana2": "タカマツシ",
      "kana3": "",
      "prefcode": "37",
      "zipcode": "7600000"
    },
    {
      "address1": "東京都",
      "address2": "足立区",
      "address3": "",
      "kana1": "トウキョウト",
      "kana2": "アダチク",
      "kana3": "",
      "prefcode": "13",
      "zipcode": "1200000"
    },
    {
      "address1": "千葉県",
      "address2": "千葉市中央区",
      "address3": "",
      "kana1": "チバケン",
      "kana2": "チバシチュウオウク",
      "kana3": "",
      "prefcode": "12",
      "zipcode": "2600000"
    }
  ],
  "status": 200
}

select lets you search by value, so let's get only the 千葉県 entry from the data above.

$ echo '{"message": null,"results": [{"address1": "香川県","address2": "高松市","address3": "","kana1": "カガワケン","kana2": "タカマツシ","kana3": "","prefcode": "37","zipcode": "7600000"}, {"address1": "東京都","address2": "足立区","address3": "","kana1": "トウキョウト","kana2": "アダチク","kana3": "","prefcode": "13","zipcode": "1200000"}, {"address1": "千葉県","address2": "千葉市中央区","address3": "","kana1": "チバケン","kana2": "チバシチュウオウク","kana3": "","prefcode": "12","zipcode": "2600000"}],"status": 200}' \
| jq '.results[] | select(.address1 == "千葉県")'
{
  "address1": "千葉県",
  "address2": "千葉市中央区",
  "address3": "",
  "kana1": "チバケン",
  "kana2": "チバシチュウオウク",
  "kana3": "",
  "prefcode": "12",
  "zipcode": "2600000"
}
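select emits each matching element on its own. If you want the matches to stay inside an array, one option is to wrap it in map. The following is a sketch on trimmed-down data that filters on prefcode instead of address1. The names json and matched are assumptions for the example.

```shell
# Trimmed-down sample data with three entries (fewer fields than the real API response)
json='{"results": [{"address1": "香川県","prefcode": "37"}, {"address1": "東京都","prefcode": "13"}, {"address1": "千葉県","prefcode": "12"}]}'

# map applies select to every element and returns the matches as an array
matched=$(printf '%s' "$json" | jq -c '.results | map(select(.prefcode == "12"))')
echo "$matched"
```

Note that prefcode is a string in this data, so it is compared against "12" rather than the number 12.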


Multiple conditions can also be combined with and/or

$ echo '{"message": null,"results": [{"address1": "香川県","address2": "高松市","address3": "","kana1": "カガワケン","kana2": "タカマツシ","kana3": "","prefcode": "37","zipcode": "7600000"}, {"address1": "東京都","address2": "足立区","address3": "","kana1": "トウキョウト","kana2": "アダチク","kana3": "","prefcode": "13","zipcode": "1200000"}, {"address1": "千葉県","address2": "千葉市中央区","address3": "","kana1": "チバケン","kana2": "チバシチュウオウク","kana3": "","prefcode": "12","zipcode": "2600000"}],"status": 200}' \
| jq '.results[] | select(.address1 == "千葉県" or .address1 == "東京都")'
{
  "address1": "東京都",
  "address2": "足立区",
  "address3": "",
  "kana1": "トウキョウト",
  "kana2": "アダチク",
  "kana3": "",
  "prefcode": "13",
  "zipcode": "1200000"
}
{
  "address1": "千葉県",
  "address2": "千葉市中央区",
  "address3": "",
  "kana1": "チバケン",
  "kana2": "チバシチュウオウク",
  "kana3": "",
  "prefcode": "12",
  "zipcode": "2600000"
}
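The example above uses or. For completeness, here is a minimal sketch of and, which only matches when both conditions hold. The data is trimmed down, and json, both, and none are hypothetical variable names.

```shell
# Trimmed-down sample data with three entries (fewer fields than the real API response)
json='{"results": [{"address1": "香川県","address2": "高松市"}, {"address1": "東京都","address2": "足立区"}, {"address1": "千葉県","address2": "千葉市中央区"}]}'

# Both conditions are true for the 東京都 entry, so it is selected
both=$(printf '%s' "$json" | jq -c '.results[] | select(.address1 == "東京都" and .address2 == "足立区")')
echo "$both"

# No entry satisfies both conditions here, so nothing is printed
none=$(printf '%s' "$json" | jq -c '.results[] | select(.address1 == "東京都" and .address2 == "高松市")')
echo "$none"
```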


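Formatting the retrieved data for output


Reshaping the data

Using the first JSON again as an example, let's retrieve only the data we need.

After the pipe, you can build a new object with { } and set each field to the value of the key specified with .key.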

$ echo '{"message": null,"results": [{"address1": "香川県","address2": "高松市","address3": "","kana1": "カガワケン","kana2": "タカマツシ","kana3": "","prefcode": "37","zipcode": "7600000"}],"status": 200}' \
| jq '.results[] | { prefecture : .address1, zipcode: .zipcode }'
{
  "prefecture": "香川県",
  "zipcode": "7600000"
}
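When there are several results, the reshaped objects can be collected back into an array by wrapping the whole filter in [ ]. The sketch below also uses the shorthand {zipcode}, which should be equivalent to {zipcode: .zipcode}. The data is trimmed down, and json and reshaped are made-up names.

```shell
# Trimmed-down sample data with two entries (fewer fields than the real API response)
json='{"results": [{"address1": "香川県","zipcode": "7600000"}, {"address1": "東京都","zipcode": "1200000"}]}'

# [ ... ] collects every reshaped object into a single array;
# a bare key name such as zipcode is shorthand for zipcode: .zipcode
reshaped=$(printf '%s' "$json" | jq -c '[.results[] | {prefecture: .address1, zipcode}]')
echo "$reshaped"
```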

However, a .key expression cannot be used directly as a key of the new object.

$ echo '{"message": null,"results": [{"address1": "香川県","address2": "高松市","address3": "","kana1": "カガワケン","kana2": "タカマツシ","kana3": "","prefcode": "37","zipcode": "7600000"}],"status": 200}' \
| jq '.results[] | { .address1 : .address1, .zipcode: .zipcode }'
jq: error: syntax error, unexpected FIELD (Unix shell quoting issues?) at <top-level>, line 1:
.results[] | { .address1 : .address1, .zipcode: .zipcode }
jq: error: May need parentheses around object key expression at <top-level>, line 1:
.results[] | { .address1 : .address1, .zipcode: .zipcode }
jq: error: syntax error, unexpected FIELD (Unix shell quoting issues?) at <top-level>, line 1:
.results[] | { .address1 : .address1, .zipcode: .zipcode }
jq: error: May need parentheses around object key expression at <top-level>, line 1:
.results[] | { .address1 : .address1, .zipcode: .zipcode }
jq: 4 compile errors


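Using a retrieved value directly as a key

Wrapping the expression in parentheses, as in (.key), lets you use its value as a key.

Let's retrieve the data in the form prefecture: kana.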

$ echo '{"message": null,"results": [{"address1": "香川県","address2": "高松市","address3": "","kana1": "カガワケン","kana2": "タカマツシ","kana3": "","prefcode": "37","zipcode": "7600000"}],"status": 200}' \
| jq '.results[] | { (.address1) : .kana1 }'

{
  "香川県": "カガワケン"
}

By writing (.address1), we were able to use its value as the key.
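Building on this, one possible way to turn several results into a single lookup object is to map each element to a one-key object and merge them with add. This is only a sketch on trimmed-down data, and json, lookup, and chiba are hypothetical names.

```shell
# Trimmed-down sample data with three entries (fewer fields than the real API response)
json='{"results": [{"address1": "香川県","kana1": "カガワケン"}, {"address1": "東京都","kana1": "トウキョウト"}, {"address1": "千葉県","kana1": "チバケン"}]}'

# Each element becomes {prefecture: kana}; add merges those objects into one
lookup=$(printf '%s' "$json" | jq -c '.results | map({(.address1): .kana1}) | add')
echo "$lookup"

# Look up a single prefecture in the merged object
chiba=$(printf '%s' "$lookup" | jq -r '.["千葉県"]')
echo "$chiba"
```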