
[jq] Getting the unique key names at a specific level of JSON data

Last updated at 2016-02-08 / Posted at 2016-02-08

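Here is a large, really large JSON file with a deeply nested structure that may look familiar. The excerpt below shows the contents of its top-level products object, keyed by SKU.
At some point you start wondering which fields exist under attributes, right?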

{
  "DQ578CGN99KG6ECF": {
    "sku": "DQ578CGN99KG6ECF",
    "productFamily": "Compute Instance",
    "attributes": {
      "servicecode": "AmazonEC2",
      "location": "US East (N. Virginia)",
      "locationType": "AWS Region",
      "instanceType": "hs1.8xlarge",
      "currentGeneration": "No",
      "instanceFamily": "Storage optimized",
      "vcpu": "17",
      "physicalProcessor": "Intel Xeon E5-2650",
      "clockSpeed": "2 GHz",
      "memory": "117 GiB",
      "storage": "24 x 2000",
      "networkPerformance": "10 Gigabit",
      "processorArchitecture": "64-bit",
      "tenancy": "Shared",
      "operatingSystem": "Windows",
      "licenseModel": "License Included",
      "usagetype": "BoxUsage:hs1.8xlarge",
      "operation": "RunInstances:0002",
      "preInstalledSw": "NA"
    }
  },
  "RDXNGJU5DRW4G5ZK": {
    "sku": "RDXNGJU5DRW4G5ZK",
    "productFamily": "Compute Instance",
    "attributes": {
      "servicecode": "AmazonEC2",
      "location": "South America (Sao Paulo)",
      "locationType": "AWS Region",
      "instanceType": "c3.large",
      "currentGeneration": "Yes",
      "instanceFamily": "Compute optimized",
      "vcpu": "2",
      "physicalProcessor": "Intel Xeon E5-2680 v2 (Ivy Bridge)",
// ... (omitted) ...
      "operatingSystem": "SUSE",
      "licenseModel": "No License required",
      "usagetype": "APS1-DedicatedUsage:m3.2xlarge",
      "operation": "RunInstances:000g",
      "preInstalledSw": "NA",
      "processorFeatures": "Intel AVX; Intel Turbo"
    }
  }
}

Since it's JSON, you naturally want to use jq and pull this out slickly on the command line, right?
So I gave it a try.

cat ec2.json | jq '[.["products"][]["attributes"]|keys]|flatten|unique'

Explanation

1. First, extract everything under attributes

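Nothing special here; this is a standard jq path expression. .["products"] selects the products object, [] iterates over its values (one per SKU), and ["attributes"] picks the attributes object of each product.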

cat ec2.json | jq '.["products"][]["attributes"]'

{
  "servicecode": "AmazonEC2",
  "location": "US East (N. Virginia)",
  "locationType": "AWS Region",
  "instanceType": "hs1.8xlarge",
  "currentGeneration": "No",
  "instanceFamily": "Storage optimized",
  "vcpu": "17",
  "physicalProcessor": "Intel Xeon E5-2650",
  "clockSpeed": "2 GHz",
  "memory": "117 GiB",
  "storage": "24 x 2000",
  "networkPerformance": "10 Gigabit",
  "processorArchitecture": "64-bit",
  "tenancy": "Shared",
  "operatingSystem": "Windows",
  "licenseModel": "License Included",
  "usagetype": "BoxUsage:hs1.8xlarge",
  "operation": "RunInstances:0002",
  "preInstalledSw": "NA"
}
{
  "servicecode": "AmazonEC2",
  "location": "South America (Sao Paulo)",
  "locationType": "AWS Region",
  "instanceType": "c3.large",
  "currentGeneration": "Yes",
  "instanceFamily": "Compute optimized",
  "vcpu": "2",
  "physicalProcessor": "Intel Xeon E5-2680 v2 (Ivy Bridge)",
  "clockSpeed": "2.8 GHz",
  "memory": "3.75 GiB",
  "storage": "2 x 16 SSD",
  "networkPerformance": "Moderate",
// ... (omitted) ...
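To make the semantics of step 1 concrete, here is a rough Python analogue. It is a sketch, not jq itself. The data is a tiny hypothetical stand-in for ec2.json, and attributes_stream is a made-up helper name. The point it illustrates is that the filter does not build one array; it emits one separate value per product.

```python
import json

# Hypothetical, heavily trimmed stand-in for ec2.json (not real AWS data).
DOC = json.loads("""
{
  "products": {
    "SKU_A": {"sku": "SKU_A", "attributes": {"servicecode": "AmazonEC2", "vcpu": "2"}},
    "SKU_B": {"sku": "SKU_B", "attributes": {"servicecode": "AmazonEC2", "gpu": "1"}}
  }
}
""")


def attributes_stream(doc):
    """Rough Python analogue of the jq filter .["products"][]["attributes"].

    .["products"] selects the products object, [] iterates over its values
    (one per SKU), and ["attributes"] picks each product's attributes object.
    """
    for product in doc["products"].values():
        yield product["attributes"]


# Like jq, this emits one separate JSON value per product, not a single array.
for attrs in attributes_stream(DOC):
    print(json.dumps(attrs, indent=2))
```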

2. Next, extract only the keys

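https://stedolan.github.io/jq/manual/
Reading through the manual, I was impressed by how many handy operators and functions it offers.
One of them, keys, extracts just the key names of an object and returns them sorted by Unicode code point.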

cat ec2.json | jq '.["products"][]["attributes"]|keys'

[
  "clockSpeed",
  "currentGeneration",
  "instanceFamily",
  "instanceType",
  "licenseModel",
  "location",
  "locationType",
  "memory",
  "networkPerformance",
  "operatingSystem",
  "operation",
  "physicalProcessor",
  "preInstalledSw",
  "processorArchitecture",
  "servicecode",
  "storage",
  "tenancy",
  "usagetype",
  "vcpu"
]
[
  "clockSpeed",
  "currentGeneration",
  "enhancedNetworkingSupported",
  "instanceFamily",
  "instanceType",
  "licenseModel",
  "location",
  "locationType",
  "memory",
  "networkPerformance",
  "operatingSystem",
  "operation",
  "physicalProcessor",
  "preInstalledSw",
// ... (omitted) ...
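Here is a similar Python sketch of keys, again using hypothetical data and a made-up helper name, jq_keys. It assumes the behavior described in the jq manual: keys on an object returns its key names sorted by Unicode code point, while keys_unsorted keeps the original order. Because the filter runs once per streamed object, the result is still one array per product.

```python
def jq_keys(obj):
    """Rough analogue of jq's `keys` applied to an object.

    jq returns the keys sorted by Unicode code point; Python's default
    string ordering is also by code point, so sorted() matches it.
    """
    return sorted(obj.keys())


# Hypothetical attributes objects standing in for the stream from step 1.
attrs_stream = [
    {"vcpu": "2", "servicecode": "AmazonEC2", "clockSpeed": "2 GHz"},
    {"gpu": "1", "servicecode": "AmazonEC2"},
]

# `| keys` runs once per streamed object, so we still get one array per product.
for attrs in attrs_stream:
    print(jq_keys(attrs))
```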

3. Remove duplicate keys

I thought I could simply use unique, but it wasn't that easy...
Right, I figured I first needed to flatten the nested arrays.
Sure enough, jq has a function called flatten, but when I tried it, it didn't work either.

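Looking closely at the output of step 2, I noticed that it is not one array but several separate arrays. Step 1 emits each matched attributes object as its own result, and every filter after it (keys, then unique or flatten) runs on each result separately, so unique never sees keys from more than one product at a time. In other words, jq's output is a stream of JSON values, not one JSON document as a whole. (I had always assumed the output itself was a single JSON document...)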
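The failure can be reproduced with a small Python sketch. jq_unique and jq_flatten are hypothetical helpers that approximate jq's unique and flatten. jq_unique sorts and deduplicates, and this version assumes hashable items such as strings. jq_flatten takes no depth argument, so nested arrays are flattened all the way. When each helper is applied to every array in the stream separately, as jq does here, both leave each array unchanged.

```python
def jq_unique(arr):
    """Rough analogue of jq's `unique` for arrays of strings: sort and drop duplicates."""
    return sorted(set(arr))


def jq_flatten(arr):
    """Rough analogue of jq's `flatten` with no depth argument (fully recursive)."""
    out = []
    for item in arr:
        if isinstance(item, list):
            out.extend(jq_flatten(item))
        else:
            out.append(item)
    return out


# Hypothetical stream of key arrays, one per product, as produced by step 2.
key_stream = [
    ["clockSpeed", "servicecode", "vcpu"],
    ["gpu", "servicecode"],
]

# `... | keys | unique` and `... | keys | flatten` each see ONE array at a time,
# so every array comes back unchanged and nothing is merged across products.
print([jq_unique(keys) for keys in key_stream])
print([jq_flatten(keys) for keys in key_stream])
```

Deduplication across products only happens once all the keys are in the same array, for example jq_unique(key_stream[0] + key_stream[1]).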

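So if it isn't an array, just make it one! Wrapping the whole expression in [ ] collects all of its outputs into a single array.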

cat ec2.json | jq '[.["products"][]["attributes"]|keys]'

[
  [
    "clockSpeed",
    "currentGeneration",
    "instanceFamily",
    "instanceType",
    "licenseModel",
    "location",
    "locationType",
    "memory",
    "networkPerformance",
    "operatingSystem",
    "operation",
    "physicalProcessor",
    "preInstalledSw",
    "processorArchitecture",
    "servicecode",
    "storage",
    "tenancy",
    "usagetype",
    "vcpu"
  ],
  [
    "clockSpeed",
    "currentGeneration",
    "enhancedNetworkingSupported",
    "instanceFamily",
    "instanceType",
    "licenseModel",
// ... (omitted) ...
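In Python terms, jq's [ ... ] array construction corresponds roughly to building a list from a generator. This sketch uses a hypothetical attributes_stream generator in place of the real file. The result is a list of lists, which is why flatten is still needed in the next step.

```python
def attributes_stream():
    """Hypothetical stand-in for the stream emitted by .["products"][]["attributes"]."""
    yield {"vcpu": "2", "servicecode": "AmazonEC2"}
    yield {"gpu": "1", "servicecode": "AmazonEC2"}


# [ ... | keys ]: collect one sorted key list per streamed object into ONE list.
# sorted() on a dict iterates over its keys, mimicking jq's `keys`.
collected = [sorted(attrs) for attrs in attributes_stream()]

# A single value now, but still nested: an array of arrays.
print(collected)
```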

4. Finally, flatten and deduplicate

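As mentioned above, jq provides handy functions such as flatten and unique. flatten merges the per-product arrays into one array, and unique sorts that array and removes duplicates, which gives exactly the keys at the level I wanted.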

cat ec2.json | jq '[.["products"][]["attributes"]|keys]|flatten|unique'

[
  "clockSpeed",
  "currentGeneration",
  "dedicatedEbsThroughput",
  "ebsOptimized",
  "enhancedNetworkingSupported",
  "fromLocation",
  "fromLocationType",
  "gpu",
  "group",
  "groupDescription",
  "instanceCapacity10xlarge",
  "instanceCapacity2xlarge",
  "instanceCapacity4xlarge",
  "instanceCapacity8xlarge",
  "instanceCapacityLarge",
  "instanceCapacityMedium",
  "instanceCapacityXlarge",
  "instanceFamily",
  "instanceType",
  "licenseModel",
// ... (omitted) ...
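Putting it all together, here is a Python sketch that mirrors the final jq filter step by step. The inline DOC and the function name unique_attribute_keys are hypothetical. To try it against the real file, you could load ec2.json with json.load instead of using DOC.

```python
import json

# Hypothetical, trimmed stand-in for ec2.json (not real AWS data).
DOC = {
    "products": {
        "SKU_A": {"attributes": {"vcpu": "2", "servicecode": "AmazonEC2"}},
        "SKU_B": {"attributes": {"gpu": "1", "servicecode": "AmazonEC2"}},
    }
}


def unique_attribute_keys(doc):
    """Python analogue of jq '[.["products"][]["attributes"]|keys]|flatten|unique'."""
    # [ .["products"][]["attributes"] | keys ]  -> one sorted key list per product
    key_arrays = [sorted(p["attributes"]) for p in doc["products"].values()]
    # | flatten  -> one flat list (a single level of nesting is all we have here)
    flat = [key for keys in key_arrays for key in keys]
    # | unique   -> sorted, duplicates removed
    return sorted(set(flat))


print(json.dumps(unique_attribute_keys(DOC), indent=2))
```

In jq itself, keys[] emits each key as its own value. Because of that, [.products[].attributes | keys[]] | unique should produce the same list without needing flatten. Here .products is shorthand for .["products"].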

That's all.

jq is awesome!!
https://stedolan.github.io/jq/manual/
