[jq] Getting the unique key names at a specific level of JSON data

More than 3 years have passed since last update.

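Here is a large, really very large, deeply nested JSON file that may look familiar.
Doesn't it make you wonder which fields appear under attributes?
(The excerpt below appears to show the entries that sit under the file's top-level products key, which is the key the commands in this article start from.)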

{
  "DQ578CGN99KG6ECF": {
    "sku": "DQ578CGN99KG6ECF",
    "productFamily": "Compute Instance",
    "attributes": {
      "servicecode": "AmazonEC2",
      "location": "US East (N. Virginia)",
      "locationType": "AWS Region",
      "instanceType": "hs1.8xlarge",
      "currentGeneration": "No",
      "instanceFamily": "Storage optimized",
      "vcpu": "17",
      "physicalProcessor": "Intel Xeon E5-2650",
      "clockSpeed": "2 GHz",
      "memory": "117 GiB",
      "storage": "24 x 2000",
      "networkPerformance": "10 Gigabit",
      "processorArchitecture": "64-bit",
      "tenancy": "Shared",
      "operatingSystem": "Windows",
      "licenseModel": "License Included",
      "usagetype": "BoxUsage:hs1.8xlarge",
      "operation": "RunInstances:0002",
      "preInstalledSw": "NA"
    }
  },
  "RDXNGJU5DRW4G5ZK": {
    "sku": "RDXNGJU5DRW4G5ZK",
    "productFamily": "Compute Instance",
    "attributes": {
      "servicecode": "AmazonEC2",
      "location": "South America (Sao Paulo)",
      "locationType": "AWS Region",
      "instanceType": "c3.large",
      "currentGeneration": "Yes",
      "instanceFamily": "Compute optimized",
      "vcpu": "2",
      "physicalProcessor": "Intel Xeon E5-2680 v2 (Ivy Bridge)",
// ... (omitted) ...
      "operatingSystem": "SUSE",
      "licenseModel": "No License required",
      "usagetype": "APS1-DedicatedUsage:m3.2xlarge",
      "operation": "RunInstances:000g",
      "preInstalledSw": "NA",
      "processorFeatures": "Intel AVX; Intel Turbo"
    }
  }
}

When it comes to JSON

you want to extract this on the command line, in style, with jq, right?
So I gave it a try.

cat ec2.json | jq '[.["products"][]["attributes"]|keys]|flatten|unique'

Explanation

1. First, extract the part under attributes from the whole file

Nothing special here, just an ordinary path expression.

cat ec2.json | jq '.["products"][]["attributes"]'
{
  "servicecode": "AmazonEC2",
  "location": "US East (N. Virginia)",
  "locationType": "AWS Region",
  "instanceType": "hs1.8xlarge",
  "currentGeneration": "No",
  "instanceFamily": "Storage optimized",
  "vcpu": "17",
  "physicalProcessor": "Intel Xeon E5-2650",
  "clockSpeed": "2 GHz",
  "memory": "117 GiB",
  "storage": "24 x 2000",
  "networkPerformance": "10 Gigabit",
  "processorArchitecture": "64-bit",
  "tenancy": "Shared",
  "operatingSystem": "Windows",
  "licenseModel": "License Included",
  "usagetype": "BoxUsage:hs1.8xlarge",
  "operation": "RunInstances:0002",
  "preInstalledSw": "NA"
}
{
  "servicecode": "AmazonEC2",
  "location": "South America (Sao Paulo)",
  "locationType": "AWS Region",
  "instanceType": "c3.large",
  "currentGeneration": "Yes",
  "instanceFamily": "Compute optimized",
  "vcpu": "2",
  "physicalProcessor": "Intel Xeon E5-2680 v2 (Ivy Bridge)",
  "clockSpeed": "2.8 GHz",
  "memory": "3.75 GiB",
  "storage": "2 x 16 SSD",
  "networkPerformance": "Moderate",
// ... (omitted) ...

2. Next, extract only the keys

https://stedolan.github.io/jq/manual/
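Reading the manual, I was impressed by how many handy operators and functions jq provides.
One of them, keys, returns the key names of an object as a sorted array, which gives us just the keys.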

cat ec2.json | jq '.["products"][]["attributes"]|keys'
[
  "clockSpeed",
  "currentGeneration",
  "instanceFamily",
  "instanceType",
  "licenseModel",
  "location",
  "locationType",
  "memory",
  "networkPerformance",
  "operatingSystem",
  "operation",
  "physicalProcessor",
  "preInstalledSw",
  "processorArchitecture",
  "servicecode",
  "storage",
  "tenancy",
  "usagetype",
  "vcpu"
]
[
  "clockSpeed",
  "currentGeneration",
  "enhancedNetworkingSupported",
  "instanceFamily",
  "instanceType",
  "licenseModel",
  "location",
  "locationType",
  "memory",
  "networkPerformance",
  "operatingSystem",
  "operation",
  "physicalProcessor",
  "preInstalledSw",
// ... (omitted) ...
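
As a small aside, the listing above is alphabetical because keys sorts the names; keys_unsorted keeps the order in which they appear in the input. A minimal sketch, assuming a made-up two-field object (the sample value and the variable names are illustrative, not taken from ec2.json):

```shell
# Hypothetical sample object, not taken from ec2.json.
sample='{"vcpu": "2", "memory": "3.75 GiB"}'

# keys returns the key names sorted; -c prints compact one-line output.
sorted=$(printf '%s' "$sample" | jq -c 'keys')

# keys_unsorted returns the key names in their original input order.
unsorted=$(printf '%s' "$sample" | jq -c 'keys_unsorted')

echo "$sorted"
echo "$unsorted"
```

For listing which fields exist, the sorted form is usually the more convenient one, since it makes the per-product lists easier to compare by eye.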

3. Remove duplicate keys

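I thought I could simply apply unique, but it was not that easy.
Right, I figured, the nested arrays have to be flattened first.
Sure enough, there is a function called flatten, so I tried it, and... it did not work.

Looking closely at the result of step 2, the output as a whole is not one array but several separate arrays. The path expression in step 1 emits each matched part as its own value, so jq produces a stream of JSON values rather than a single JSON document, and every filter after the pipe runs once per value. (Until now I had always assumed the output itself was one JSON document.)

So, if the output is not an array, just make it one! Wrapping the expression in [ ] collects every result into a single array.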

cat ec2.json | jq '[.["products"][]["attributes"]|keys]'
[
  [
    "clockSpeed",
    "currentGeneration",
    "instanceFamily",
    "instanceType",
    "licenseModel",
    "location",
    "locationType",
    "memory",
    "networkPerformance",
    "operatingSystem",
    "operation",
    "physicalProcessor",
    "preInstalledSw",
    "processorArchitecture",
    "servicecode",
    "storage",
    "tenancy",
    "usagetype",
    "vcpu"
  ],
  [
    "clockSpeed",
    "currentGeneration",
    "enhancedNetworkingSupported",
    "instanceFamily",
    "instanceType",
    "licenseModel",
// ... (omitted) ...
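
The difference is easy to reproduce on a tiny input. The following is a rough sketch, assuming a made-up file with two products "A" and "B" (the data and variable names are hypothetical). Without the outer brackets, flatten and unique run once per keys array, so a key shared by both products is still printed twice; with the brackets, the arrays are first collected into one array.

```shell
# Hypothetical miniature of the products structure, not real pricing data.
sample='{"products":{"A":{"attributes":{"vcpu":"2","memory":"4 GiB"}},"B":{"attributes":{"vcpu":"4","gpu":"1"}}}}'

# Without [ ]: each keys array passes through flatten|unique on its own,
# so the result is two separate arrays and "vcpu" shows up in both.
per_product=$(printf '%s' "$sample" | jq -c '.products[].attributes | keys | flatten | unique')

# With [ ]: all keys arrays are collected into a single outer array.
collected=$(printf '%s' "$sample" | jq -c '[.products[].attributes | keys]')

echo "$per_product"
echo "$collected"
```

The first command prints one line per product, while the second prints a single nested array that later filters can treat as one value.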

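4. Finally, flatten and unique

As mentioned above, jq provides handy functions such as flatten and unique. Applying them to the collected array yields exactly what I wanted: the distinct key names at one level of the hierarchy.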

cat ec2.json | jq '[.["products"][]["attributes"]|keys]|flatten|unique'
[
  "clockSpeed",
  "currentGeneration",
  "dedicatedEbsThroughput",
  "ebsOptimized",
  "enhancedNetworkingSupported",
  "fromLocation",
  "fromLocationType",
  "gpu",
  "group",
  "groupDescription",
  "instanceCapacity10xlarge",
  "instanceCapacity2xlarge",
  "instanceCapacity4xlarge",
  "instanceCapacity8xlarge",
  "instanceCapacityLarge",
  "instanceCapacityMedium",
  "instanceCapacityXlarge",
  "instanceFamily",
  "instanceType",
  "licenseModel",
// ... (omitted) ...
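
For reference, the same result can probably be reached with slightly less typing. This is only a sketch under assumptions: the temporary file stands in for ec2.json and holds made-up data, and the variable names are illustrative. It passes the file name to jq instead of piping through cat, uses the shorter .products[].attributes path syntax, and uses keys[] to emit each key name individually so that flatten is not needed.

```shell
# Hypothetical miniature of ec2.json, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
{"products":{"A":{"attributes":{"vcpu":"2","memory":"4 GiB"}},
             "B":{"attributes":{"vcpu":"4","gpu":"1"}}}}
EOF

# The pipeline from the article, with jq reading the file directly.
original=$(jq -c '[.["products"][]["attributes"]|keys]|flatten|unique' "$tmp")

# Shorter path syntax; keys[] emits each key name as its own value,
# so the collected array is already flat and only unique is required.
shorter=$(jq -c '[.products[].attributes | keys[]] | unique' "$tmp")

echo "$original"
echo "$shorter"
rm -f "$tmp"
```

Both commands should print the same array; which spelling to prefer is mostly a matter of taste.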

That's all.

jq is wonderful!!
https://stedolan.github.io/jq/manual/
