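Here's a deeply nested, big, really big JSON file that may look somewhat familiar. Below is an excerpt of the products object inside it.
Don't you suddenly want to know which keys live under attributes?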
{
  "DQ578CGN99KG6ECF": {
    "sku": "DQ578CGN99KG6ECF",
    "productFamily": "Compute Instance",
    "attributes": {
      "servicecode": "AmazonEC2",
      "location": "US East (N. Virginia)",
      "locationType": "AWS Region",
      "instanceType": "hs1.8xlarge",
      "currentGeneration": "No",
      "instanceFamily": "Storage optimized",
      "vcpu": "17",
      "physicalProcessor": "Intel Xeon E5-2650",
      "clockSpeed": "2 GHz",
      "memory": "117 GiB",
      "storage": "24 x 2000",
      "networkPerformance": "10 Gigabit",
      "processorArchitecture": "64-bit",
      "tenancy": "Shared",
      "operatingSystem": "Windows",
      "licenseModel": "License Included",
      "usagetype": "BoxUsage:hs1.8xlarge",
      "operation": "RunInstances:0002",
      "preInstalledSw": "NA"
    }
  },
  "RDXNGJU5DRW4G5ZK": {
    "sku": "RDXNGJU5DRW4G5ZK",
    "productFamily": "Compute Instance",
    "attributes": {
      "servicecode": "AmazonEC2",
      "location": "South America (Sao Paulo)",
      "locationType": "AWS Region",
      "instanceType": "c3.large",
      "currentGeneration": "Yes",
      "instanceFamily": "Compute optimized",
      "vcpu": "2",
      "physicalProcessor": "Intel Xeon E5-2680 v2 (Ivy Bridge)",
      // ...(omitted)...
      "operatingSystem": "SUSE",
      "licenseModel": "No License required",
      "usagetype": "APS1-DedicatedUsage:m3.2xlarge",
      "operation": "RunInstances:000g",
      "preInstalledSw": "NA",
      "processorFeatures": "Intel AVX; Intel Turbo"
    }
  }
}
When it comes to JSON, you want to use jq to pull things out stylishly on the command line, right?
So I gave it a try:
cat ec2.json | jq '[.["products"][]["attributes"]|keys]|flatten|unique'
Explanation
1. First, extract the attributes part from the whole file
Nothing special here; it's just a regular filter.
cat ec2.json | jq '.["products"][]["attributes"]'
{
  "servicecode": "AmazonEC2",
  "location": "US East (N. Virginia)",
  "locationType": "AWS Region",
  "instanceType": "hs1.8xlarge",
  "currentGeneration": "No",
  "instanceFamily": "Storage optimized",
  "vcpu": "17",
  "physicalProcessor": "Intel Xeon E5-2650",
  "clockSpeed": "2 GHz",
  "memory": "117 GiB",
  "storage": "24 x 2000",
  "networkPerformance": "10 Gigabit",
  "processorArchitecture": "64-bit",
  "tenancy": "Shared",
  "operatingSystem": "Windows",
  "licenseModel": "License Included",
  "usagetype": "BoxUsage:hs1.8xlarge",
  "operation": "RunInstances:0002",
  "preInstalledSw": "NA"
}
{
  "servicecode": "AmazonEC2",
  "location": "South America (Sao Paulo)",
  "locationType": "AWS Region",
  "instanceType": "c3.large",
  "currentGeneration": "Yes",
  "instanceFamily": "Compute optimized",
  "vcpu": "2",
  "physicalProcessor": "Intel Xeon E5-2680 v2 (Ivy Bridge)",
  "clockSpeed": "2.8 GHz",
  "memory": "3.75 GiB",
  "storage": "2 x 16 SSD",
  "networkPerformance": "Moderate",
  // ...(omitted)...
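To see why step 1 prints several separate objects instead of one, here is a minimal sketch on a made-up two-product sample. The SKU names and values are hypothetical stand-ins, not data from the real file. With -c each result is printed compactly on its own line, so you get exactly one line per product.

```shell
#!/bin/sh
# Hypothetical miniature stand-in for ec2.json: same shape
# (products -> SKU -> attributes), but only two made-up products.
sample='{
  "products": {
    "SKU_A": {"sku": "SKU_A", "attributes": {"vcpu": "2", "memory": "4 GiB"}},
    "SKU_B": {"sku": "SKU_B", "attributes": {"vcpu": "4", "gpu": "1"}}
  }
}'

# .products[] iterates over every value of the products object,
# so .attributes is evaluated once per product.
# -c (compact) prints each result on its own line, which makes it
# easy to see that jq emits one separate JSON value per product.
attrs=$(printf '%s\n' "$sample" | jq -c '.products[].attributes')
printf '%s\n' "$attrs"
```

The two printed lines are two independent JSON values, not elements of a single array.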
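2. Next, extract only the keys
https://stedolan.github.io/jq/manual/
Skimming the manual, I was impressed by how many handy operators and functions it provides.
One of them is keys, which returns just the keys of an object as an array, sorted by Unicode code point.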
cat ec2.json | jq '.["products"][]["attributes"]|keys'
[
  "clockSpeed",
  "currentGeneration",
  "instanceFamily",
  "instanceType",
  "licenseModel",
  "location",
  "locationType",
  "memory",
  "networkPerformance",
  "operatingSystem",
  "operation",
  "physicalProcessor",
  "preInstalledSw",
  "processorArchitecture",
  "servicecode",
  "storage",
  "tenancy",
  "usagetype",
  "vcpu"
]
[
  "clockSpeed",
  "currentGeneration",
  "enhancedNetworkingSupported",
  "instanceFamily",
  "instanceType",
  "licenseModel",
  "location",
  "locationType",
  "memory",
  "networkPerformance",
  "operatingSystem",
  "operation",
  "physicalProcessor",
  "preInstalledSw",
  // ...(omitted)...
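Because keys sorts, the key lists above come out alphabetical rather than in file order. A small sketch of the difference, assuming jq 1.5 or later (where keys_unsorted is available) and using a made-up single-product sample whose attribute order is deliberately unsorted:

```shell
#!/bin/sh
# Hypothetical single-product sample; attribute order is deliberately unsorted.
sample='{"products": {"SKU_A": {"attributes": {"vcpu": "2", "memory": "4 GiB", "clockSpeed": "2 GHz"}}}}'

# keys returns the object's keys as an array, sorted by Unicode code point.
sorted=$(printf '%s\n' "$sample" | jq -c '.products[].attributes | keys')

# keys_unsorted (jq 1.5+) keeps the keys in the order they appear in the input.
unsorted=$(printf '%s\n' "$sample" | jq -c '.products[].attributes | keys_unsorted')

printf 'keys:          %s\n' "$sorted"
printf 'keys_unsorted: %s\n' "$unsorted"
```

For the goal of this article the sorted version is convenient, since the final list ends up alphabetical anyway.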
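3. Remove duplicate keys
I thought I could simply use unique, but it wasn't that easy...
Ah, right: I need to flatten the nested arrays first.
Looking it up, sure enough there is a function called flatten, so I tried it... and that didn't work either.
Looking closely at the output of step 2, it isn't one big array at all; it's several separate arrays in a row. The [] in step 1 emits each product's attributes as a separate result, so jq outputs a stream of independent JSON values rather than a single JSON document. As a result, a unique or flatten placed after keys runs on each product's small array on its own, never across products. (I had always assumed jq's output as a whole was also a single piece of JSON...)
So if it isn't an array, just make it one! Wrapping the expression in [ ] collects every result into a single array.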
cat ec2.json | jq '[.["products"][]["attributes"]|keys]'
[
  [
    "clockSpeed",
    "currentGeneration",
    "instanceFamily",
    "instanceType",
    "licenseModel",
    "location",
    "locationType",
    "memory",
    "networkPerformance",
    "operatingSystem",
    "operation",
    "physicalProcessor",
    "preInstalledSw",
    "processorArchitecture",
    "servicecode",
    "storage",
    "tenancy",
    "usagetype",
    "vcpu"
  ],
  [
    "clockSpeed",
    "currentGeneration",
    "enhancedNetworkingSupported",
    "instanceFamily",
    "instanceType",
    "licenseModel",
    // ...(omitted)...
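The difference between the stream and the collected array can be reproduced on a tiny hypothetical sample in which both products share the key vcpu. Without the outer [ ], unique runs on each product's array separately, so vcpu survives once per product; with it, all the per-product arrays end up inside one array that flatten and unique can then process as a whole.

```shell
#!/bin/sh
# Hypothetical two-product sample: both products share the key "vcpu".
sample='{
  "products": {
    "SKU_A": {"attributes": {"vcpu": "2", "memory": "4 GiB"}},
    "SKU_B": {"attributes": {"vcpu": "4", "gpu": "1"}}
  }
}'

# Without [...]: keys runs once per product, so jq emits two separate
# arrays, and unique only deduplicates *within* each of them.
per_product=$(printf '%s\n' "$sample" | jq -c '.products[].attributes | keys | unique')

# With [...]: the array constructor collects every result into ONE
# array (an array of arrays) that later filters see as a single value.
collected=$(printf '%s\n' "$sample" | jq -c '[.products[].attributes | keys]')

printf 'per product:\n%s\n' "$per_product"
printf 'collected:\n%s\n' "$collected"
```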
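4. Finally, flatten and deduplicate
As already mentioned, jq comes with handy functions such as flatten and unique; chaining them gives you exactly the keys at the level you wanted. Note that unique also returns its result sorted.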
cat ec2.json | jq '[.["products"][]["attributes"]|keys]|flatten|unique'
[
  "clockSpeed",
  "currentGeneration",
  "dedicatedEbsThroughput",
  "ebsOptimized",
  "enhancedNetworkingSupported",
  "fromLocation",
  "fromLocationType",
  "gpu",
  "group",
  "groupDescription",
  "instanceCapacity10xlarge",
  "instanceCapacity2xlarge",
  "instanceCapacity4xlarge",
  "instanceCapacity8xlarge",
  "instanceCapacityLarge",
  "instanceCapacityMedium",
  "instanceCapacityXlarge",
  "instanceFamily",
  "instanceType",
  "licenseModel",
  // ...(omitted)...
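As a cross-check, here is the complete pipeline on a made-up sample, next to a variant that uses keys[] so the array constructor receives individual keys and flatten is no longer needed. This variant is my own alternative, not the one used above; both should produce the same sorted, deduplicated list. The last command uses -r with a trailing .[] to print one plain key per line, which is handy for piping into grep or wc.

```shell
#!/bin/sh
# Hypothetical two-product sample; "vcpu" appears in both products.
sample='{
  "products": {
    "SKU_A": {"attributes": {"vcpu": "2", "memory": "4 GiB"}},
    "SKU_B": {"attributes": {"vcpu": "4", "gpu": "1"}}
  }
}'

# The pipeline from this article: collect per-product key arrays,
# flatten them into one array, then deduplicate (unique also sorts).
a=$(printf '%s\n' "$sample" | jq -c '[.products[].attributes | keys] | flatten | unique')

# Variant: keys[] emits the keys one by one, so [...] already builds
# a flat array and flatten becomes unnecessary.
b=$(printf '%s\n' "$sample" | jq -c '[.products[].attributes | keys[]] | unique')

printf '%s\n%s\n' "$a" "$b"

# -r prints raw strings; the trailing .[] outputs one key per line.
raw=$(printf '%s\n' "$sample" | jq -r '[.products[].attributes | keys[]] | unique | .[]')
printf '%s\n' "$raw"
```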
That's all.