
A Python script that converts JSON data to CSV

Last updated at 2022-01-27, posted at 2022-01-23

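Introduction

Notes on writing scripts that convert a JSON file to CSV.
Two scripts were written:

Sample ④-1: Converting a JSON file to CSV and writing it to a file (1)
- 04_1_convertFromJsonToCsv.py
Sample ④-2: Converting a JSON file to CSV and writing it to a file (2)
- 04_2_convertFromJsonToCsv.py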

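Conversion rules

	Sample ④-1	Sample ④-2
Script	04_1_convertFromJsonToCsv.py	04_2_convertFromJsonToCsv.py
Number of CSV columns	Two: `JSON element` and `element value`	Two: `JSON element` and `element value`
`JSON element` column	Prefixed with a serial number; all nested element names joined with `__`	No serial number prefix; all nested element names joined with `__`
`element value` column	Same as the JSON element's value	If several values share the same `JSON element`, they are joined with `__`

The scripts do not yet take the JSON file to convert as a command-line option; the file names are hard-coded. Only string values are written out; numbers, booleans, and null are skipped.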
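One possible way to pass the file names on the command line is sketched below with argparse. The option names `--file-in` and `--file-out` and the helper `parse_paths` are hypothetical and are not part of the scripts in this post.

```python
import argparse
import os


def parse_paths(argv=None):
    """Parse hypothetical --file-in / --file-out options (names are assumptions)."""
    parser = argparse.ArgumentParser(description="Convert a JSON file to CSV")
    parser.add_argument("--file-in", required=True, help="JSON file to convert")
    parser.add_argument("--file-out", default=None,
                        help="CSV file to write (default: input name with .csv)")
    args = parser.parse_args(argv)
    if args.file_out is None:
        # Derive "sample_0.csv" from "sample_0.json" when no output is given.
        args.file_out = os.path.splitext(args.file_in)[0] + ".csv"
    return args


if __name__ == "__main__":
    # Passing a list instead of reading sys.argv keeps the sketch easy to try.
    demo = parse_paths(["--file-in", "sample_0.json"])
    print(demo.file_in, demo.file_out)
```

Calling `parse_paths()` with no argument would read the real command line instead of the demo list.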

Sample ①: Setting up arguments

Argument handling with argparse. Further details are covered in the linked reference.

01_sample_args.py

import argparse

help_desc_msg = """--- Command help ---
Help line 1
Help line 2
Help line 3
Help line 4
Help line 5
Example: python 01_sample_args.py -f FUGA -p PIYO
--------------------------"""

parser = argparse.ArgumentParser(
    description=help_desc_msg,
    # Keep the line breaks of the description exactly as written
    formatter_class=argparse.RawDescriptionHelpFormatter,
)

parser.add_argument('-f', '--fuga', help='description of the fuga option')
parser.add_argument('-p', '--piyo', help='description of the piyo option')

args = parser.parse_args()

# f-strings also work when an option is omitted (its value is then None)
print(f'fuga={args.fuga}')
print(f'piyo={args.piyo}')

Output

$ python 01_sample_args.py -f FUGA -p PIYO
fuga=FUGA
piyo=PIYO
$ 
$ python 01_sample_args.py -h
usage: 01_sample_args.py [-h] [-f FUGA] [-p PIYO]

--- Command help ---
Help line 1
Help line 2
Help line 3
Help line 4
Help line 5
Example: python 01_sample_args.py -f FUGA -p PIYO
--------------------------

optional arguments:
  -h, --help            show this help message and exit
  -f FUGA, --fuga FUGA  description of the fuga option
  -p PIYO, --piyo PIYO  description of the piyo option
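An option that is not given on the command line is stored as None by default, so it helps to decide up front how a missing value should be handled. The sketch below reuses the -f/-p options; the `build_parser` helper and the default value "none" are assumptions made for illustration.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="argparse defaults sketch")
    # required=True makes argparse exit with an error message if -f is missing.
    parser.add_argument("-f", "--fuga", required=True, help="description of fuga")
    # default= supplies a value when -p is omitted (instead of None).
    parser.add_argument("-p", "--piyo", default="none", help="description of piyo")
    return parser


# An explicit argument list stands in for the real command line here.
args = build_parser().parse_args(["-f", "FUGA"])
print(f"fuga={args.fuga}")  # fuga=FUGA
print(f"piyo={args.piyo}")  # piyo=none
```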

Sample ②: Writing SDK results to a file as JSON

Retrieve information about AWS resources with the SDK (boto3) and write the result to a file in JSON format.

02_outInfo_Vpc.py

import boto3
import json

client = boto3.client('ec2')
# Select only VPCs whose Name tag is "default"
filters = [{'Name': 'tag:Name', 'Values': ['default']}]

response = client.describe_vpcs(Filters=filters)

with open('out_vpc.json', 'w') as fp:
    # json.dump writes to fp and returns None, so there is nothing to assign
    json.dump(response, fp)

Output

$ python 02_outInfo_Vpc.py

Output file (out_vpc.json)

out_vpc.json

{"Vpcs": [], "ResponseMetadata": {"RequestId": "00000000-0000-0000-0000-000000000000", "HTTPStatusCode": 200, "HTTPHeaders": {"x-amzn-requestid": "00000000-0000-0000-0000-000000000000", "cache-control": "no-cache, no-store", "strict-transport-security": "max-age=31536000; includeSubDomains", "content-type": "text/xml;charset=UTF-8", "content-length": "212", "date": "Sat, 21 Jan 2022 00:11:16 GMT", "server": "AmazonEC2"}, "RetryAttempts": 0}}
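The file above is written as a single line. If a more readable file is wanted, `json.dump` accepts `indent`, and `default=str` converts values that JSON cannot represent (such as datetime) to strings. The `response` dict below is a hand-made stand-in with a hypothetical `CreatedAt` field, not real boto3 output.

```python
import datetime
import io
import json

# Hand-made stand-in for an SDK response (an assumption for illustration).
response = {
    "Vpcs": [],
    "CreatedAt": datetime.datetime(2022, 1, 22, 8, 19, 48),  # hypothetical field
    "ResponseMetadata": {"HTTPStatusCode": 200, "RetryAttempts": 0},
}

buf = io.StringIO()  # in-memory file so the sketch does not touch the disk
json.dump(response, buf, indent=2, default=str)
text = buf.getvalue()
print(text)

# The datetime was serialized via str(); everything else round-trips unchanged.
loaded = json.loads(text)
```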

Sample ③: Loading a JSON file

Load the JSON file created in Sample ②.

03_loadjson.py

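import json

# The with block closes the file once loading is done
with open('out_vpc.json', 'r') as json_open:
    json_load = json.load(json_open)

print(type(json_open))   # the file object
print(type(json_load))   # the parsed data (a dict for this file)
print(json_load)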

Output

$ python 03_loadjson.py
<class '_io.TextIOWrapper'>
<class 'dict'>
{'Vpcs': [{'CidrBlock': '172.31.0.0/16', 'DhcpOptionsId': 'dopt-00000000', 'State': 'available', 'VpcId': 'vpc-00000000', 'OwnerId': '000000000000', 'InstanceTenancy': 'default', 'CidrBlockAssociationSet': [{'AssociationId': 'vpc-cidr-assoc-00000000', 'CidrBlock': '172.31.0.0/16', 'CidrBlockState': {'State': 'associated'}}], 'IsDefault': True, 'Tags': [{'Key': 'Name', 'Value': 'default'}]}], 'ResponseMetadata': {'RequestId': '00000000-0000-0000-0000-000000000000', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '00000000-0000-0000-0000-000000000000', 'cache-control': 'no-cache, no-store', 'strict-transport-security': 'max-age=31536000; includeSubDomains', 'content-type': 'text/xml;charset=UTF-8', 'content-length': '1128', 'date': 'Sat, 22 Jan 2022 08:19:48 GMT', 'server': 'AmazonEC2'}, 'RetryAttempts': 0}}
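Once loaded, the data is an ordinary structure of dicts and lists, so individual values can be picked out with normal indexing. A small sketch, using a trimmed copy of the output above as an inline string so that it runs without the file:

```python
import json

# Trimmed copy of the response shown above, inlined so the sketch runs on its own.
raw = """
{"Vpcs": [{"CidrBlock": "172.31.0.0/16", "VpcId": "vpc-00000000", "IsDefault": true,
           "Tags": [{"Key": "Name", "Value": "default"}]}],
 "ResponseMetadata": {"HTTPStatusCode": 200}}
"""
data = json.loads(raw)

for vpc in data["Vpcs"]:
    # Turn the Tags list into a dict for easier lookup; .get() tolerates missing Tags.
    tags = {t["Key"]: t["Value"] for t in vpc.get("Tags", [])}
    print(vpc["VpcId"], vpc["CidrBlock"], tags.get("Name"))
```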

Sample ④-1: Converting a JSON file to CSV and writing it to a file (1)

Convert a nested JSON file to CSV and write it to a file. Two variants of the conversion were written; this is the first.

04_1_convertFromJsonToCsv.py

import json
import csv

# fileIn = "out_vpc.json"
# fileOut = "out_vpc.csv"

fileIn = "sample_0.json"
fileOut = "sample_0.csv"

with open(fileIn, 'r', encoding='utf-8') as json_open:
    json_load = json.load(json_open)

outDict = {}  # "NNNNNNNN___key__path" -> value, in traversal order
glNo = 0      # serial number prepended to each key

def nestLoop(el, argStr):
    """Walk el recursively; argStr is the key path built so far."""
    global glNo
    if isinstance(el, str):  # string leaf: emit one row
        outElement = "{0:08}___{1}".format(glNo, argStr)
        glNo += 1
        outDict[outElement] = el
    elif isinstance(el, list):  # list: items inherit the parent's key path
        for lp in el:
            nestLoop(lp, argStr)
    elif isinstance(el, dict):  # dict: append each key, joined with "__"
        for key, val in el.items():
            headStr = "__" if argStr != "" else ""
            nestLoop(val, "{0}{1}{2}".format(argStr, headStr, key.strip()))
    # Other types (numbers, booleans, null) are skipped

nestLoop(json_load, "")

with open(fileOut, 'w', encoding='utf-8', newline="") as f:
    writer = csv.writer(f)
    for k, v in outDict.items():
        writer.writerow([k, v])

Output:

$ python 04_1_convertFromJsonToCsv.py

Result of converting the JSON file created in Sample ② to CSV (run with the commented-out fileIn/fileOut lines enabled)

out_vpc.csv

00000000___Vpcs__CidrBlock,172.31.0.0/16
00000001___Vpcs__DhcpOptionsId,dopt-00000000
00000002___Vpcs__State,available
00000003___Vpcs__VpcId,vpc-00000000
00000004___Vpcs__OwnerId,000000000000
00000005___Vpcs__InstanceTenancy,default
00000006___Vpcs__CidrBlockAssociationSet__AssociationId,vpc-cidr-assoc-00000000
00000007___Vpcs__CidrBlockAssociationSet__CidrBlock,172.31.0.0/16
00000008___Vpcs__CidrBlockAssociationSet__CidrBlockState__State,associated
00000009___Vpcs__Tags__Key,Name
00000010___Vpcs__Tags__Value,default
00000011___ResponseMetadata__RequestId,00000000-0000-0000-0000-000000000000
00000012___ResponseMetadata__HTTPHeaders__x-amzn-requestid,00000000-0000-0000-0000-000000000000
00000013___ResponseMetadata__HTTPHeaders__cache-control,"no-cache, no-store"
00000014___ResponseMetadata__HTTPHeaders__strict-transport-security,max-age=31536000; includeSubDomains
00000015___ResponseMetadata__HTTPHeaders__content-type,text/xml;charset=UTF-8
00000016___ResponseMetadata__HTTPHeaders__content-length,1128
00000017___ResponseMetadata__HTTPHeaders__date,"Sat, 22 Jan 2022 08:19:48 GMT"
00000018___ResponseMetadata__HTTPHeaders__server,AmazonEC2

Below are another input (sample.json) and the result of converting it to CSV with 04_1_convertFromJsonToCsv.py (out_sample.csv).

sample.json

{
    "test01": "TEST01",
    "test02": [
        {
            "test02-A": "VALUE-TEST02-A",
            "test02-B": [
                {
                    "test02-B-1": "VALUE-TEST02-B-1",
                    "test02-B-2": {
                        "test02-B-2-1": "VALUE-TEST02-B-2-1",
                        "test02-B-2-2": "VALUE-TEST02-B-2-2",
                        "test02-B-2-3": "VALUE-TEST02-B-2-3"
                    }
                }
            ],
            "test02-C": [
                { "test02-C-1-1": "VALUE-TEST02-C-1-1", "test02-C-1-2": "VALUE-TEST02-C-1-2" },
                { "test02-C-2-1": "VALUE-TEST02-C-2-1", "test02-C-2-2": "VALUE-TEST02-C-2-2" },
                { "test02-C-3-1": "VALUE-TEST02-C-3-1", "test02-C-3-2": "VALUE-TEST02-C-3-2" },
                { "test02-C-4-1": "VALUE-TEST02-C-4-1", "test02-C-4-2": "VALUE-TEST02-C-4-2" },
                { "test02-C-5-1": "VALUE-TEST02-C-5-1", "test02-C-5-2": "VALUE-TEST02-C-5-2" }
            ]
        }
    ],
    "test03": [ "VALUE-TEST03-1","VALUE-TEST03-2","VALUE-TEST03-3","VALUE-TEST03-4" ]
}

Conversion result

out_sample.csv

00000000___test01,TEST01
00000001___test02__test02-A,VALUE-TEST02-A
00000002___test02__test02-B__test02-B-1,VALUE-TEST02-B-1
00000003___test02__test02-B__test02-B-2__test02-B-2-1,VALUE-TEST02-B-2-1
00000004___test02__test02-B__test02-B-2__test02-B-2-2,VALUE-TEST02-B-2-2
00000005___test02__test02-B__test02-B-2__test02-B-2-3,VALUE-TEST02-B-2-3
00000006___test02__test02-C__test02-C-1-1,VALUE-TEST02-C-1-1
00000007___test02__test02-C__test02-C-1-2,VALUE-TEST02-C-1-2
00000008___test02__test02-C__test02-C-2-1,VALUE-TEST02-C-2-1
00000009___test02__test02-C__test02-C-2-2,VALUE-TEST02-C-2-2
00000010___test02__test02-C__test02-C-3-1,VALUE-TEST02-C-3-1
00000011___test02__test02-C__test02-C-3-2,VALUE-TEST02-C-3-2
00000012___test02__test02-C__test02-C-4-1,VALUE-TEST02-C-4-1
00000013___test02__test02-C__test02-C-4-2,VALUE-TEST02-C-4-2
00000014___test02__test02-C__test02-C-5-1,VALUE-TEST02-C-5-1
00000015___test02__test02-C__test02-C-5-2,VALUE-TEST02-C-5-2
00000016___test03,VALUE-TEST03-1
00000017___test03,VALUE-TEST03-2
00000018___test03,VALUE-TEST03-3
00000019___test03,VALUE-TEST03-4
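The script above keeps the serial number in a global variable. As an alternative sketch (not the script's own approach), the traversal can be written as a generator and the numbering left to `enumerate`. The function name `flatten` and the small `sample` dict are assumptions made for this example.

```python
def flatten(el, path=""):
    """Yield (key path, value) for every string leaf, in document order."""
    if isinstance(el, str):
        yield path, el
    elif isinstance(el, list):
        for item in el:
            yield from flatten(item, path)  # list items keep the parent path
    elif isinstance(el, dict):
        for key, val in el.items():
            child = f"{path}__{key.strip()}" if path else key.strip()
            yield from flatten(val, child)
    # numbers, booleans and null are ignored, as in the script above


# Cut-down version of sample.json (an assumption for illustration).
sample = {
    "test01": "TEST01",
    "test02": [{"test02-A": "VALUE-TEST02-A"}],
    "test03": ["VALUE-TEST03-1", "VALUE-TEST03-2"],
}

# enumerate() supplies the 8-digit serial number that prefixes each key.
rows = [(f"{no:08}___{path}", value)
        for no, (path, value) in enumerate(flatten(sample))]
for row in rows:
    print(row)
```

Because the generator holds no state of its own, it can be called on several documents in the same process without resetting a counter.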

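Sample ④-2: Converting a JSON file to CSV and writing it to a file (2)

Convert a nested JSON file to CSV and write it to a file. This second variant omits the serial number and joins values that share the same key path with `__`.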

04_2_convertFromJsonToCsv.py

import json
import csv

fileIn = "sample_0.json"
fileOut = "sample_0_2.csv"

with open(fileIn, 'r', encoding='utf-8') as json_open:
    json_load = json.load(json_open)

outDict = {}  # key path -> value(s) joined with "__"

def nestLoop(el, argStr):
    """Walk el recursively; argStr is the key path built so far."""
    if isinstance(el, str):  # string leaf
        if argStr in outDict:
            # Same key path seen before: append this value with "__"
            outDict[argStr] = "{0}__{1}".format(outDict[argStr], el)
        else:
            outDict[argStr] = el
    elif isinstance(el, list):  # list: items inherit the parent's key path
        for lp in el:
            nestLoop(lp, argStr)
    elif isinstance(el, dict):  # dict: append each key, joined with "__"
        for key, val in el.items():
            headStr = "__" if argStr != "" else ""
            nestLoop(val, "{0}{1}{2}".format(argStr, headStr, key.strip()))
    # Other types (numbers, booleans, null) are skipped

nestLoop(json_load, "")

with open(fileOut, 'w', encoding='utf-8', newline="") as f:
    writer = csv.writer(f)
    for k, v in outDict.items():
        writer.writerow([k, v])

Output

$ python 04_2_convertFromJsonToCsv.py

Output file

test01,TEST01
test02__test02-A,VALUE-TEST02-A
test02__test02-B__test02-B-1,VALUE-TEST02-B-1
test02__test02-B__test02-B-2__test02-B-2-1,VALUE-TEST02-B-2-1
test02__test02-B__test02-B-2__test02-B-2-2,VALUE-TEST02-B-2-2
test02__test02-B__test02-B-2__test02-B-2-3,VALUE-TEST02-B-2-3
test02__test02-C__test02-C-1-1,VALUE-TEST02-C-1-1
test02__test02-C__test02-C-1-2,VALUE-TEST02-C-1-2
test02__test02-C__test02-C-2-1,VALUE-TEST02-C-2-1
test02__test02-C__test02-C-2-2,VALUE-TEST02-C-2-2
test02__test02-C__test02-C-3-1,VALUE-TEST02-C-3-1
test02__test02-C__test02-C-3-2,VALUE-TEST02-C-3-2
test02__test02-C__test02-C-4-1,VALUE-TEST02-C-4-1
test02__test02-C__test02-C-4-2,VALUE-TEST02-C-4-2
test02__test02-C__test02-C-5-1,VALUE-TEST02-C-5-1
test02__test02-C__test02-C-5-2,VALUE-TEST02-C-5-2
test03,VALUE-TEST03-1__VALUE-TEST03-2__VALUE-TEST03-3__VALUE-TEST03-4
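The merge step can also be expressed by collecting the values for each key path in a list and joining them once at the end. This is a sketch of that idea only; the `pairs` list is hand-written here rather than produced by walking a JSON file.

```python
from collections import defaultdict

# (key path, value) pairs in traversal order; hand-written for illustration.
pairs = [
    ("test01", "TEST01"),
    ("test03", "VALUE-TEST03-1"),
    ("test03", "VALUE-TEST03-2"),
    ("test03", "VALUE-TEST03-3"),
]

grouped = defaultdict(list)
for path, value in pairs:
    grouped[path].append(value)  # values with the same key path pile up

# Join each group with "__"; dicts keep first-seen order of the key paths.
merged = {path: "__".join(values) for path, values in grouped.items()}
print(merged)
```

Keeping the values in lists until the end also leaves room for a different output separator later without touching the traversal.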

Sample ⑤-1: Comparing the CSV files from Sample ④-1

Prepare two CSV files in the Sample ④-1 format, compare them, and write the comparison result as CSV.

05_1_compareCsvs.py

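import pandas as pd  # python -m pip install pandas

fileTgt = "sample_0_1.csv"
fileBase = "sample_0_1b.csv"
fileOut = "cmpRes_{0}".format(fileTgt)

# Note: read_csv treats the first line as a header by default, so the first
# row of each file is not compared. Pass header=None to include it.
# dtype=str keeps every value as text (e.g. 000000000000 stays a string).
dataTgt = pd.read_csv(fileTgt, dtype=str).values.tolist()
dataBase = pd.read_csv(fileBase, dtype=str).values.tolist()

resList = []

for i in dataTgt:
    orgKey = i[0]
    tgtKey = orgKey.split("___", 1)[1]  # key path without the serial number
    tgtVal = i[1]
    tmpResFlg = "NOT-HIT"
    tmpResVal = ""
    for j in dataBase:
        tmpKey = j[0].split("___", 1)[1]
        if tgtKey == tmpKey:
            tmpVal = j[1]
            if tgtVal == tmpVal:
                tmpResFlg = "HIT"
            else:
                # Same key path, different value: collect it, joined with "__"
                if tmpResVal != "":
                    tmpResVal += "__"
                tmpResVal += tmpVal
    if tmpResFlg == "HIT":
        tmpResVal = "ー"  # dash: nothing to report
    elif tmpResVal == "":
        tmpResVal = "※該当情報なし※"  # "no matching entry"

    resList.append([orgKey, tgtVal, tmpResFlg, tmpResVal])

df = pd.DataFrame(resList)
df.to_csv(fileOut, header=False)  # the row index is written as the first column

print("Result {0}".format(fileOut))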

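Output

$ python 05_1_compareCsvs.py
Result cmpRes_sample_0_1.csv

Output CSV file

How the CSV files are compared
For each row of the target CSV, the `JSON element` column (ignoring the leading number) and the `element value` column are looked up in the base CSV.
A base row with the same JSON element and the same value exists
- HIT
No such row exists
- NOT-HIT
- NOT-HIT is followed by the value(s) the base CSV holds for that JSON element, joined with `__`, or ※該当情報なし※ ("no matching entry") if the element is absent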

cmpRes_sample_0_1.csv

0,00000001___test02__test02-A,VALUE-TEST02-A,HIT,ー
1,00000002___test02__test02-B__test02-B-1,VALUE-TEST02-B-1,NOT-HIT,VALUE-TEST02-B
2,00000003___test02__test02-B__test02-B-2__test02-B-2-1,VALUE-TEST02-B-2-1,HIT,ー
3,00000004___test02__test02-B__test02-B-2__test02-B-2-2,VALUE-TEST02-B-2-2,HIT,ー
4,00000005___test02__test02-B__test02-B-2__test02-B-2-3,VALUE-TEST02-B-2-3,HIT,ー
5,00000006___test02__test02-C__test02-C-1-1,VALUE-TEST02-C-1-1,HIT,ー
6,00000007___test02__test02-C__test02-C-1-2,VALUE-TEST02-C-1-2,NOT-HIT,※該当情報なし※
7,00000008___test02__test02-C__test02-C-2-1,VALUE-TEST02-C-2-1,HIT,ー
8,00000009___test02__test02-C__test02-C-2-2,VALUE-TEST02-C-2-2,HIT,ー
9,00000010___test02__test02-C__test02-C-3-1,VALUE-TEST02-C-3-1,HIT,ー
10,00000011___test02__test02-C__test02-C-3-2,VALUE-TEST02-C-3-2,NOT-HIT,VALUE-TEST02-C-3-AAA
11,00000012___test02__test02-C__test02-C-4-1,VALUE-TEST02-C-4-1,HIT,ー
12,00000013___test02__test02-C__test02-C-4-2,VALUE-TEST02-C-4-2,HIT,ー
13,00000014___test02__test02-C__test02-C-5-1,VALUE-TEST02-C-5-1,HIT,ー
14,00000015___test02__test02-C__test02-C-5-2,VALUE-TEST02-C-5-2,HIT,ー
15,00000016___test03,VALUE-TEST03-1,HIT,ー
16,00000017___test03,VALUE-TEST03-2,NOT-HIT,VALUE-TEST03-1__VALUE-TEST03-HOGE__VALUE-TEST03-3__VALUE-TEST03-PIYO
17,00000019___test03,VALUE-TEST03-4,NOT-HIT,VALUE-TEST03-1__VALUE-TEST03-HOGE__VALUE-TEST03-3__VALUE-TEST03-PIYO
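The comparison rule can also be sketched without pandas by indexing the base file by key path first, so each target row needs only one lookup. This is an alternative sketch, not the script above: the function name `compare`, the sample rows, and the ASCII markers "-" and "(no matching entry)" are assumptions made for this example.

```python
from collections import defaultdict


def compare(target_rows, base_rows):
    """Return [key, value, flag, detail] rows; inputs are lists of [key, value]."""
    # Index the base file by key path (the part after the "___" number prefix).
    base_values = defaultdict(list)
    for key, value in base_rows:
        base_values[key.split("___", 1)[1]].append(value)

    result = []
    for key, value in target_rows:
        candidates = base_values.get(key.split("___", 1)[1], [])
        if value in candidates:
            result.append([key, value, "HIT", "-"])
        elif candidates:
            # Key path exists but no value matches: report the base values.
            result.append([key, value, "NOT-HIT", "__".join(candidates)])
        else:
            result.append([key, value, "NOT-HIT", "(no matching entry)"])
    return result


# Made-up rows in the Sample 4-1 format.
target = [["00000000___a", "1"], ["00000001___b", "2"], ["00000002___c", "3"]]
base = [["00000000___a", "1"], ["00000001___b", "X"], ["00000002___b", "Y"]]

result = compare(target, base)
for row in result:
    print(row)
```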

Sample ⑥-1: Extracting lines that contain a specific key from a file

Extract the lines that contain the key NOT-HIT from the CSV file written in Sample ⑤-1.

06_1_readCsv.py

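import csv

fileOut = "out.csv"

def getLines(argFile, argKey):
    """Return the lines of argFile that contain argKey, each as a one-item list."""
    tmpList = []
    with open(argFile, encoding="utf-8") as f:  # read the file passed in, not a global
        for tmpLine in f:
            tmpLine = tmpLine.rstrip("\n")
            if argKey in tmpLine:
                tmpList.append([tmpLine])
    return tmpList

listAll = []
fileIn = "all/Tgt/out_vpc.csv"
# The match is case-sensitive, so the key must be spelled exactly "NOT-HIT"
listAll += getLines(fileIn, "NOT-HIT")

with open(fileOut, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    for tmpVal in listAll:
        writer.writerow(tmpVal)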
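A substring test matches the key anywhere in the line, including inside a value, and "HIT" is itself a substring of "NOT-HIT". As a sketch of a stricter alternative, the line can be parsed with `csv.reader` and only the flag column compared. The constant `FLAG_COLUMN` and the inline sample text are assumptions based on the output format of Sample ⑤-1.

```python
import csv
import io

# Two lines in the format written by 05_1_compareCsvs.py:
# index, key, value, flag, detail
text = (
    "0,00000001___test02__test02-A,VALUE-TEST02-A,HIT,ー\n"
    "1,00000002___test02__test02-B__test02-B-1,VALUE-TEST02-B-1,NOT-HIT,VALUE-TEST02-B\n"
)

FLAG_COLUMN = 3  # position of HIT / NOT-HIT in each row (assumed layout)

# Keep only rows whose flag column is exactly "NOT-HIT".
rows = [row for row in csv.reader(io.StringIO(text))
        if len(row) > FLAG_COLUMN and row[FLAG_COLUMN] == "NOT-HIT"]
print(rows)
```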

Sample ⑥-2: Extracting lines that contain a specific key from the files in a folder

Extract the lines that contain the key NOT-HIT from the files in a given folder.

06_2_readCsv.py

import argparse
import glob
import os

help_desc_msg = """--- Command help ---
Example:
python 06_2_readCsv.py --folderTgt all/Tgt --fileOut out.csv --key NOT-HIT
--------------------------"""

parser = argparse.ArgumentParser(description=help_desc_msg, formatter_class=argparse.RawDescriptionHelpFormatter)
parser.add_argument('--folderTgt', required=True, help='folder containing the files to scan')
parser.add_argument('--fileOut', required=True, help='file to write the collected lines to')
parser.add_argument('--key', required=True, help='key to search for (case-sensitive)')
args = parser.parse_args()

def getLines(argFile, argKey):
    """Return "file,line" strings for the lines of argFile that contain argKey."""
    tmpFile = argFile.replace("\\", "/")  # normalize Windows path separators
    tmpList = []
    with open(argFile, encoding="utf-8") as f:
        for tmpLine in f:
            tmpLine = tmpLine.rstrip("\n")
            # Test the key before prefixing the file name, so that a key
            # appearing in the path does not match every line
            if argKey in tmpLine:
                tmpList.append("{0},{1}".format(tmpFile, tmpLine))
    return tmpList

folderTgt = args.folderTgt
fileOut = args.fileOut
keyVal = args.key
# Skip sub-directories, which cannot be opened as files
files = [p for p in glob.glob("{0}/*".format(folderTgt)) if os.path.isfile(p)]
listAll = []

for tmpFile in files:
    listAll += getLines(tmpFile, keyVal)

with open(fileOut, "w", encoding="utf-8", newline="") as f:
    for tmpVal in listAll:
        f.write("{0}\n".format(tmpVal))
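The same folder scan can be sketched with `pathlib`, which handles path separators and the file/directory check without string manipulation. The function name `collect_lines` and the throwaway files created below are assumptions for illustration only.

```python
import tempfile
from pathlib import Path


def collect_lines(folder, key):
    """Return "path,line" strings for every line containing key in folder's files."""
    found = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # skip sub-directories
        with path.open(encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if key in line:
                    # as_posix() always uses forward slashes, also on Windows
                    found.append(f"{path.as_posix()},{line}")
    return found


# Try it on a throwaway folder (file names and contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.csv").write_text("0,k1,v1,HIT,-\n1,k2,v2,NOT-HIT,x\n", encoding="utf-8")
    Path(tmp, "b.csv").write_text("0,k3,v3,NOT-HIT,y\n", encoding="utf-8")
    Path(tmp, "sub").mkdir()
    lines = collect_lines(tmp, "NOT-HIT")

for line in lines:
    print(line)
```

Sorting the directory entries makes the output order reproducible, which `glob.glob` does not guarantee.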

That's all.
