Easily bulk-decompress gzip-compressed JSON files with a Python script

Posted at 2024-07-28

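Introduction

This article implements a Python script that decompresses JSON files stored in gzip (.gz) format.
Decompressing the files by hand one at a time is tedious, so the script handles them all in a single run.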

Environment

Python 3.12.4

Prerequisites

This article assumes the following folder layout and gz file names.

root
│  gz_to_json.py
│  json_to_gz.py
│
├─extracted_files (folder that receives the decompressed JSON files)
└─gz_files (folder that holds the gz files)
    file_0.json.gz
    file_1.json.gz
    file_2.json.gz
    ...

Source for decompressing gz files to JSON

The script below decompresses each gz file into a JSON file.
See the comments in the code for details.

gz_to_json.py

import gzip
import os

def decompress_gz_files(input_directory, output_directory):
    # Create the output directory if it does not exist
    os.makedirs(output_directory, exist_ok=True)

    for filename in os.listdir(input_directory):
        # Process only files ending in .gz
        if filename.endswith('.gz'):

            # Read the .gz file as UTF-8 text
            file_path = os.path.join(input_directory, filename)
            with gzip.open(file_path, 'rt', encoding='utf-8') as gz_file:
                json_data = gz_file.read()

            # Build the output path by dropping the trailing ".gz"
            output_filename = os.path.join(output_directory, filename[:-3])

            # Write the contents to the JSON file
            with open(output_filename, 'w', encoding='utf-8') as json_file:
                json_file.write(json_data)
            print(f"extracted {file_path} to {output_filename}")

# Decompress the .gz files
decompress_gz_files('gz_files', 'extracted_files')
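
The script above reads each file fully into memory and assumes the content is UTF-8 text. For larger files, one possible variation is to copy the decompressed bytes in chunks instead. The following is only a sketch under that assumption, not part of the original script: the function name decompress_gz_files_streaming and the temporary demo folder are made up for illustration.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def decompress_gz_files_streaming(input_directory, output_directory):
    """Decompress every .gz file in input_directory, copying in chunks."""
    input_dir = Path(input_directory)
    output_dir = Path(output_directory)
    output_dir.mkdir(parents=True, exist_ok=True)

    extracted = []
    for gz_path in sorted(input_dir.glob("*.gz")):
        # Path.stem drops the last suffix: file_0.json.gz -> file_0.json
        output_path = output_dir / gz_path.stem
        # Binary mode copies the bytes as-is, so no text encoding is assumed
        with gzip.open(gz_path, "rb") as src, open(output_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        extracted.append(output_path)
    return extracted


# Self-contained demo in a temporary folder (does not touch gz_files)
with tempfile.TemporaryDirectory() as tmp:
    demo_input = Path(tmp) / "gz_files"
    demo_input.mkdir()
    with gzip.open(demo_input / "file_0.json.gz", "wb") as f:
        f.write(b'{"id": 0}')
    for path in decompress_gz_files_streaming(demo_input, Path(tmp) / "extracted_files"):
        print(f"extracted {path.name}")
```

Because the bytes are never decoded, this version also works for gz files whose content is not UTF-8, at the cost of not checking the encoding at all.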

Results

Run the script with the following command.

python .\gz_to_json.py

As the output below shows, the five gz files directly under the gz_files folder were decompressed to JSON.

Output

extracted gz_files\file_0.json.gz to extracted_files\file_0.json
extracted gz_files\file_1.json.gz to extracted_files\file_1.json
extracted gz_files\file_2.json.gz to extracted_files\file_2.json
extracted gz_files\file_3.json.gz to extracted_files\file_3.json
extracted gz_files\file_4.json.gz to extracted_files\file_4.json
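
The log lines only show that files were written. If you also want to confirm that each extracted file parses as JSON, a small check along the following lines may help. This is a sketch, not part of the original article: the helper name find_invalid_json and the demo file names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def find_invalid_json(directory):
    """Return the names of .json files in directory that fail to parse."""
    invalid = []
    for json_path in sorted(Path(directory).glob("*.json")):
        try:
            with open(json_path, encoding="utf-8") as f:
                json.load(f)
        except (json.JSONDecodeError, UnicodeDecodeError):
            # Either the text is not valid JSON or it is not valid UTF-8
            invalid.append(json_path.name)
    return invalid


# Self-contained demo: one valid file and one truncated file
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "file_0.json").write_text('{"id": 0}', encoding="utf-8")
    (demo_dir / "broken.json").write_text('{"id": 0', encoding="utf-8")
    print(find_invalid_json(demo_dir))
```

In the setup described in this article, you would pass 'extracted_files' as the directory; an empty list means every file parsed.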

Bonus

The sources used to create the sample JSON files and the gz files for this article are shown below.

①: Source used to create the sample JSON files

create_json.py

import json
import os
import random

def create_json_files(directory, num_files):
    for i in range(num_files):
        # Generate a random factor rounded to one decimal place
        random_factor = round(random.uniform(0.3, 0.7), 1)

        # Build the data as a dictionary
        data = {"id": i, "name": f"サンプル {i}", "value": round(i * 10 * random_factor, 1)}

        # Convert to a JSON string (keep non-ASCII characters unescaped)
        json_data = json.dumps(data, ensure_ascii=False, indent=4)

        # Build the JSON file path by joining the directory and file name
        filename = os.path.join(directory, f"file_{i}.json")

        # Write the JSON file into the given directory
        with open(filename, 'w', encoding='utf-8') as json_file:
            json_file.write(json_data)

        # Report that the file was created
        print(f"Created {filename}")

# Create the directory if it does not exist
os.makedirs('json_files', exist_ok=True)

# Create five JSON files
create_json_files('json_files', 5)

②: Source used to gz-compress the JSON files created in ①

json_to_gz.py

import gzip
import os

def compress_json_files(input_directory, output_directory):
    # Create the output directory if it does not exist
    os.makedirs(output_directory, exist_ok=True)

    for filename in os.listdir(input_directory):
        # Process only JSON files
        if filename.endswith('.json'):
            file_path = os.path.join(input_directory, filename)

            # Read the JSON file
            with open(file_path, 'rt', encoding='utf-8') as json_file:
                json_data = json_file.read()

            # Build the output path by appending ".gz"
            output_filename = os.path.join(output_directory, f"{filename}.gz")

            # Write the contents in gz format
            with gzip.open(output_filename, 'wt', encoding='utf-8') as gz_file:
                gz_file.write(json_data)
            print(f"Compressed {file_path} to {output_filename}")

# Compress the JSON files to gz format
compress_json_files('json_files', 'gz_files')
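
To convince yourself that compressing and then decompressing loses nothing, a quick in-memory round trip can be sketched as follows. It uses gzip.compress and gzip.decompress from the standard library rather than files on disk; the helper names compress_text and decompress_to_text and the sample record are made up for this illustration.

```python
import gzip
import json


def compress_text(text):
    """Encode text as UTF-8 and gzip-compress it in memory."""
    return gzip.compress(text.encode("utf-8"))


def decompress_to_text(compressed):
    """Reverse compress_text: gunzip the bytes and decode them as UTF-8."""
    return gzip.decompress(compressed).decode("utf-8")


# Hypothetical record in the same shape as the sample files
data = {"id": 1, "name": "サンプル 1", "value": 4.0}
json_text = json.dumps(data, ensure_ascii=False, indent=4)

restored_text = decompress_to_text(compress_text(json_text))
print(restored_text == json_text)         # True: the text survives unchanged
print(json.loads(restored_text) == data)  # True: so does the parsed data
```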
