PythonによるExcelファイルソートスクリプト

Last updated at 2025-01-31Posted at 2025-01-31

概要

　このスクリプトは、bashスクリプトの中でPythonを実行し、EXCELファイルを読み込んで先頭列をキーにしてソートする。
　一行目はラベル。セルの中には式も含まれている。コピーされた式の式中の行番号は今あるところの行番号にする。書式も含めてソートする。

注意

　文字色・背景色・罫線についてソートがうまくできなかった。改善法についてご教示ください。

スクリプト

#!/bin/bash

FILE1=$1

# Bashスクリプトの内容
python3 <<EOF
import openpyxl
from openpyxl.utils import get_column_letter
from openpyxl.styles import NamedStyle
from copy import copy
import re
import os

def print_write_progress(row_index, col_index, value):
    """書き込み進行状況を出力"""
    print(f"書き込み中 - 行: {row_index}, 列: {col_index}, 値: {value}")

def copy_cell_styles(source_cell, target_cell):
    """セルのスタイルをコピー"""
    if source_cell.has_style:
        target_cell.font = copy(source_cell.font)
        target_cell.border = copy(source_cell.border)
        target_cell.fill = copy(source_cell.fill)
        target_cell.number_format = source_cell.number_format
        target_cell.protection = copy(source_cell.protection)
        target_cell.alignment = copy(source_cell.alignment)

    else:
        target_cell.font = openpyxl.styles.Font()
        target_cell.border = openpyxl.styles.Border()
        target_cell.fill = openpyxl.styles.PatternFill()
        target_cell.number_format = 'General'
        target_cell.protection = openpyxl.styles.Protection()
        target_cell.alignment = openpyxl.styles.Alignment()

def update_formula(formula, row_shift):
    """式中の行番号を更新する"""
    if not isinstance(formula, str) or not formula.startswith('='):
        return formula

    # 行番号を正規表現で抽出して更新
    def replace_row(match):
        col = match.group(1)
        old_row = int(match.group(2))
        new_row = old_row + row_shift
        return f"{col}{new_row}"

    pattern = r'([A-Z]+)(\d+)'
    new_formula = re.sub(pattern, replace_row, formula)
    return new_formula

def sort_excel(excel_file):
    try:
        # Excelファイルを開く
        _, file_extension = os.path.splitext(excel_file)
        keep_vba = file_extension.lower() == '.xlsm'

        wb = openpyxl.load_workbook(excel_file, keep_vba=keep_vba)
        ws = wb.active

        # データとスタイルを取得
        data = []
        for row_index, row in enumerate(ws.iter_rows(min_row=2, values_only=False), start=2):  # ヘッダーを除く
            row_values = [cell.value for cell in row]
            row_styles = [cell for cell in row]  # スタイルを保持
            data.append((row_index, row_values[0], row_values, row_styles))

        # ソート
        data.sort(key=lambda x: (x[1] is None, str(x[1]) if x[1] is not None else ''))

        # 書き込み
        for new_row_idx, (old_row_idx, _, row_values, row_styles) in enumerate(data, start=2):
            row_shift = new_row_idx - old_row_idx  # 正しいシフト量を計算
            for col_idx, (value, style) in enumerate(zip(row_values, row_styles), start=1):
                target_cell = ws.cell(row=new_row_idx, column=col_idx)
                copy_cell_styles(style, target_cell)

                # 式の場合、行番号を更新
                if isinstance(value, str) and value.startswith('='):
                    value = update_formula(value, row_shift)

                target_cell.value = value

                if col_idx == 1:
                    print_write_progress(new_row_idx, col_idx, target_cell.value)

        # ファイルを保存
        wb.save(excel_file)
        print(f"ファイルをソートしました: {excel_file}")

    except FileNotFoundError:
        print(f"ファイルが見つかりません: {excel_file}")
    except Exception as e:
        print(f"エラーが発生しました: {e}")

if __name__ == "__main__":
    excel_file = "${FILE1}"

    # 関数を呼び出して更新処理を実行
    sort_excel(excel_file)

EOF

解説

Excelファイルの読み込み

        # Excelファイルを開く
        _, file_extension = os.path.splitext(excel_file)
        keep_vba = file_extension.lower() == '.xlsm'

        wb = openpyxl.load_workbook(excel_file, keep_vba=keep_vba)
        ws = wb.active

　Excelファイルのワークブック全体を読み込む。
　keep_vbaはExcelファイルのVBA（Visual Basic for Applications）マクロを保持するためのオプション。

データの抽出

        # データとスタイルを取得
        data = []
        for row_index, row in enumerate(ws.iter_rows(min_row=2, values_only=False), start=2):  # ヘッダーを除く
            row_values = [cell.value for cell in row]
            row_styles = [cell for cell in row]  # スタイルを保持
            data.append((row_index, row_values[0], row_values, row_styles))

・データの初期化:　dataという空のリストを作成。
・行のイテレーション:
　ws.iter_rows(min_row=2, values_only=False) を使用して、2行目からシートの最後の行までをイテレートする。min_row=2 はヘッダー行（1行目）をスキップするためのもの。values_only=False は、セルの値だけでなくセル自体（スタイルを含む）を取得するためのオプション。
　enumerate 関数で各行のインデックス（row_index）を取得し、start=2 によりインデックスが2から始まるようにする。
・各行のデータとスタイルの抽出:
　row_values = [cell.value for cell in row]: 各セルの値をリストにする。
　row_styles = [cell for cell in row]: 各セルのオブジェクト自体（スタイル情報を含む）をリストにする。これによりスタイル情報を保持する。
・データのリストへの追加:
　data.append((row_index, row_values[0], row_values, row_styles)): 各行のデータをタプルとしてdataリストに追加する。このタプルには以下の情報が含まれる：
　　row_index: 元の行番号
　　row_values[0]: ソートのキーとなる最初のセルの値。これによりソートが可能になる。
　　row_values: 行全体の値のリスト
　　row_styles: 行全体のセルのスタイル情報のリスト

ソート

        # ソート
        data.sort(key=lambda x: (x[1] is None, str(x[1]) if x[1] is not None else ''))

　dataリスト内の要素を、各タプルの2番目の要素（x[1]）を基準にしてソート。リストを直接変更し、元のリストを新しいソートされたリストに置き換える。

(x[1] is None, str(x[1]) if x[1] is not None else ''): この部分がソートのキー。
　x[1] is None: 最初のソート基準。x[1]がNoneかどうかをチェック。Noneは他のどんな値よりも「大きい」として扱うことで、Noneを含む項目はリストの最後に移動する。
　str(x[1]) if x[1] is not None else '': x[1]がNoneでない場合、文字列に変換し、Noneの場合は空文字列を返す。これにより、None以外の値は文字列として比較される。

書き込み

        # 書き込み
        for new_row_idx, (old_row_idx, _, row_values, row_styles) in enumerate(data, start=2):
            row_shift = new_row_idx - old_row_idx  # 正しいシフト量を計算
            for col_idx, (value, style) in enumerate(zip(row_values, row_styles), start=1):
                target_cell = ws.cell(row=new_row_idx, column=col_idx)
                copy_cell_styles(style, target_cell)

                # 式の場合、行番号を更新
                if isinstance(value, str) and value.startswith('='):
                    value = update_formula(value, row_shift)

                target_cell.value = value

・enumerate(data, start=2)を使って、dataリストをループ。start=2は、Excelの行番号が1から始まるため、インデックスを2から始める。
・row_shift = new_row_idx - old_row_idx　：ソート前後の行番号の差を計算。
・for col_idx, (value, style) in enumerate(zip(row_values, row_styles), start=1):　列ごとの処理。
　zip(row_values, row_styles)を使って、各行の値とスタイルを同時にループする。col_idxは列番号で、1から始まる。
・target_cell = ws.cell(row=new_row_idx, column=col_idx)　：ソート後の位置に新しいセルを指定。
・copy_cell_styles(style, target_cell)　：元のセルのスタイル（フォント、罫線、塗りつぶしなど）を新しいセルにコピー。
・if isinstance(value, str) and value.startswith('='):
value = update_formula(value, row_shift)
　値が文字列で、かつ=で始まる場合（つまり、セルに式がある場合）、update_formula関数を使って式中の行番号を更新する。row_shiftを使って、式が指す行番号を新しい位置に合わせる。
・target_cell.value = value　：値（更新された場合も含めて）をセルに設定。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up