I tried to check the sheet names of an Excel workbook without opening it in Excel, and it seems to work

Posted at 2021-04-06, last updated at 2021-04-06

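Motivation

I was looking for some data among a huge number of Excel workbooks, but the file names gave no clue. Knowing the sheet names would probably narrow the search, so I tried to find a way to list the sheet names without opening each file in Excel.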

Target Excel formats

This method works only for Excel workbooks with the .xlsx or .xlsm file extension (Microsoft Excel 2007+).
Workbooks with the .xls or .xlsb extension need a different approach (see Excel Binary File Format (.xls) Structure).
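
Since the extension alone can be misleading, a small pre-check may help before parsing. The following is only a sketch: the function name `looks_like_xlsx` is made up for illustration, and it assumes that a supported workbook is a zip archive containing a member named `xl/workbook.xml`.

```python
import zipfile


def looks_like_xlsx(path):
    """Return True if `path` is a zip archive that contains xl/workbook.xml.

    `looks_like_xlsx` is a hypothetical helper name, not part of the
    original script.
    """
    if not zipfile.is_zipfile(path):
        # e.g. a legacy .xls file, which is not a zip archive
        return False
    with zipfile.ZipFile(path) as zf:
        # A zip without this member is treated as unsupported
        # (assumed to be the case for .xlsb).
        return "xl/workbook.xml" in zf.namelist()
```

`path` can be a file name or a file-like object, because both `zipfile.is_zipfile` and `zipfile.ZipFile` accept either.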

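What I did

When I opened an .xlsx file in Emacs on a whim, it turned out to be a Zip archive containing a file named workbook.xml. Inside that file there are <sheets><sheet> tags, and the name attribute of each <sheet> tag appears to be the sheet name.
So I wrote a script that simply enumerates the name attributes of the <sheet> tags.
Looking into it afterwards, I found that Microsoft also has a proper documentation page on this, though I have not tried that approach myself.

workbook.xml (excerpt)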

    <sheets>
        <sheet name="これは" sheetId="1" r:id="rId1"/>
        <sheet name="テスト" sheetId="2" r:id="rId2"/>
        <sheet name="です" sheetId="3" r:id="rId3"/>
        <sheet name="よ？" sheetId="4" r:id="rId4"/>
    </sheets>
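
To see the idea in isolation, the excerpt can be parsed directly with `xml.etree.ElementTree`. This is a minimal sketch: `WORKBOOK_XML` is a hand-written, trimmed-down document (a real workbook.xml has more elements), `sheet_names_from_xml` is an illustrative name, and the namespace URI is the one used by ordinary (transitional) .xlsx files. Strict Open XML files use a different namespace, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for xl/workbook.xml (assumption: trimmed for illustration).
WORKBOOK_XML = """\
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
          xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
    <sheets>
        <sheet name="これは" sheetId="1" r:id="rId1"/>
        <sheet name="テスト" sheetId="2" r:id="rId2"/>
        <sheet name="です" sheetId="3" r:id="rId3"/>
        <sheet name="よ？" sheetId="4" r:id="rId4"/>
    </sheets>
</workbook>"""

# Prefix-to-URI map used by findall(); "main" is an arbitrary local prefix.
NS = {"main": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def sheet_names_from_xml(xml_text):
    """Return the name attribute of every <sheet> element, in document order."""
    root = ET.fromstring(xml_text)
    return [sheet.get("name") for sheet in root.findall("main:sheets/main:sheet", NS)]


print(sheet_names_from_xml(WORKBOOK_XML))
# → ['これは', 'テスト', 'です', 'よ？']
```

An explicit namespace map is used here instead of the `{*}` wildcard so that the sketch also runs on Python versions older than 3.8.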

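Code

I wrote it in Python, since that is more or less the standard at my company. It requires Python 3; the `{*}` namespace wildcard passed to findall needs Python 3.8 or later.
As noted in a comment in the script, the technique of expanding a zip file in memory and reading the files inside it is based on this site (https://srbrnote.work/archives/1297). In fact, apart from the function and variable names, I used that code almost as-is.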

listExcelSheets.py

#!/usr/bin/python3
# -*- coding: utf-8 -*-

# Example usage (on Cygwin):
# $ find ~/OneDrive/Projects/ -type f -name "*.xls[xm]" -exec ./listExcelSheets.py {} \; > sheets.txt
# This script only works with .xlsx or .xlsm files (not with .xlsb or .xls files)

# based on the code from https://srbrnote.work/archives/1297
import zipfile
from collections import OrderedDict
import traceback
def readFileFromZip(aZipFile, aReMatch):
    fileContents = OrderedDict()
    try:
        with zipfile.ZipFile(aZipFile, 'r') as zipData:
            # file list
            files = zipData.infolist()
            for f in files:
                # skip files that do not match the regexp
                if aReMatch(f.filename) is None:
                    continue
                # read file content
                fileData = zipData.read(f.filename)
                # store it in dict
                fileContents[f.filename] = fileData
    except zipfile.BadZipFile:
        print(traceback.format_exc())
    return fileContents

import sys, re
import xml.etree.ElementTree as ET
def listSheets(aExcelBook):
    sheetNames = []
    # xl/workbook.xml contains sheet names
    reMatch = re.compile(r'^xl/workbook\.xml$').match
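    # read xl/workbook.xml body (returned as a dict, though only one file should match)
    files = readFileFromZip(aExcelBook, reMatch)
    if 'xl/workbook.xml' not in files:
        # not a readable .xlsx/.xlsm (e.g. BadZipFile was caught above)
        return sheetNames
    # parse XML
    root = ET.fromstring(files['xl/workbook.xml'])
    # '{*}' is the namespace wildcard (supported since Python 3.8)
    sheets = root.findall('./{*}sheets/{*}sheet')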
    for sheet in sheets:
        sheetNames.append(sheet.attrib['name'])
    return sheetNames

# main routine starts here..
if __name__ == '__main__':
    if len(sys.argv) < 2:
        sys.exit("usage: listExcelSheets.py EXCEL_FILE")
    excelFile = sys.argv[1] # 1st arg as file name
    print(excelFile)
    for sheet in listSheets(excelFile):
        print("  ", sheet)
    print()

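Usage

I use Cygwin on my company Windows laptop, so I ran the script from the shell as shown below.
Pass the path of an Excel workbook as the argument, and the script prints the names of the sheets in that workbook.
Since it is a Python script, it also ran on my Mac at home. I had not given it much thought, but Japanese sheet names were displayed without problems in a UTF-8 terminal.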

$ find ~/OneDrive/Projects/ -type f -name "*.xls[xm]" -exec ./listExcelSheets.py {} \; > sheets.txt
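
Where `find` is not available, the directory walk could also be done in Python itself. The following is a rough sketch rather than part of the original script: `sheet_names` and `walk_workbooks` are illustrative names, and it assumes the ordinary (transitional) spreadsheet namespace in xl/workbook.xml.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed namespace of xl/workbook.xml in ordinary .xlsx/.xlsm files.
NS = {"main": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def sheet_names(path):
    """Return the sheet names of one workbook, or [] if it cannot be read."""
    try:
        with zipfile.ZipFile(path) as zf:
            root = ET.fromstring(zf.read("xl/workbook.xml"))
    except (zipfile.BadZipFile, KeyError, ET.ParseError):
        # not a zip, no xl/workbook.xml member, or malformed XML
        return []
    return [s.get("name") for s in root.findall("main:sheets/main:sheet", NS)]


def walk_workbooks(top):
    """Yield (path, sheet names) for every *.xlsx / *.xlsm file under `top`."""
    for path in sorted(Path(top).rglob("*.xls[xm]")):
        if path.is_file():
            yield path, sheet_names(path)
```

For example, `for path, names in walk_workbooks("."): print(path, names)` prints one line per workbook found under the current directory.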

Example run (Cygwin)

$ ./listExcelSheets.py jp_sheet_name.xlsx
jp_sheet_name.xlsx
   これは
   テスト
   です
   よ？

Tested environments

I confirmed that the script works in the following environments.

Windows 10 Cygwin (3.2.0(0.340/5/3) 2021-03-29 08:37), Python 3.8.8 (Cygwin)
macOS Big Sur 11.3 Beta (20E5224a), Python 3.9.1
