How to Batch-Convert Word, Excel, and PowerPoint Documents to PDF with Python

Posted at 2024-10-21

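When working with Office documents in different formats (Word, Excel, PowerPoint, and so on), converting them to PDF is a common requirement. PDF gives a consistent appearance across devices and operating systems and makes the original content harder to alter casually, which suits formal reports, proposals, and archived material. With Python, a short script can automate the conversion for both business and personal document management. This article shows how to batch-convert Word, Excel, and PowerPoint documents to individual PDF files with Python, and how to merge Office documents into a single PDF.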

Converting Word, Excel, and PowerPoint documents to individual PDFs in a batch
Merging Word, Excel, PowerPoint, and PDF documents into a single PDF

The methods in this article use Spire.Office for Python, which can be installed from PyPI: pip install spire.office
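
After installing, a quick check can confirm that the four modules imported later in this article can be found. This is a minimal sketch using only the standard library; the helper name is_available is an assumption made for illustration and is not part of Spire.Office.

```python
import importlib.util


def is_available(module_name):
    """Return True if module_name can be located by the import system."""
    try:
        # For dotted names, find_spec imports the parent package first.
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package (e.g. "spire") raises ModuleNotFoundError.
        return False


# Module names taken from the import statements used in this article.
for name in ("spire.doc", "spire.xls", "spire.presentation", "spire.pdf"):
    print(f"{name}: {'found' if is_available(name) else 'missing'}")
```

If any module is reported as missing, rerunning the pip command in the same Python environment that runs the script is usually the first thing to try.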

Apply for a free license

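Converting Word, Excel, and PowerPoint documents to individual PDFs in a batch

The script checks each file's extension, loads the file with the LoadFromFile method of the matching class (Document for Word, Workbook for Excel, Presentation for PowerPoint), and saves it as a PDF with the SaveToFile(string: fileName, FileFormat.PDF) method. Repeating this for every file in a folder converts the Office documents to PDF in one pass. The steps are as follows: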

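Import the required modules.
Define the folder to process, then collect the files of the specified types and sort them.
Create a PdfDocument object (the example creates and closes one, although the per-file conversions do not use it).
Loop over the file list and determine each file's type from its extension.
Create a Document, Workbook, or Presentation object according to the file type.
Load the document with the LoadFromFile method.
Convert the document to PDF and save it with the SaveToFile method.
Release the resources.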
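
The file-selection and type-detection steps above can be tried on their own, without Spire installed. The following is a minimal sketch under stated assumptions: the names KIND_BY_EXTENSION and collect_office_files are hypothetical, and the "word", "excel", and "powerpoint" labels simply stand in for the Document, Workbook, and Presentation classes.

```python
import os

# Extension -> document family, mirroring the three Spire classes named above.
KIND_BY_EXTENSION = {
    ".doc": "word", ".docx": "word",
    ".xls": "excel", ".xlsx": "excel",
    ".ppt": "powerpoint", ".pptx": "powerpoint",
}


def collect_office_files(folder):
    """Return sorted (path, kind) pairs for supported files directly in folder."""
    pairs = []
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        # Lower-case the extension so that "REPORT.DOCX" is also matched.
        kind = KIND_BY_EXTENSION.get(os.path.splitext(name)[1].lower())
        if kind is not None and os.path.isfile(path):
            pairs.append((path, kind))
    return sorted(pairs)


folder = "Documents/"  # same folder name as the article's example
if os.path.isdir(folder):
    for path, kind in collect_office_files(folder):
        print(f"{kind:<10} {path}")
```

Running this first is a cheap way to see which files a conversion run would pick up. The os.path.isfile check also skips any directory whose name happens to end in a supported extension.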

Code example

from spire.pdf import PdfDocument
from spire.doc import Document
from spire.xls import Workbook
from spire.presentation import Presentation
from spire.doc import FileFormat as wFileFormat
from spire.xls import FileFormat as eFileFormat
from spire.presentation import FileFormat as pFileFormat
import os

# Folder that holds the files to process
folderPath = "Documents/"
# Folder for the converted PDFs; create it if it does not exist yet
outputDir = "output/Documents/"
os.makedirs(outputDir, exist_ok=True)
# Collect the files of the specified types and sort them
extensions = [".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"]
files = sorted([os.path.join(folderPath, f) for f in os.listdir(folderPath) if f.lower().endswith(tuple(extensions))])

# Create a PdfDocument object (not used by the per-file conversions below)
pdf = PdfDocument()

# Loop over the file list
for file in files:
    extension = os.path.splitext(file)[1].lower()
    # The output name is the source file name with ".pdf" appended
    outputPath = f"{outputDir}{os.path.basename(file)}.pdf"
    if extension in [".doc", ".docx"]:
        # Create a Document object
        doc = Document()
        # Load the Word document
        doc.LoadFromFile(file)
        # Convert the Word document to PDF
        doc.SaveToFile(outputPath, wFileFormat.PDF)
        doc.Close()
    elif extension in [".xls", ".xlsx"]:
        # Create a Workbook object
        workbook = Workbook()
        # Load the Excel file
        workbook.LoadFromFile(file)
        # Convert the Excel file to PDF
        workbook.SaveToFile(outputPath, eFileFormat.PDF)
        workbook.Dispose()
    elif extension in [".ppt", ".pptx"]:
        # Create a Presentation object
        presentation = Presentation()
        # Load the PowerPoint file
        presentation.LoadFromFile(file)
        # Convert the PowerPoint file to PDF
        presentation.SaveToFile(outputPath, pFileFormat.PDF)
        presentation.Dispose()

# Close the PdfDocument object
pdf.Close()
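
The example above names each output by appending .pdf to the full source file name, so a Word file and an Excel file that share a base name still produce two distinct PDFs. The sketch below illustrates that naming choice next to the alternative of dropping the source extension. The helper pdf_output_path, its keep_source_extension parameter, and the sample file names are hypothetical.

```python
import os


def pdf_output_path(source_path, output_dir, keep_source_extension=True):
    """Build the PDF path for source_path inside output_dir."""
    name = os.path.basename(source_path)
    if not keep_source_extension:
        # "report.docx" -> "report"; files sharing a base name then collide.
        name = os.path.splitext(name)[0]
    return os.path.join(output_dir, name + ".pdf")


# Two sources that share a base name but differ in type.
sources = ["Documents/report.docx", "Documents/report.xlsx"]
for keep in (True, False):
    targets = [pdf_output_path(s, "output/Documents", keep) for s in sources]
    print(keep, targets, "collision" if len(set(targets)) < len(targets) else "distinct")
```

Keeping the source extension gives slightly longer names but avoids one conversion silently overwriting another.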

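Result

Each matching file in Documents/ is saved under output/Documents/ as a PDF named after the source file with .pdf appended.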

Merging Word, Excel, PowerPoint, and PDF documents into a single PDF

Besides converting Office documents to individual PDFs, documents in different formats can also be merged into one PDF file. The steps are as follows:

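Import the required modules.
Define the folder to process, then collect the files of the specified types and sort them.
Create a PdfDocument object, pdf, to hold the final PDF document.
Create a second PdfDocument object, temPdf, and define the path where the temporary PDF file is saved.
Loop over the file list and determine each file's type from its extension.
Create a Document, Workbook, or Presentation object according to the file type and load the document with the LoadFromFile method.
Convert the document to PDF with the SaveToFile method, saving it to the temporary PDF path.
Load the temporary PDF with temPdf.LoadFromFile() and insert its pages into the final PDF with pdf.AppendPage(temPdf).
When all files have been processed, save the final PDF file with pdf.SaveToFile().
Delete the temporary file and release the resources.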
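
Because pages are appended in the order of the sorted file list, the sort order decides the page order of the merged PDF. Plain string sorting places a name starting with 10 before one starting with 2, which may not be the intended order for numbered files. The following is a minimal sketch of a numeric-aware sort key; the name natural_key and the sample file names are hypothetical.

```python
import re


def natural_key(path):
    """Split a path into text and integer chunks so that 2 sorts before 10."""
    # With a capturing group, re.split alternates text (even indexes)
    # and digit runs (odd indexes), so the chunk types always line up.
    parts = re.split(r"(\d+)", path)
    return [int(part) if index % 2 else part.lower()
            for index, part in enumerate(parts)]


names = [
    "Documents/10_summary.pptx",
    "Documents/2_data.xlsx",
    "Documents/1_intro.docx",
]
print(sorted(names))                   # plain string order
print(sorted(names, key=natural_key))  # numeric-aware order
```

Passing key=natural_key to the sorted call that builds the file list would be one way to apply it.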

Code example

from spire.pdf import PdfDocument
from spire.doc import Document
from spire.xls import Workbook
from spire.presentation import Presentation
from spire.doc import FileFormat as wFileFormat
from spire.xls import FileFormat as eFileFormat
from spire.presentation import FileFormat as pFileFormat

import os

# Folder that holds the files to process
folderPath = 'Documents/'
# Collect the files of the specified types and sort them;
# '.pdf' is included so that the PDF branch in the loop is reached
extensions = ['.doc', '.docx', '.xls', '.xlsx', '.ppt', '.pptx', '.pdf']
files = sorted([os.path.join(folderPath, f) for f in os.listdir(folderPath) if f.lower().endswith(tuple(extensions))])

# Create the PdfDocument object that holds the final PDF
pdf = PdfDocument()
# Create a temporary PdfDocument and the path of the temporary PDF file
temPdf = PdfDocument()
temPdfPath = 'temp.pdf'

# Loop over the file list
for file in files:
    extension = os.path.splitext(file)[1].lower()

    if extension in ['.doc', '.docx']:
        # Load the Word document
        doc = Document()
        doc.LoadFromFile(file)
        # Save it as a temporary PDF
        doc.SaveToFile(temPdfPath, wFileFormat.PDF)
        # Load the temporary PDF and append its pages to the final PDF
        temPdf.LoadFromFile(temPdfPath)
        pdf.AppendPage(temPdf)
        doc.Close()

    elif extension in ['.xls', '.xlsx']:
        # Load the Excel workbook
        workbook = Workbook()
        workbook.LoadFromFile(file)
        # Save it as a temporary PDF
        workbook.SaveToFile(temPdfPath, eFileFormat.PDF)
        # Load the temporary PDF and append its pages to the final PDF
        temPdf.LoadFromFile(temPdfPath)
        pdf.AppendPage(temPdf)
        workbook.Dispose()

    elif extension in ['.ppt', '.pptx']:
        # Load the PowerPoint presentation
        presentation = Presentation()
        presentation.LoadFromFile(file)
        # Save it as a temporary PDF
        presentation.SaveToFile(temPdfPath, pFileFormat.PDF)
        # Load the temporary PDF and append its pages to the final PDF
        temPdf.LoadFromFile(temPdfPath)
        pdf.AppendPage(temPdf)
        presentation.Dispose()

    elif extension == '.pdf':
        # Already a PDF: load it directly and append its pages to the final PDF
        temPdf.LoadFromFile(file)
        pdf.AppendPage(temPdf)

# Save the final PDF, creating the output folder if it does not exist yet
outputPath = "output/CombinedPDF.pdf"
os.makedirs(os.path.dirname(outputPath), exist_ok=True)
pdf.SaveToFile(outputPath)

# Release the resources
pdf.Close()
temPdf.Close()

# Delete the temporary file once nothing holds it open
if os.path.exists(temPdfPath):
    os.remove(temPdfPath)
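
The example keeps its temporary PDF in the current working directory and removes it at the end, so an exception raised midway would leave the file behind. One alternative, sketched here with the standard library only, is to place the temporary file in a directory that is removed automatically. The helper temporary_pdf_path is hypothetical, and the placeholder write stands in for a call such as doc.SaveToFile(temPdfPath, ...).

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_pdf_path(filename="temp.pdf"):
    """Yield a path inside a fresh temporary directory, removed on exit."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        yield os.path.join(tmp_dir, filename)


with temporary_pdf_path() as temPdfPath:
    # Stand-in for the conversion step: write placeholder bytes to the path.
    with open(temPdfPath, "wb") as handle:
        handle.write(b"placeholder")
    print(os.path.exists(temPdfPath))  # the file exists inside the block

print(os.path.exists(temPdfPath))  # the directory and file are gone afterwards
```

When combining this with the merge loop, it is probably safest to save the final PDF and close temPdf before leaving the with block, since some platforms refuse to delete a file that is still open.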

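Result

The merged document is saved as output/CombinedPDF.pdf, with pages in the order of the sorted file list.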

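This article showed how to use Python to batch-convert Word, Excel, and PowerPoint documents to individual PDFs and how to merge them into a single PDF. Spire.Office for Python also supports conversion between many other formats. See the official website for details.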
