
Splitting a PDF into individual pages with PowerShell (w/ iTextSharp)

Last updated at 2020-04-06, posted at 2020-03-18

This script calls iTextSharp from PowerShell to split a PDF into one file per page.

Preparation

First, obtain itextsharp.dll.

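Installing it through NuGet would be the proper route, but here the approach is simpler: create a subfolder named lib in the folder that holds the script and put itextsharp.dll inside it.
The script below assumes this layout.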
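As a quick check of the layout described above, the following is a minimal sketch that builds the expected DLL path and loads the assembly only if the file is actually there. `Get-ITextSharpPath` is a hypothetical helper introduced for illustration; it is not part of the script in this article.

```powershell
# Hypothetical helper (not in the original script): builds the expected
# DLL path under <ScriptDir>\lib and reports whether the file exists.
function Get-ITextSharpPath {
    param(
        [Parameter(Mandatory = $true)][string]$ScriptDir
    )
    $dllPath = Join-Path (Join-Path $ScriptDir 'lib') 'itextsharp.dll'
    [pscustomobject]@{
        Path   = $dllPath
        Exists = Test-Path -LiteralPath $dllPath -PathType Leaf
    }
}

# $PSScriptRoot is the folder of the running script (PowerShell 3.0 or later).
# When the code is pasted into a console it is empty, so fall back to the
# current location.
$baseDir = if ($PSScriptRoot) { $PSScriptRoot } else { (Get-Location).Path }

$info = Get-ITextSharpPath -ScriptDir $baseDir
if ($info.Exists) {
    Add-Type -LiteralPath $info.Path    # load the iTextSharp assembly
}
else {
    Write-Warning "itextsharp.dll not found at $($info.Path)"
}
```

Checking before loading gives a readable warning when the DLL was put in the wrong place, instead of a type-not-found error later at the first `New-Object` call.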

Script

Save the following as Split-Pages.ps1.

Split-Pages.ps1

# Split PDF pages
param(
    [parameter(mandatory=$true)][string]$sourceDataPath,
    [string]$destinationPath
)

function Split-Pages($sourceDataPath, $destinationPath)
{
    # path of itextsharp.dll
    [System.Reflection.Assembly]::LoadFrom((Join-Path (Split-Path $script:MyInvocation.MyCommand.Path) "\lib\itextsharp.dll")) | Out-Null

    $pr = New-Object iTextSharp.text.pdf.PdfReader([string]$sourceDataPath)
    if($pr.IsEncrypted()){
        Write-Error "Encrypted: $sourceDataPath"
    }

    # An omitted [string] parameter arrives as "" rather than $null, so test for both
    if([string]::IsNullOrEmpty($destinationPath)){
        $path = Split-Path -Parent $sourceDataPath
    }
    else{
        $path = Convert-Path $destinationPath
    }

    $fname = [System.IO.Path]::GetFileNameWithoutExtension($sourceDataPath)
    $ext = [System.IO.Path]::GetExtension($sourceDataPath)

    try {
        $pages = $pr.NumberOfPages
        $padding = [string]$pages

        for($page = 1; $page -le $pages; $page++) {

            $pn = [string]$page

            $destinationPath = Join-Path $path ($fname + "_p" + $pn.PadLeft($padding.Length, "0") + $ext)

            $fs = New-Object System.IO.FileStream([string]$destinationPath, [System.IO.FileMode]::Create)

            $doc = New-Object iTextSharp.text.Document($pr.GetPageSize($page))
            $pw = [iTextSharp.text.pdf.PdfWriter]::GetInstance($doc, $fs)

            $doc.Open()
            $doc.NewPage() | Out-Null

            $importedPage = $pw.GetImportedPage($pr, $page)
            $pcb = $pw.DirectContent
            $pcb.AddTemplate($importedPage, 0, 0)

            $doc.Close()
            $pw.Close()
            $fs.Close()

        }

    }
    catch {
        Write-Error("Error: " + $_.Exception)
    }
    finally {
        # Close the reader even if a page failed, so the source file is released
        $pr.Close()
    }
}

Split-Pages (Convert-Path $sourceDataPath) $destinationPath

Usage example

Start PowerShell and run the script like this:

PS> .\Split-Pages.ps1 <source PDF file> <output folder>

If no output folder is specified, the split PDFs are saved in the same folder as the source PDF.
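To make the naming and the default output folder concrete, here is a minimal sketch that computes the output paths without touching any PDF. `Get-SplitPagePath` is a hypothetical helper written for illustration; it mirrors the naming logic of the script (base name, `_p`, zero-padded page number, original extension) and assumes the page count is already known.

```powershell
# Hypothetical helper (not in the original script): lists the output paths
# the split would produce, without reading or writing any PDF.
function Get-SplitPagePath {
    param(
        [Parameter(Mandatory = $true)][string]$SourcePath,
        [Parameter(Mandatory = $true)][int]$PageCount,
        [string]$DestinationDir
    )

    # No output folder given: use the folder of the source PDF
    if ([string]::IsNullOrEmpty($DestinationDir)) {
        $DestinationDir = Split-Path -Parent $SourcePath
    }

    $name = [System.IO.Path]::GetFileNameWithoutExtension($SourcePath)
    $ext  = [System.IO.Path]::GetExtension($SourcePath)

    # Pad page numbers to the width of the largest one so files sort correctly
    $width = ([string]$PageCount).Length

    for ($page = 1; $page -le $PageCount; $page++) {
        $suffix = ([string]$page).PadLeft($width, '0')
        Join-Path $DestinationDir ($name + '_p' + $suffix + $ext)
    }
}

# Example: a 12-page report.pdf placed in the temp folder (hypothetical file)
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'report.pdf'
Get-SplitPagePath -SourcePath $sample -PageCount 12
```

For a 12-page report.pdf the file names run from report_p01.pdf to report_p12.pdf, all in the folder of the source file because no output folder was passed.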

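iTextSharp is a handy library, but one problem remains even when it is combined with iTextAsian.dll: when text, half-width alphanumerics included, is encoded with UniJIS-UCS2-HW-H, the half-width alphanumeric characters are dropped. I wish there were a way around that...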
