Extracting images from a PDF document (using iTextSharp)

Last updated at 2016-12-19 / Posted at 2016-12-19

Overview

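This is a sample that uses iTextSharp to extract the images embedded in a PDF document.
In Visual Studio I pulled in the 5.x series (the latest version at the time of posting) from the NuGet package (nia_tn1012), then used code like the following to write the images out to the desktop as sequentially numbered files.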

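Note: pay attention to iTextSharp's licensing terms (the 5.x series is offered under the AGPL, with a commercial license as the alternative).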

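There seem to be several ways to do this, but the ProcessContent method of the iTextSharp.text.pdf.parser.PdfReaderContentParser class appears to make the coding considerably easier.
iTextSharp appears to be a .NET port of iText, which was written for Java, and the use of a listener class in the Java style is very much what you would expect from that heritage.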

Sample code

using System;
using System.Collections.Generic;
using System.Drawing;
using System.Linq;

namespace SampleNS
{
    public class SampleClass
    {
        public void SampleMethod()
        {
            using (var reader = new iTextSharp.text.pdf.PdfReader(@"sample.pdf"))
            {
                // Container that receives the images the parser hands back through the listener
                var images = new System.Collections.Generic.List<iTextSharp.text.pdf.parser.PdfImageObject>();

                var parser = new iTextSharp.text.pdf.parser.PdfReaderContentParser(reader);
                // Page numbers are 1-based, so use <= to include the last page
                for (int pageNumber = 1; pageNumber <= reader.NumberOfPages; pageNumber++)
                {
                    parser.ProcessContent<ImageRenderListener>(pageNumber, new ImageRenderListener(images));
                }
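                var desktop = System.Environment.GetFolderPath(System.Environment.SpecialFolder.DesktopDirectory);
                var index = 1; // sequential numbering starts at 1
                foreach (var image in images)
                {
                    // GetFileType() supplies the file extension for the extracted image
                    var path = System.IO.Path.Combine(desktop, (index++).ToString() + "." + image.GetFileType());
                    using (var drawingImage = image.GetDrawingImage())
                    {
                        drawingImage.Save(path);
                    }
                }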
            }
        }
    }

    internal class ImageRenderListener : iTextSharp.text.pdf.parser.IRenderListener
    {
        private System.Collections.Generic.List<iTextSharp.text.pdf.parser.PdfImageObject> _list;

        public ImageRenderListener(System.Collections.Generic.List<iTextSharp.text.pdf.parser.PdfImageObject> list)
        {
            _list = list;
        }

        public void BeginTextBlock()
        {
            // Nothing to do here for this sample
        }

        public void EndTextBlock()
        {
            // Nothing to do here for this sample
        }

        public void RenderImage(iTextSharp.text.pdf.parser.ImageRenderInfo renderInfo)
        {
            // Collect each image the parser encounters on the page
            var img = renderInfo.GetImage();
            _list.Add(img);
        }

        public void RenderText(iTextSharp.text.pdf.parser.TextRenderInfo renderInfo)
        {
            // Nothing to do here for this sample
        }
    }
}

References

How to extract images from PDF files using c# and itextsharp - psyCodeDeveloper Thank you so much!
