
How to Read Text and Images from a Word Table Using Java

Last updated at 2022-04-12, posted at 2021-12-08

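##Background
Word is usually thought of as a tool for text-heavy office documents, but those documents often contain tables as well. What do you do when you need to read a table in a Word document, including the text and pictures inside its cells? This article shows how to read the text and images of a Word table using Java. The concrete steps follow.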

##1 Preparing the environment

**IDE:** IntelliJ IDEA
**JDK version:** 1.8.0
**Test document:** Word .docx 2019
**Jar package:** free spire.doc.jar 3.9.0

The test document looks like this:

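How to import the jar:
Method 1: manual import. Open the Project Structure dialog (Shift + Ctrl + Alt + S), select "Module" - "Dependencies", click "+", and choose "JARs or directories…". Select the jar package from its local path, check that it appears in the dependency list, and click "OK" or "Apply" to import it.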

Method 2: import from the Maven repository. Add the repository to pom.xml and declare the dependency for the free spire.doc.jar 3.9.0; Maven then downloads and imports it. The configuration is as follows.

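    <repositories>
        <repository>
            <id>com.e-iceblue</id>
            <url>https://repo.e-iceblue.com/repository/maven-public/</url>
        </repository>
    </repositories>
    <dependencies>
        <dependency>
            <groupId>e-iceblue</groupId>
            <artifactId>spire.Doc</artifactId>
            <version>4.9.0</version>
        </dependency>
    </dependencies>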
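
Whichever import method is used, it can help to confirm that the library is actually visible to the program before running the extraction code. The following is a minimal sketch using only the standard library; the class name `SpireClasspathCheck` and the helper `isOnClasspath` are made up for this illustration, and `com.spire.doc.Document` is the class the code in the next section instantiates.

```java
class SpireClasspathCheck {
    // Returns true if the named class can be found on the current classpath.
    // The class is located but not initialized.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, SpireClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Entry class of the library, as used by the article's code
        String className = "com.spire.doc.Document";
        if (isOnClasspath(className)) {
            System.out.println(className + " found");
        } else {
            System.out.println(className + " NOT found; check the jar import or the pom.xml dependency");
        }
    }
}
```

If the check reports that the class is missing, the usual causes are a jar that was not added to the module's dependencies or a Maven project that has not been reloaded after editing pom.xml.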

##2 Java code

    import com.spire.doc.*;
    import com.spire.doc.documents.Paragraph;
    import com.spire.doc.fields.DocPicture;
    import com.spire.doc.interfaces.ITable;

    import javax.imageio.ImageIO;
    import java.awt.image.BufferedImage;
    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    public class GetTable {
        public static void main(String[] args) throws IOException {
            // Load the Word test document
            Document doc = new Document();
            doc.loadFromFile("C:\\Users\\Administrator\\Desktop\\inputfile.docx");

            // Get the first section
            Section section = doc.getSections().get(0);

            // Get the first table in that section
            ITable table = section.getTables().get(0);

            // Pictures found in the table are collected in this list
            List<BufferedImage> images = new ArrayList<>();

            // Create the txt file that receives the text extracted from the table;
            // an existing file with the same name is overwritten
            String output = "ReadTextFromTable.txt";
            try (BufferedWriter bw = new BufferedWriter(new FileWriter(output))) {
                // Iterate over the rows of the table
                for (int i = 0; i < table.getRows().getCount(); i++) {
                    TableRow row = table.getRows().get(i);
                    // Iterate over the cells of each row
                    for (int j = 0; j < row.getCells().getCount(); j++) {
                        TableCell cell = row.getCells().get(j);
                        // Iterate over the paragraphs in the cell
                        for (int k = 0; k < cell.getParagraphs().getCount(); k++) {
                            Paragraph paragraph = cell.getParagraphs().get(k);
                            // Write the paragraph text, followed by a tab
                            bw.write(paragraph.getText() + "\t");

                            // Iterate over all child objects of the paragraph
                            for (int x = 0; x < paragraph.getChildObjects().getCount(); x++) {
                                Object object = paragraph.getChildObjects().get(x);
                                // Keep the object if it is a picture
                                if (object instanceof DocPicture) {
                                    DocPicture picture = (DocPicture) object;
                                    images.add(picture.getImage());
                                }
                            }
                        }
                    }
                    // End the line in the txt file after each table row
                    bw.write("\r\n");
                }
            }

            // Save the pictures as PNG files; the index in the file name
            // keeps later pictures from overwriting earlier ones
            for (int z = 0; z < images.size(); z++) {
                File imagefile = new File(String.format("result%d.png", z));
                ImageIO.write(images.get(z), "PNG", imagefile);
            }
        }
    }
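
When a table contains more than one picture, each saved file needs its own name, otherwise every write lands on the same file. The following is a minimal sketch of that step using only the standard library. The class `SaveImagesSketch`, the helper `saveAll`, and the blank placeholder images are assumptions made for illustration; in the program above, the list would hold the pictures collected from the table cells.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

class SaveImagesSketch {
    // Writes each image to dir as <prefix><index>.png and returns the files written.
    static List<File> saveAll(List<BufferedImage> images, File dir, String prefix) throws IOException {
        List<File> written = new ArrayList<>();
        for (int i = 0; i < images.size(); i++) {
            // %d inserts the index, so every image gets a distinct file name
            File target = new File(dir, String.format("%s%d.png", prefix, i));
            // ImageIO.write returns false when no suitable writer is found
            if (!ImageIO.write(images.get(i), "png", target)) {
                throw new IOException("No PNG writer available for " + target);
            }
            written.add(target);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder images standing in for pictures extracted from a document
        List<BufferedImage> images = new ArrayList<>();
        images.add(new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB));
        images.add(new BufferedImage(8, 8, BufferedImage.TYPE_INT_RGB));

        // A temporary directory keeps the demo from cluttering the project folder
        File dir = Files.createTempDirectory("table-images").toFile();
        for (File f : saveAll(images, dir, "result")) {
            System.out.println(f.getAbsolutePath());
        }
    }
}
```

Checking the return value of `ImageIO.write` is a small safeguard: the method reports a missing writer by returning `false` rather than by throwing.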

##3 Result of reading the text and images

Once the code is written, run the program to read the text data and images from the table. The file paths are those of the IDEA project folder, as shown below.
C:\Users\Administrator\IdeaProjects\Table_Doc\ReadTextFromTable.txt
C:\Users\Administrator\IdeaProjects\Table_Doc\result.png
C:\Users\Administrator\IdeaProjects\Table_Doc\inputfile.docx
You can change the file paths in the code to any other location.
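
The output files land in the project folder because a relative file name is resolved against the JVM's working directory, which IntelliJ IDEA typically sets to the project folder when running a program. The following is a minimal sketch of that resolution using only the standard library; the class `PathResolutionSketch`, the helper `resolveAgainstWorkingDir`, and the temporary-directory target are assumptions made for illustration.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

class PathResolutionSketch {
    // Resolves a relative file name against the JVM's working directory,
    // the same base that new File(name) uses for relative paths.
    static Path resolveAgainstWorkingDir(String name) {
        return Paths.get(System.getProperty("user.dir")).resolve(name);
    }

    public static void main(String[] args) {
        System.out.println("Working directory: " + System.getProperty("user.dir"));

        // A bare file name ends up inside the working directory
        System.out.println("Relative name resolves to: "
                + new File("ReadTextFromTable.txt").getAbsolutePath());

        // An absolute path is used as-is; the temp directory is only a placeholder target
        Path custom = Paths.get(System.getProperty("java.io.tmpdir"), "ReadTextFromTable.txt");
        System.out.println("Custom location: " + custom.toAbsolutePath());
    }
}
```

To write the results somewhere else, pass an absolute path (or a path built from a known base directory) instead of a bare file name.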
Result of reading the text data:

Result of reading the images:

That is all it takes. This is the end of the article; thank you for reading.
