
GCP Document AI: Features and Implementation

Posted at 2023-12-09

Overview of GCP Document AI

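Document AI is a Google Cloud Platform (GCP) service that applies AI to document processing. It uses natural language processing and machine learning to process and analyze large volumes of documents automatically. Its main capabilities include optical character recognition (OCR), text extraction, classification, automatic summarization, and sensitive-information detection. Each capability is exposed through a processor that you create in a project and location and then call through the API.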

Document AI Features

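OCR (Optical Character Recognition): Document AI extracts text from images and scanned documents. It supports multiple languages and recognizes characters with high accuracy.
Text extraction: Document AI can pull specific pieces of information out of a document, for example particular clauses from a contract or product details from a purchase order.
Classification: Document AI can determine the type or category of a document from its content. For example, it can automatically sort incoming documents into invoices, receipts, and other types and return the predicted type.
Automatic summarization: Document AI can summarize long documents by extracting the key points of the text and generating a summary.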
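Sensitive-information detection: Text extracted by Document AI can be scanned for sensitive data such as credit card numbers and social security numbers, typically in combination with Sensitive Data Protection (formerly Cloud DLP). This helps protect confidential data and handle personal information properly.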
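
Document AI returns extracted fields as entities that carry a type, the matched text, and a confidence score. The following is a minimal sketch of how such results might be post-processed, keeping only the entities of one type above a confidence threshold. It does not use the client library: `Entity` and `FilterEntities` are hypothetical names introduced for illustration, and the sample values are made up.

```go
package main

import "fmt"

// Entity is a hypothetical, simplified stand-in for one extracted field.
// It holds only a type label, the matched text, and a confidence score.
type Entity struct {
	Type       string
	Text       string
	Confidence float64
}

// FilterEntities returns the entities of wantType whose confidence is at
// least minConfidence, preserving the input order.
func FilterEntities(entities []Entity, wantType string, minConfidence float64) []Entity {
	var kept []Entity
	for _, e := range entities {
		if e.Type == wantType && e.Confidence >= minConfidence {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	// Made-up sample values, for illustration only.
	entities := []Entity{
		{Type: "invoice_id", Text: "INV-001", Confidence: 0.98},
		{Type: "total_amount", Text: "120.00", Confidence: 0.91},
		{Type: "total_amount", Text: "12O.00", Confidence: 0.42},
	}
	for _, e := range FilterEntities(entities, "total_amount", 0.8) {
		fmt.Printf("%s = %s (%.2f)\n", e.Type, e.Text, e.Confidence)
	}
}
```

The threshold of 0.8 is arbitrary. A suitable value depends on the processor and on how costly a wrong extraction is for the workflow.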

Sample Code

Java

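import com.google.cloud.documentai.v1.Document;
import com.google.cloud.documentai.v1.DocumentProcessorServiceClient;
import com.google.cloud.documentai.v1.DocumentProcessorServiceSettings;
import com.google.cloud.documentai.v1.GcsDocument;
import com.google.cloud.documentai.v1.ProcessRequest;
import com.google.cloud.documentai.v1.ProcessResponse;
import com.google.cloud.documentai.v1.ProcessorName;

public class DocumentAIExample {
  public static void main(String[] args) throws Exception {
    // Placeholders: replace with your own project, location, and the ID of an
    // OCR processor that already exists in that project and location.
    String projectId = "my-project";
    String location = "us";
    String processorId = "my-processor-id";
    String gcsInputUri = "gs://my-bucket/my-document.pdf";

    // Set up the client against the regional endpoint matching the processor's location.
    DocumentProcessorServiceSettings settings =
        DocumentProcessorServiceSettings.newBuilder()
            .setEndpoint(location + "-documentai.googleapis.com:443")
            .build();
    try (DocumentProcessorServiceClient client = DocumentProcessorServiceClient.create(settings)) {

      // Run OCR on a PDF stored in Cloud Storage (synchronous, single-document request).
      GcsDocument gcsDocument =
          GcsDocument.newBuilder()
              .setGcsUri(gcsInputUri)
              .setMimeType("application/pdf")
              .build();

      ProcessRequest request =
          ProcessRequest.newBuilder()
              .setName(ProcessorName.of(projectId, location, processorId).toString())
              .setGcsDocument(gcsDocument)
              .build();
      ProcessResponse response = client.processDocument(request);

      // Get the OCR result: the full recognized text of the document.
      Document document = response.getDocument();
      System.out.println("OCR result:");
      System.out.println(document.getText());
    }
  }
}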

Go

package main

import (
	"context"
	"fmt"

	documentai "cloud.google.com/go/documentai/apiv1"
	"cloud.google.com/go/documentai/apiv1/documentaipb"
	"google.golang.org/api/option"
)

func main() {
	// Placeholders: replace with your own project, location, and the ID of an
	// OCR processor that already exists in that project and location.
	const (
		projectID   = "my-project"
		location    = "us"
		processorID = "my-processor-id"
		gcsInputURI = "gs://my-bucket/my-document.pdf"
	)

	// Set up the client against the regional endpoint matching the processor's location.
	ctx := context.Background()
	endpoint := fmt.Sprintf("%s-documentai.googleapis.com:443", location)
	client, err := documentai.NewDocumentProcessorClient(ctx, option.WithEndpoint(endpoint))
	if err != nil {
		fmt.Printf("Error creating Document AI client: %v\n", err)
		return
	}
	defer client.Close()

	// Run OCR on a PDF stored in Cloud Storage (synchronous, single-document request).
	request := &documentaipb.ProcessRequest{
		Name: fmt.Sprintf("projects/%s/locations/%s/processors/%s", projectID, location, processorID),
		Source: &documentaipb.ProcessRequest_GcsDocument{
			GcsDocument: &documentaipb.GcsDocument{
				GcsUri:   gcsInputURI,
				MimeType: "application/pdf",
			},
		},
	}
	resp, err := client.ProcessDocument(ctx, request)
	if err != nil {
		fmt.Printf("Error processing document: %v\n", err)
		return
	}

	// Get the OCR result: the full recognized text, then a per-page line count.
	doc := resp.GetDocument()
	fmt.Println("OCR result:")
	fmt.Println(doc.GetText())
	for _, page := range doc.GetPages() {
		fmt.Printf("page %d: %d lines\n", page.GetPageNumber(), len(page.GetLines()))
	}
}
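
Document AI requests identify a processor by a resource name and are sent to a location-specific endpoint. The sketch below shows one way these two strings might be built from their parts using only the standard library. `ProcessorName` and `RegionalEndpoint` are hypothetical helper names, and the project and processor IDs are placeholders.

```go
package main

import "fmt"

// ProcessorName builds the resource name that identifies a processor:
// projects/{project}/locations/{location}/processors/{processor}.
func ProcessorName(projectID, location, processorID string) string {
	return fmt.Sprintf("projects/%s/locations/%s/processors/%s", projectID, location, processorID)
}

// RegionalEndpoint builds the host:port of the API endpoint for a location,
// for example "us" or "eu".
func RegionalEndpoint(location string) string {
	return fmt.Sprintf("%s-documentai.googleapis.com:443", location)
}

func main() {
	// Placeholder IDs, for illustration only.
	fmt.Println(ProcessorName("my-project", "us", "my-processor-id"))
	fmt.Println(RegionalEndpoint("us"))
}
```

Deriving both values from a single location string keeps the endpoint and the resource name in sync, since a processor can only be reached through the endpoint of the location where it was created.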

C#

using Google.Cloud.DocumentAI.V1;
using System;

public class DocumentAIExample
{
    public static void Main(string[] args)
    {
        // Placeholders: replace with your own project, location, and the ID of an
        // OCR processor that already exists in that project and location.
        string projectId = "my-project";
        string location = "us";
        string processorId = "my-processor-id";
        string gcsInputUri = "gs://my-bucket/my-document.pdf";

        // Set up the client against the regional endpoint matching the processor's location.
        DocumentProcessorServiceClient client = new DocumentProcessorServiceClientBuilder
        {
            Endpoint = $"{location}-documentai.googleapis.com"
        }.Build();

        // Run OCR on a PDF stored in Cloud Storage (synchronous, single-document request).
        ProcessRequest request = new ProcessRequest
        {
            Name = $"projects/{projectId}/locations/{location}/processors/{processorId}",
            GcsDocument = new GcsDocument
            {
                GcsUri = gcsInputUri,
                MimeType = "application/pdf"
            }
        };
        ProcessResponse response = client.ProcessDocument(request);

        // Get the OCR result: the full recognized text of the document.
        Document document = response.Document;
        Console.WriteLine("OCR result:");
        Console.WriteLine(document.Text);
    }
}

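This article covered an overview of GCP Document AI and its features, along with sample code in Java, Go, and C#. The samples are basic examples of running OCR on a document with Document AI and retrieving the result.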
