Getting Started with Azure Machine Learning (Form Recognizer version 3.0, 2022-08-31 edition)

Posted at 2023-01-04

Form Recognizer version 3.0 has reached GA (General Availability).

The exercise sample code in Microsoft Learn and AI-900 still uses version 2.1 at the time of writing, but I expect it will be updated to version 3.0 before long.

So here I want to go over the specification of version 3.0 (2022-08-31) and how to call its API.

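Specification

Details of the specification are in the Form Recognizer 2022-08-31 reference.

For how to migrate from version 2.1, read the migration guide page carefully.

Prediction models

These are the IDs and descriptions of the models you can specify in version 3.0.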

Model ID	Description
prebuilt-read	Extracts text elements.
prebuilt-layout	Extracts text and layout information.
prebuilt-document	Extracts text, layout, entities, and general key-value pairs.
prebuilt-businessCard	Extracts fields from business cards.
prebuilt-idDocument	Extracts fields from passports and ID cards.
prebuilt-invoice	Extracts fields from invoices.
prebuilt-receipt	Extracts fields from receipts.
prebuilt-tax.us.w2	Extracts fields from US W-2 tax forms (2018-2021).
prebuilt-vaccinationCard	Extracts fields from US COVID-19 vaccination cards.
prebuilt-healthInsuranceCard.us	Extracts fields from US health insurance cards.

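Japanese

Japanese is supported by the text extraction (prebuilt-read), text and layout extraction (prebuilt-layout), and business card (prebuilt-businessCard) models.

The remaining models appear to be mostly for US administrative documents, so the lack of Japanese support there should not be a problem for now.

URL options

This is the URL structure and the options used when calling the API. Note that they have changed from version 2.1.
The property name in the request body has also quietly changed from source to urlSource, so take care not to trip over it.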

https://{endpoint}/formrecognizer/documentModels/{modelId}:analyze?api-version=2022-08-31[&pages][&locale][&stringIndexType]

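modelId
- The ID of the model to call, taken from the model table above (for example, prebuilt-receipt).
api-version
- The latest is 2022-08-31.
pages
- When the target is a PDF, you can specify which pages to analyze here. The value is written like this: 1-3,5,7-9
locale
- Use ja for Japanese. For other languages, refer to the list of supported languages.
stringIndexType
- You can pass "textElements". As far as I can tell, it selects the unit used to count string offsets and lengths in the response (textElements, unicodeCodePoint, or utf16CodeUnit). Note: I do not understand this option very well.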
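
To make the URL structure above concrete, here is a minimal Python sketch that assembles the analyze URL from its parts. The function name build_analyze_url, the example endpoint, and the regular expression used to check the pages value are my own assumptions for illustration; only the path layout and the query parameter names come from the URL shown above.

```python
import re
from urllib.parse import urlencode

API_VERSION = "2022-08-31"
# Accepts comma-separated page numbers or ranges such as "1-3,5,7-9".
_PAGES_RE = re.compile(r"^\d+(-\d+)?(,\d+(-\d+)?)*$")


def build_analyze_url(endpoint, model_id, pages=None, locale=None,
                      string_index_type=None):
    """Return the v3.0 analyze URL for a model and optional query options."""
    if pages is not None and not _PAGES_RE.match(pages):
        raise ValueError(f"pages must look like '1-3,5,7-9', got {pages!r}")
    params = {"api-version": API_VERSION}
    if pages is not None:
        params["pages"] = pages
    if locale is not None:
        params["locale"] = locale
    if string_index_type is not None:
        params["stringIndexType"] = string_index_type
    base = f"{endpoint.rstrip('/')}/formrecognizer/documentModels/{model_id}:analyze"
    # safe="," keeps the page list readable instead of encoding commas as %2C.
    return f"{base}?{urlencode(params, safe=',')}"


if __name__ == "__main__":
    # Hypothetical endpoint; replace it with the one shown in the Azure Portal.
    print(build_analyze_url("https://example.cognitiveservices.azure.com/",
                            "prebuilt-receipt", locale="ja"))
```

Building the query string with urlencode avoids the easy mistake of joining parameters with a second "?" instead of "&".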

Request headers

Content-Type
- Specifies the content type of the file sent to the API.
Ocp-Apim-Subscription-Key
- Specifies the key shown in the Azure Portal.

Request body

There are two ways to send the target data: specify its URL, or attach the file itself in the request body.

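For content types other than application/json, the file's raw bytes go in the request body as they are; URL encoding does not seem to be needed.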
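
The two sending methods can be sketched in Python as follows. This is only an illustration under assumptions: build_request is a hypothetical helper name, and sending the file as raw bytes without further encoding is my understanding of the API rather than something the text above confirms. The header names and the urlSource property are the ones described in this post.

```python
import json


def build_request(key, url_source=None, file_bytes=None, content_type=None):
    """Return (headers, body) for an analyze call.

    Pass url_source for a publicly reachable file, or file_bytes together
    with its content_type to upload the file itself.
    """
    if (url_source is None) == (file_bytes is None):
        raise ValueError("pass exactly one of url_source or file_bytes")
    headers = {"Ocp-Apim-Subscription-Key": key}
    if url_source is not None:
        # URL method: the body is JSON with the version 3.0 property name.
        headers["Content-Type"] = "application/json"
        body = json.dumps({"urlSource": url_source}).encode("utf-8")
    else:
        if not content_type:
            raise ValueError("content_type is required when sending file bytes")
        headers["Content-Type"] = content_type
        body = file_bytes  # assumption: raw bytes, no extra encoding
    return headers, body


if __name__ == "__main__":
    headers, body = build_request("YOUR_KEY",
                                  url_source="https://example.com/receipt.jpg")
    print(headers["Content-Type"], body.decode("utf-8"))
```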

Content-Type	File type
application/json	A publicly available file, specified by URL.
application/pdf	PDF file.
image/jpeg	JPEG image file.
image/png	PNG image file.
image/tiff	TIFF image file.
image/bmp	BMP image file.
text/html	HTML file.
application/vnd.openxmlformats-officedocument.wordprocessingml.document	docx file (Word 2007 or later).
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet	xlsx file (Excel 2007 or later).
application/vnd.openxmlformats-officedocument.presentationml.presentation	pptx file (PowerPoint 2007 or later).
application/octet-stream	Used for files that do not fit any of the above.
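
When uploading local files, the table above can be turned into a small lookup. The following Python sketch picks a Content-Type from the file extension and falls back to application/octet-stream. The mapping of extensions to rows and the name content_type_for are my own assumptions; the content types themselves are the ones listed in the table.

```python
from pathlib import Path

_OOXML = "application/vnd.openxmlformats-officedocument."

# Extension -> Content-Type, following the table in this post.
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
    ".bmp": "image/bmp",
    ".htm": "text/html",
    ".html": "text/html",
    ".docx": _OOXML + "wordprocessingml.document",
    ".xlsx": _OOXML + "spreadsheetml.sheet",
    ".pptx": _OOXML + "presentationml.presentation",
}


def content_type_for(filename):
    """Return the Content-Type for a file name, ignoring letter case."""
    suffix = Path(filename).suffix.lower()
    return CONTENT_TYPES.get(suffix, "application/octet-stream")


if __name__ == "__main__":
    for name in ("receipt.JPG", "report.docx", "scan.dat"):
        print(name, "->", content_type_for(name))
```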

How to call the API, by command and language

curl

It runs as a one-liner. Cool.

curl -v -X POST "https://westus.api.cognitive.microsoft.com/formrecognizer/documentModels/{modelId}:analyze?api-version=2022-08-31&pages={string}&locale={string}&stringIndexType=textElements" \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: {subscription key}" \
--data-ascii "{body}"

PowerShell

The exercise sample code for AI-900 on Microsoft Learn targets version 2.1, so here I rewrite it for version 3.0.

The code for both versions is shown side by side, so it is a little long.

form-recognizer.ps1 (modified)

$key="YOUR_KEY"
$endpoint="YOUR_ENDPOINT"

# Create the URL where the raw receipt image can be found
$img = "https://raw.githubusercontent.com/MicrosoftLearning/AI-900-AIFundamentals/main/data/vision/receipt.jpg"

# Create the header for the REST POST with the subscription key
# In this example, the URL of the image will be sent instead of 
# the raw image, so the Content-Type is JSON
$headers = @{}
$headers.Add( "Ocp-Apim-Subscription-Key", $key )
$headers.Add( "Content-Type","application/json" )

# Create the body with the URL of the raw image
# <Change 1>
# The property was "source" in version 2.1; in version 3.0 it has been
# renamed to "urlSource".
#
# $body = "{'source': '$img'}"    # version 2.1
$body =  "{'urlSource': '$img'}"  # version 3.0

# Call the receipt analyze method with the header and body
# Must call the Invoke-WebRequest to have access to the header
Write-Host "Sending receipt..."

# <Change 2>
# The URL structure of the request API has changed.
# 
# version 2.1 code
#$response = Invoke-WebRequest -Method Post `
#          -Uri "$endpoint/formrecognizer/v2.1/prebuilt/receipt/analyze" `
#          -Headers $headers `
#          -Body $body
#
# version 3.0 code
$response = Invoke-WebRequest -Method Post `
          -Uri "$endpoint/formrecognizer/documentModels/prebuilt-receipt:analyze?api-version=2022-08-31" `
          -Headers $headers `
          -Body $body
Write-Host "...Receipt sent."

# Extract the URL from the response of the receipt analyzer
# to call the API to get the analysis results
# version 2.1
#$resultUrl = $($response.Headers['Operation-Location'])
# From version 3.0 the header name appears to be lowercase
$resultUrl = $($response.Headers['operation-location'])

# Create the header for the REST GET with only the subscription key
$resultHeaders = @{}
$resultHeaders.Add( "Ocp-Apim-Subscription-Key", $key )

# Get the receipt analysis results, passing in the resultURL
# Continue to request results until the analysis is "succeeded"


Write-Host "Getting results..."
#Do {
#    $result = Invoke-RestMethod -Method Get `
#            -Uri $resultUrl `
#            -Headers $resultHeaders | ConvertTo-Json -Depth 10
#
#    $analysis = ($result | ConvertFrom-Json)
#} while ($analysis.status -ne "succeeded")
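Do {
    #
    # Wait 3 seconds between polls.
    # This avoids calling the API too often on the free F0 tier.
    #
    Start-Sleep -s 3
    $result = Invoke-RestMethod -Method Get `
        -Uri $resultUrl `
        -Headers $resultHeaders | ConvertTo-Json -Depth 10
    $analysis = ($result | ConvertFrom-Json)
    # Stop instead of looping forever if the service reports a failure
    if ($analysis.status -eq "failed") { throw "Receipt analysis failed." }
} while ($analysis.status -ne "succeeded")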

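# Access the relevant fields from the analysis
# (version 3.0 returns them under analyzeResult.documents;
#  documentResults was the version 2.1 name)
$analysisFields = $analysis.analyzeResult.documents[0].fields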

# version 2.1 code
# Print out all of the properties of the receipt analysis
# Write-Host ("Receipt Type: ", $($analysisFields.ReceiptType.valueString))
# Write-Host ("Merchant Address: ", $($analysisFields.MerchantAddress.text))
# Write-Host ("Merchant Phone: ", $($analysisFields.MerchantPhoneNumber.text))
# Write-Host ("Transaction Date: ", $($analysisFields.TransactionDate.valueDate))
# From version 3.0 the values appear to be stored in "content"
Write-Host ("Receipt Type: ", $($analysisFields.ReceiptType.content))
Write-Host ("Merchant Address: ", $($analysisFields.MerchantAddress.content))
Write-Host ("Merchant Phone: ", $($analysisFields.MerchantPhoneNumber.content))
Write-Host ("Transaction Date: ", $($analysisFields.TransactionDate.content))

Write-Host ("Receipt Items: ")

# Access the individual items from the analysis
$receiptItems = $($analysisFields.Items.valueArray)

# version 2.1 code
# for (($idx = 0); $idx -lt $receiptItems.Length; $idx++) {
#     $item = $receiptItems[$idx] 
#     Write-Host ("Item #", ($idx+1))
#     Write-Host ("  - Name: ", $($item.valueObject.Name.valueString))
#     Write-Host ("  - Price: ",$($item.valueObject.TotalPrice.valueNumber))
# }
# version 3.0 code
# The property names have changed
for (($idx = 0); $idx -lt $receiptItems.Length; $idx++) {
    $item = $receiptItems[$idx] 
    Write-Host ("Item #", ($idx+1))
    Write-Host ("  - Name: ", $($item.content))
    Write-Host ("  - Price: ",$($item.valueObject.TotalPrice.valueNumber))
}

# version 2.1 code
# Write-Host ("Subtotal: ", $($analysisFields.Subtotal.text))
# Write-Host ("Tax: ", $($analysisFields.Tax.text))
# Write-Host ("Total: ", $($analysisFields.Total.text))
# version 3.0 code
# The property names have changed
Write-Host ("Subtotal: ", $($analysisFields.Subtotal.valueNumber))
Write-Host ("Tax: ", $($analysisFields.TotalTax.valueNumber))
Write-Host ("Total: ", $($analysisFields.Total.valueNumber))
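
The polling loop in the script above waits only for "succeeded", so it never ends if the analysis fails. As a language-neutral illustration of the same idea, here is a minimal Python sketch that also stops on "failed" and after a timeout. The names poll_until_done and receipt_fields, the fetch_status callback (which the caller would implement as a GET on the operation-location URL), and the timeout value are all my assumptions. Reading the fields from analyzeResult.documents reflects my understanding of the version 3.0 response shape.

```python
import time


def poll_until_done(fetch_status, interval_s=3.0, timeout_s=120.0,
                    sleep=time.sleep):
    """Call fetch_status() until the operation reaches a terminal state.

    fetch_status is a caller-supplied function that performs the GET on the
    operation-location URL and returns the decoded JSON as a dict.
    """
    waited = 0.0
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "succeeded":
            return result
        if status == "failed":
            raise RuntimeError(f"analysis failed: {result.get('error')}")
        if waited >= timeout_s:
            raise TimeoutError(f"no result after {timeout_s} seconds")
        # Waiting between polls keeps the free F0 tier under its call limit.
        sleep(interval_s)
        waited += interval_s


def receipt_fields(result):
    """Return the fields of the first document in a result, or {} if none."""
    documents = result.get("analyzeResult", {}).get("documents", [])
    return documents[0].get("fields", {}) if documents else {}


if __name__ == "__main__":
    # Canned responses stand in for real HTTP calls in this demonstration.
    canned = iter([
        {"status": "running"},
        {"status": "succeeded",
         "analyzeResult": {"documents": [
             {"fields": {"Total": {"valueNumber": 12.5}}}]}},
    ])
    done = poll_until_done(lambda: next(canned), sleep=lambda s: None)
    print(receipt_fields(done)["Total"]["valueNumber"])
```

Passing the HTTP call in as a function keeps the loop independent of any particular HTTP library and makes it easy to try out without a subscription key.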

C#

The code inevitably gets long here. It can't be helped.

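using System;
using System.Net.Http.Headers;
using System.Text;
using System.Net.Http;
using System.Threading.Tasks;
using System.Web;

namespace CSHttpClientSample
{
  static class Program
  {
    // async Main (C# 7.1 or later) so the request completes before the prompt
    static async Task Main()
    {
      await MakeRequest();
      Console.WriteLine("Hit ENTER to exit...");
      Console.ReadLine();
    }
        
    // Returns Task instead of void so callers can await it and see exceptions
    static async Task MakeRequest()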
    {
      var client = new HttpClient();
      var queryString = HttpUtility.ParseQueryString(string.Empty);

      // Request headers
      client.DefaultRequestHeaders.Add(
        "Ocp-Apim-Subscription-Key",
        "{subscription key}");

      // Request parameters
      queryString["pages"] = "{string}";
      queryString["locale"] = "{string}";
      queryString["stringIndexType"] = "textElements";
      var uri = "https://westus.api.cognitive.microsoft.com/" +
                "formrecognizer/documentModels/{modelId}:analyze" +
                "?api-version=2022-08-31&" + queryString;

      HttpResponseMessage response;

      // Request body
      byte[] byteData = Encoding.UTF8.GetBytes("{body}");

      using (var content = new ByteArrayContent(byteData))
      {
        content.Headers.ContentType = new MediaTypeHeaderValue(
          "< your content type, i.e. application/json >");
        response = await client.PostAsync(uri, content);
      }
    }
  }
}