Debug features for document search in Azure AI Search

Posted at 2024-11-26

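When people talk about debugging in Azure AI Search, they usually mean debug sessions, the feature for debugging indexer and skillset behavior. Azure AI Search also provides debug options for document search itself. With them you can inspect how semantic search behaved for a query, or how the query rewrite feature, released in public preview in November 2024, behaved.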

For example, to enable query rewrite and confirm that it actually ran, send a search request like the following.

POST https://[Azure AI Search service name].search.windows.net/indexes/[index name]/docs/search?api-version=2024-11-01-preview
{
    "search": "[search query]",
    "queryType": "semantic",
    "queryRewrites": "generative|count-5",
    "queryLanguage": "ja-JP",
    "debug": "queryRewrites"
}

The key point is to set the request parameters queryType, queryRewrites, and queryLanguage as shown so that query rewrite is enabled, and to set the debug parameter to queryRewrites.
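As a rough sketch, the request above could be assembled in Python with only the standard library. The function name build_debug_search_request and its arguments are hypothetical, the api-key header assumes key-based authentication, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_debug_search_request(service, index, query, api_key, debug="queryRewrites"):
    """Build (but do not send) a search request with query rewrite and debug set."""
    url = (
        f"https://{service}.search.windows.net/indexes/{index}"
        "/docs/search?api-version=2024-11-01-preview"
    )
    # Same parameters as the request body shown above.
    body = {
        "search": query,
        "queryType": "semantic",
        "queryRewrites": "generative|count-5",
        "queryLanguage": "ja-JP",
        "debug": debug,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Placeholder service name, index name, and key for illustration only.
request = build_debug_search_request("my-service", "my-index", "日焼け止め", "<api-key>")
print(request.full_url)
print(json.loads(request.data)["debug"])
```

Passing the returned object to urllib.request.urlopen would send it; keeping construction separate makes the body easy to inspect first.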

Running the search returns a response like the following.

{
  ...
  "@search.debug": {
    "queryRewrites": {
      "text": {
        "inputQuery": "日焼け止め",
        "rewrites": [
          "日焼け止めおすすめ",
          "日焼け止めの効果",
          "日焼け止め ランキング",
          "日焼け止め おすすめ",
          "日焼け止めの選び方"
        ]
      }
    }
  },
  "value": [...],
  ...
}

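The queryRewrites object inside @search.debug holds the results of the feature, including the rewritten queries that query rewrite generated. In this sample the input query 日焼け止め means "sunscreen".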
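A minimal sketch of pulling those values out of a parsed response follows. The helper name extract_query_rewrites is hypothetical, and the sample dictionary is a shortened copy of the response above.

```python
def extract_query_rewrites(response):
    """Return (input_query, rewrites); (None, []) when no rewrite debug info exists."""
    debug = response.get("@search.debug") or {}
    text_info = (debug.get("queryRewrites") or {}).get("text") or {}
    return text_info.get("inputQuery"), list(text_info.get("rewrites") or [])


# Shortened copy of the sample response shown above.
sample_response = {
    "@search.debug": {
        "queryRewrites": {
            "text": {
                "inputQuery": "日焼け止め",
                "rewrites": ["日焼け止めおすすめ", "日焼け止めの効果"],
            }
        }
    },
    "value": [],
}

input_query, rewrites = extract_query_rewrites(sample_response)
print(input_query, len(rewrites))
```

An empty rewrites list is a quick signal that query rewrite did not run for that request.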

As for what debug accepts, REST API version 2024-11-01-preview appears to allow the following values.

disabled
all
semantic
vector
queryRewrites

disabled turns the debug feature off and is the default. all enables every debug feature. The remaining values are described in the sections below.

Incidentally, running a search with any other value returns an error message like the following.

Please specify one of the following options: disabled, all, speller, semantic, vector, queryrewrites.

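This message makes speller look like a valid value, and specifying it does not cause an error, but nothing happens. It was probably intended for debugging the query spell check feature. (Spell check does not support Japanese and can be replaced by query rewrite, so there will likely be few opportunities to use it in Japan.)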
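If you want to catch a bad value before the request leaves the client, a small check along these lines may help. This is only a sketch: the helper name normalize_debug_mode is hypothetical, the allowed set is the list given above for 2024-11-01-preview, and speller is left out on purpose because it had no visible effect. Lowercasing the input is a client-side convenience, not a statement about how the service compares values.

```python
# Values listed above for api-version 2024-11-01-preview (speller omitted on purpose).
ALLOWED_DEBUG_MODES = ("disabled", "all", "semantic", "vector", "queryRewrites")


def normalize_debug_mode(value):
    """Return the canonical spelling of a debug mode, or raise ValueError."""
    by_lowercase = {mode.lower(): mode for mode in ALLOWED_DEBUG_MODES}
    key = value.strip().lower()
    if key not in by_lowercase:
        allowed = ", ".join(ALLOWED_DEBUG_MODES)
        raise ValueError(f"unsupported debug mode {value!r}; expected one of: {allowed}")
    return by_lowercase[key]


print(normalize_debug_mode("QueryRewrites"))
```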

When `semantic` is enabled

You can check how the semantic reranker is working.

As shown below, @search.documentDebugInfo is added to each document in the search results, which tells you which fields the semantic reranker used.

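Here, the fields specified for semantic search in the index definition are title, description, and contentArea, assigned as the title field, content field, and keyword field respectively. The semantic reranker builds a summary from the values of these fields with an SLM (Small Language Model), then reranks by having the SLM judge whether that content is relevant to the search query.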

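The debug output lets you confirm that each field was actually used in the summary generation and reranking steps. In the sample below, contentArea is reported as unused, and that document's contentArea value is an empty array.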

{
    ...
    "value": [
    {
        "@search.documentDebugInfo": {
            "semantic": {
              "titleField": {
                "name": "title",
                "state": "used"
              },
              "contentFields": [
                {
                  "name": "description",
                  "state": "used"
                }
              ],
              "keywordFields": [
                {
                  "name": "contentArea",
                  "state": "unused"
                }
              ],
              "rerankerInput": {
                "title": "Unlock Retrieval Augmented Generation for Business Applications",
                "content": "Cohere delivers a comprehensive suite of Foundational AI models including Command R/R+, Embed 3 and Rerank with multilingual capabilities and state of the art performance, available on Azure AI. Explore the transformative capabilities of all of Cohere’s robust models that support many different use cases including leading RAG architecture to enhance your Enterprise search. Learn how businesses can build intelligent systems that bridge the gap between textual and visual data.",
                "keywords": ""
              }
            }
          },
          "sessionId": "c12bed43-8253-47f6-9dc1-f86ca3638768",
          "sessionCode": "BRKDMFP422",
          "title": "Unlock Retrieval Augmented Generation for Business Applications",
          "description": "Cohere delivers a comprehensive suite of Foundational AI models including Command R/R+, Embed 3 and Rerank with multilingual capabilities and state of the art performance, available on Azure AI. Explore the transformative capabilities of all of Cohere’s robust models that support many different use cases including leading RAG architecture to enhance your Enterprise search. Learn how businesses can build intelligent systems that bridge the gap between textual and visual data.",
          "contentArea": [],
          ...
      }, ... ],
      ...
}
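To scan many results for fields the reranker did not use, something like the following sketch could work. The helper name unused_semantic_fields is hypothetical, and it simply treats any state other than "used" as worth reporting. The sample document is a trimmed copy of the response above.

```python
def unused_semantic_fields(document):
    """List (role, field name, state) for semantic fields whose state is not 'used'."""
    info = document.get("@search.documentDebugInfo") or {}
    semantic = info.get("semantic") or {}

    fields = []
    if semantic.get("titleField"):
        fields.append(("title", semantic["titleField"]))
    fields += [("content", f) for f in semantic.get("contentFields") or []]
    fields += [("keywords", f) for f in semantic.get("keywordFields") or []]

    return [
        (role, field["name"], field["state"])
        for role, field in fields
        if field["state"] != "used"
    ]


# Trimmed copy of the sample document shown above.
sample_document = {
    "@search.documentDebugInfo": {
        "semantic": {
            "titleField": {"name": "title", "state": "used"},
            "contentFields": [{"name": "description", "state": "used"}],
            "keywordFields": [{"name": "contentArea", "state": "unused"}],
        }
    },
    "sessionCode": "BRKDMFP422",
}

print(unused_semantic_fields(sample_document))
```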

When `vector` is enabled

You can check the subscores from before rank fusion (RRF).

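Rank fusion (RRF) is applied when you run vector search against multiple fields, or when you run hybrid search (full-text search combined with vector search). With this debug option you get not only the final fused search score but also the search score of each query from before fusion (the subscores).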

Concretely, the search results look like this.

{
  ...
  "value": [
    {
      "@search.score": 0.05000000447034836,
      "@search.documentDebugInfo": {
        "vectors": {
          "subscores": {
            "documentBoost": 1.0,
            "text": {
              "searchScore": 1.2053436040878296
            },
            "vectors": [
              {
                "descriptionVector": {
                  "searchScore": 0.6362918019294739,
                  "vectorSimilarity": 0.4283940214731239
                },
                "titleVector": {
                  "searchScore": 0.6324651837348938,
                  "vectorSimilarity": 0.41888529880063197
                }
              }
            ]
          }
        }
      },
      "sessionCode": "BRK150",
      "title": "Modernize enterprise integration with Azure Integration Services",
      "description": "Modernizing enterprise integration is key to staying competitive. Join us to explore Azure Integration Services and its potential. Discover how Azure Logic Apps can enhance processes with AI, enabling automation and smarter decision-making. Learn about hybrid deployment models to connect on-premises systems with cloud. Finally, explore transitioning from BizTalk to Azure Integration Services, leveraging your existing investments while adopting a cloud-native approach.",
      ...
    },
    ...
  ]
}

This is the result of a hybrid search that combines full-text search with vector search against two fields (titleVector and descriptionVector).

Inside vectors in each document's @search.documentDebugInfo, you can see the following three subscores.

Subscore for full-text search (text.searchScore)
Subscore for vector search against titleVector (titleVector.searchScore inside vectors)
Subscore for vector search against descriptionVector (descriptionVector.searchScore inside vectors)

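Full-text search (0 to unbounded) and vector search (0.3 to 1.0) have different score ranges, so rank fusion (RRF) is used to merge their results. Vector searches against multiple fields are also merged with RRF. The debug output shows the scores from before fusion (the subscores).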
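To make the fusion step concrete, here is a small worked sketch of RRF, where each result set contributes 1 / (k + rank) for a document. The constant k = 60 and zero-based ranks are assumptions on my part. They happen to reproduce the @search.score of about 0.05 in the sample if that document ranked first in all three queries, which the debug output itself does not show.

```python
def rrf_score(ranks, k=60):
    """Sum 1 / (k + rank) over the result sets in which a document appears.

    ranks: the document's position in each result set (assumed zero-based).
    k: smoothing constant (assumed to be 60).
    """
    return sum(1.0 / (k + rank) for rank in ranks)


# First place in the text query and in both vector queries.
top_everywhere = rrf_score([0, 0, 0])
print(round(top_everywhere, 6))  # 0.05

# First in text search, but fifth and tenth in the two vector queries.
mixed = rrf_score([0, 4, 9])
print(mixed < top_everywhere)  # True
```

Because only ranks enter the formula, the raw subscores on their different scales never need to be compared directly.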

With this, when search relevance is disappointing, you can check which search method is dragging the results down!
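One way to line the subscores up for comparison is to flatten them per document, as in the sketch below. The helper name collect_subscores is hypothetical, and the sample document is a trimmed copy of the response above.

```python
def collect_subscores(document):
    """Flatten the debug subscores into {query name: searchScore}."""
    info = document.get("@search.documentDebugInfo") or {}
    subscores = (info.get("vectors") or {}).get("subscores") or {}

    result = {}
    if subscores.get("text"):
        result["text"] = subscores["text"]["searchScore"]
    # "vectors" is a list of objects keyed by vector field name.
    for entry in subscores.get("vectors") or []:
        for field_name, scores in entry.items():
            result[field_name] = scores["searchScore"]
    return result


# Trimmed copy of the sample document shown above.
sample_document = {
    "@search.score": 0.05000000447034836,
    "@search.documentDebugInfo": {
        "vectors": {
            "subscores": {
                "documentBoost": 1.0,
                "text": {"searchScore": 1.2053436040878296},
                "vectors": [
                    {
                        "descriptionVector": {
                            "searchScore": 0.6362918019294739,
                            "vectorSimilarity": 0.4283940214731239,
                        },
                        "titleVector": {
                            "searchScore": 0.6324651837348938,
                            "vectorSimilarity": 0.41888529880063197,
                        },
                    }
                ],
            }
        }
    },
}

for name, score in collect_subscores(sample_document).items():
    print(f"{name}: {score:.4f}")
```

Since the text and vector scores sit on different scales, it is probably more useful to compare one query's subscore across documents than to compare text against vector within a single document.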

Reference: Hybrid search scoring (RRF) - Azure AI Search | Microsoft Learn

When `queryRewrites` is enabled

As mentioned above, this option is for checking how the query rewrite feature, released in public preview in November 2024, behaved. It outputs the queries that the feature generated.

{
  ...
  "@search.debug": {
    "queryRewrites": {
      "text": {
        "inputQuery": "日焼け止め",
        "rewrites": [
          "日焼け止めおすすめ",
          "日焼け止めの効果",
          "日焼け止め ランキング",
          "日焼け止め おすすめ",
          "日焼け止めの選び方"
        ]
      }
    }
  },
  "value": [...]
}

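When you search for 日焼け止め (sunscreen), the search runs not only with the plain keyword but also with intent attached, such as 日焼け止めの選び方 (how to choose sunscreen) and 日焼け止めの効果 (effects of sunscreen). The rewritten queries are applied to vector search as well as full-text search. Because the queries carry intent (background information), this feature looks like it should work well with vector search and the semantic ranker.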

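When the search query is a conversational question, query rewrite appears to rewrite it into a form that is also friendly to full-text search. Handy! In the example below, the input query means "I want a strong sunscreen that does not wash off in water. Do you have any recommendations?"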

"queryRewrites": {
"text": {
  "inputQuery": "水で落ちないような強い日焼け止めが欲しいです。何かお勧めはありますか？",
  "rewrites": [
    "強い日焼け止めおすすめ",
    "強烈な日焼け止めおすすめ",
    "おすすめの強い日焼け止め",
    "水で落ちない日焼け止めおすすめ",
    "日焼け止めおすすめ"
  ]
}
}

Note that the official documentation advises against setting this debug property in production environments, for performance reasons, so it is best not to enable it outside of development.
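One simple way to follow that advice is to add the debug parameter only when an explicit development switch is on. The sketch below assumes a hypothetical environment variable named SEARCH_DEBUG and a hypothetical helper apply_debug; neither is part of Azure AI Search.

```python
import os


def apply_debug(body, mode="all", env=None):
    """Return a copy of the request body, adding 'debug' only if SEARCH_DEBUG is '1'."""
    env = os.environ if env is None else env
    result = dict(body)
    if env.get("SEARCH_DEBUG") == "1":  # hypothetical development-only switch
        result["debug"] = mode
    return result


base_body = {"search": "日焼け止め", "queryType": "semantic"}

# With the switch off (for example, in production) the body is left unchanged.
print(apply_debug(base_body, env={}))

# With the switch on, the debug parameter is added.
print(apply_debug(base_body, mode="queryRewrites", env={"SEARCH_DEBUG": "1"}))
```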
