
[AzureML Prompt Flow] Q&A with Your Own Data Using Faiss Index

Posted at 2023-09-28

Introduction

In this post, I try out prompt flow in AzureML.
Several sample flows are available; here I use "Q&A with Your Own Data Using Faiss Index".

QnA with Your Own Data Using Faiss Index

This prompt flow provides Q&A powered by GPT-3.5: it answers a user's question based on the contents of your documents.

To get started, choose "Create" for a new prompt flow and select "QnA with Your Own Data Using Faiss Index".

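Connecting to Azure OpenAI

Prompt flow needs an LLM connection, so first connect to an Azure OpenAI model that you have already deployed.

On the "Connections" tab, click "Create" and fill in the required fields.

Creating a Runtime

Next, a runtime is required to run the prompt flow.

Click the "Runtime" tab and then "Create"; you can choose between "Managed online endpoint deployment" and "Compute instance".

Here I create the runtime on the recommended option, "Compute instance".

Click "Create Azure ML compute instance",

then enter a "Compute name" and choose a virtual machine size.

The compute instance is created after a few minutes. Once it is ready, enter a "Runtime name" and create the runtime.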

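Processing and Flow

Creating the QnA with Your Own Data Using Faiss Index sample opens the flow screen, where the "Inputs" and "Outputs" sections let you specify the user's question and the flow's answer.

The default "Inputs" already contains the question "What's AML SDK V2? Should I use V1 or V2?", so I run it as-is and look at the output.

The flow consists of four nodes, run in this order: generate_embedding, lookup, format_context, answer_the_question_with_context.

Let's go through each step and its inputs and outputs.

・generate_embedding
Generates an embedding, converting text (here, the user's question) into a vector.

This node needs an LLM connection, so specify your Azure OpenAI connection along with the deployment and model.

The input looks like this.

Input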

{
  "connection":"XXX"
  "deployment_name":"text-embedding-ada-002"
  "model":undefined
  "input":"What's AML SDK V2? Should I use V1 or V2?"
}

This converts the question text into a vector.
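As a rough sketch of what this node does, the snippet below wraps an embeddings request in a small function. It assumes a client object shaped like the `AzureOpenAI` client from the `openai` Python SDK (v1 or later), where `embeddings.create(model=..., input=...)` returns the vector at `response.data[0].embedding`. `FakeEmbeddingClient` is a hypothetical stand-in so the example runs without credentials, and none of this is claimed to be what the built-in prompt flow tool executes internally.

```python
from types import SimpleNamespace
from typing import List


def generate_embedding(client, deployment_name: str, text: str) -> List[float]:
    """Request an embedding for `text` and return it as a list of floats."""
    response = client.embeddings.create(model=deployment_name, input=text)
    return list(response.data[0].embedding)


class FakeEmbeddingClient:
    """Hypothetical stand-in for a real client; returns a small dummy vector."""

    def __init__(self, dimensions: int = 4):
        # Lets client.embeddings.create(...) resolve to the method below.
        self.embeddings = self
        self._dimensions = dimensions

    def create(self, model: str, input: str):
        # Dummy values only; a real model returns learned, meaningful values.
        vector = [float(len(input) % (i + 2)) for i in range(self._dimensions)]
        return SimpleNamespace(data=[SimpleNamespace(embedding=vector)])


question = "What's AML SDK V2? Should I use V1 or V2?"
vector = generate_embedding(FakeEmbeddingClient(), "text-embedding-ada-002", question)
print(len(vector))
```

To call a real deployment, a configured `AzureOpenAI` client could be passed in place of the fake one; the deployment name then plays the role of `model`.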

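・lookup
Uses Faiss Index Lookup with the generated embedding vector to quickly search for documents and information related to the user's question.

Its inputs are the vector from the previous step and the path to the Faiss index file.

The output looks like this.

Output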

[
  0:{
    "system_metrics":{
      "duration":0.152517
    }
    "output":[
      0:{
        "metadata":{
          "source":"https://learn.microsoft.com/en-us/azure/machine-learning/concept-v2"
          "title":"concept-v2"
        }
        "original_entity":NULL
        "score":0.3366261124610901
        "text":"inferencing (real time and batch) Stitch together multiple tasks and production workflows using Azure Machine Learning pipelines The SDK v2 is on par with CLI v2 functionality and is consistent in how assets (nouns) and actions (verbs) are used between SDK and CLI. For example, to list an asset, the list action can be used in both CLI and SDK. The same list action can be used to list a compute, model, environment, and so on. Use cases for SDK v2 The SDK v2 is useful in the following scenarios: Use Python functions to build a single step or a complex workflow SDK v2 allows you to build a single command or a chain of commands like Python functions - the command has a name, parameters, expects input, and returns output. Move from simple to complex concepts incrementally SDK v2 allows you to: Construct a single command. Add a hyperparameter sweep on top of that command, Add the command with various others into a pipeline one after the other. This construction is useful, given the iterative nature of machine"
        "vector":NULL
      }
      1:{
        "metadata":{
          "source":"https://learn.microsoft.com/en-us/azure/machine-learning/concept-v2"
          "title":"concept-v2"
        }
        "original_entity":NULL
        "score":0.3406101167201996
        "text":"learning. Reusable components in pipelines Azure Machine Learning introduces components for managing and reusing common logic across pipelines. This functionality is available only via CLI v2 and SDK v2. Managed inferencing Azure Machine Learning offers endpoints to streamline model deployments for both real-time and batch inference deployments. This functionality is available only via CLI v2 and SDK v2. Should I use v1 or v2? CLI v2 The Azure Machine Learning CLI v1 has been deprecated. We recommend you to use CLI v2 if: You were a CLI v1 user You want to use new features like - reusable components, managed inferencing You don't want to use a Python SDK - CLI v2 allows you to use YAML with scripts in python, R, Java, Julia or C# You were a user of R SDK previously - Azure Machine Learning won't support an SDK in R . However, the CLI v2 has support for R scripts. You want to use command line based automation/deployments You don't need Spark Jobs. This feature is currently available in preview in CLI v2. SDK"
        "vector":NULL
      }
      2:{
        "metadata":{
          "source":"https://learn.microsoft.com/en-us/azure/machine-learning/concept-v2"
          "title":"concept-v2"
        }
        "original_entity":NULL
        "score":0.3440413773059845
        "text":"v2 The Azure Machine Learning Python SDK v1 doesn't have a planned deprecation date. If you have significant investments in Python SDK v1 and don't need any new features offered by SDK v2, you can continue to use SDK v1. However, you should consider using SDK v2 if: You want to use new features like - reusable components, managed inferencing You're starting a new workflow or pipeline - all new features and future investments will be introduced in v2 You want to take advantage of the improved usability of the Python SDK v2 - ability to compose jobs and pipelines using Python functions, easy evolution from simple to complex tasks etc. Next steps How to upgrade from v1 to v2 Get started with CLI v2 Install and set up CLI (v2) Train models with the CLI (v2) Deploy and score models with online endpoints Get started with SDK v2 Install and set up SDK (v2) Train models with the Azure Machine Learning Python SDK v2 Tutorial: Create production ML pipelines with Python SDK v2 in a Jupyter notebook Feedback Submit and"
        "vector":NULL
      }
    ]
  }
]

The lookup returns the matching text chunks together with their sources.
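The lookup step can be pictured as a nearest-neighbour search over stored vectors. The sketch below is a brute-force version in plain Python using squared L2 distance, so a lower score means a closer match. That reading fits the ascending scores in the output above, but the metric the sample index actually uses is an assumption here. The `index` entries and their 3-dimensional vectors are made up for illustration, and Faiss performs this kind of search in optimized native code rather than a Python loop.

```python
from typing import Dict, List


def squared_l2(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lookup(index: List[Dict], query_vector: List[float], top_k: int = 3) -> List[Dict]:
    """Return the top_k entries closest to query_vector, nearest first."""
    scored = [
        {
            "score": squared_l2(entry["vector"], query_vector),
            "text": entry["text"],
            "metadata": entry["metadata"],
        }
        for entry in index
    ]
    return sorted(scored, key=lambda hit: hit["score"])[:top_k]


# Hypothetical toy index: three chunks with made-up vectors.
index = [
    {"vector": [0.0, 0.0, 1.0], "text": "chunk A", "metadata": {"source": "doc-a"}},
    {"vector": [1.0, 0.0, 0.0], "text": "chunk B", "metadata": {"source": "doc-b"}},
    {"vector": [0.9, 0.1, 0.0], "text": "chunk C", "metadata": {"source": "doc-c"}},
]

hits = lookup(index, [1.0, 0.0, 0.0], top_k=2)
for hit in hits:
    print(hit["text"], round(hit["score"], 4))
```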

・format_context
This node converts the search results into a fixed format: it takes the content and source information from each result, formats them, and returns a single string.

format_context

from typing import List
from promptflow import tool
from promptflow_vectordb.core.contracts import SearchResultEntity


@tool
def format_context(search_result: List[dict]) -> str:
    """Join the lookup results into one "Content / Source" string for the prompt."""

    def format_doc(doc: dict):
        return f"Content: {doc['Content']}\nSource: {doc['Source']}"

    SOURCE_KEY = "source"

    retrieved_docs = []
    for item in search_result:
        # Parse the raw dict returned by the lookup node.
        entity = SearchResultEntity.from_dict(item)
        content = entity.text or ""

        # Metadata may be missing, so fall back to an empty source.
        source = ""
        if entity.metadata is not None:
            if SOURCE_KEY in entity.metadata:
                source = entity.metadata[SOURCE_KEY] or ""

        retrieved_docs.append({
            "Content": content,
            "Source": source
        })
    # Separate the formatted documents with a blank line.
    doc_string = "\n\n".join([format_doc(doc) for doc in retrieved_docs])
    return doc_string

The output looks like this.

Output

[
  0:{
    "system_metrics":{
      "duration":0.000776
    }
    "output":"Content: inferencing (real time and batch) Stitch together multiple tasks and production workflows using Azure Machine Learning pipelines The SDK v2 is on par with CLI v2 functionality and is consistent in how assets (nouns) and actions (verbs) are used between SDK and CLI. For example, to list an asset, the list action can be used in both CLI and SDK. The same list action can be used to list a compute, model, environment, and so on. Use cases for SDK v2 The SDK v2 is useful in the following scenarios: Use Python functions to build a single step or a complex workflow SDK v2 allows you to build a single command or a chain of commands like Python functions - the command has a name, parameters, expects input, and returns output. Move from simple to complex concepts incrementally SDK v2 allows you to: Construct a single command. Add a hyperparameter sweep on top of that command, Add the command with various others into a pipeline one after the other. This construction is useful, given the iterative nature of machine Source: https://learn.microsoft.com/en-us/azure/machine-learning/concept-v2 Content: learning. Reusable components in pipelines Azure Machine Learning introduces components for managing and reusing common logic across pipelines. This functionality is available only via CLI v2 and SDK v2. Managed inferencing Azure Machine Learning offers endpoints to streamline model deployments for both real-time and batch inference deployments. This functionality is available only via CLI v2 and SDK v2. Should I use v1 or v2? CLI v2 The Azure Machine Learning CLI v1 has been deprecated. We recommend you to use CLI v2 if: You were a CLI v1 user You want to use new features like - reusable components, managed inferencing You don't want to use a Python SDK - CLI v2 allows you to use YAML with scripts in python, R, Java, Julia or C# You were a user of R SDK previously - Azure Machine Learning won't support an SDK in R . 
However, the CLI v2 has support for R scripts. You want to use command line based automation/deployments You don't need Spark Jobs. This feature is currently available in preview in CLI v2. SDK Source: https://learn.microsoft.com/en-us/azure/machine-learning/concept-v2 Content: v2 The Azure Machine Learning Python SDK v1 doesn't have a planned deprecation date. If you have significant investments in Python SDK v1 and don't need any new features offered by SDK v2, you can continue to use SDK v1. However, you should consider using SDK v2 if: You want to use new features like - reusable components, managed inferencing You're starting a new workflow or pipeline - all new features and future investments will be introduced in v2 You want to take advantage of the improved usability of the Python SDK v2 - ability to compose jobs and pipelines using Python functions, easy evolution from simple to complex tasks etc. Next steps How to upgrade from v1 to v2 Get started with CLI v2 Install and set up CLI (v2) Train models with the CLI (v2) Deploy and score models with online endpoints Get started with SDK v2 Install and set up SDK (v2) Train models with the Azure Machine Learning Python SDK v2 Tutorial: Create production ML pipelines with Python SDK v2 in a Jupyter notebook Feedback Submit and Source: https://learn.microsoft.com/en-us/azure/machine-learning/concept-v2"
  }
]
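To try the formatting logic without the `promptflow_vectordb` package, the sketch below reimplements it on plain dictionaries shaped like the lookup output above. `format_context_plain` and the short sample texts are hypothetical; the original tool parses each item through `SearchResultEntity` instead of reading dictionary keys directly.

```python
from typing import List


def format_context_plain(search_result: List[dict]) -> str:
    """Format lookup-style dicts as "Content / Source" blocks separated by blank lines."""
    blocks = []
    for item in search_result:
        # Missing or None values fall back to empty strings, as in the original tool.
        content = item.get("text") or ""
        metadata = item.get("metadata") or {}
        source = metadata.get("source") or ""
        blocks.append(f"Content: {content}\nSource: {source}")
    return "\n\n".join(blocks)


sample = [
    {
        "text": "first example chunk",
        "metadata": {"source": "https://learn.microsoft.com/en-us/azure/machine-learning/concept-v2"},
    },
    {"text": None, "metadata": None},  # exercises the fallback path
]

formatted = format_context_plain(sample)
print(formatted)
```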

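・answer_the_question_with_context
The final node generates the answer from the search results and the user's question. The prompt asks the model to act as a chatbot talking with a human and to answer the user's question from the supplied text, including source information.

Prompt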

system:
You are a chatbot having a conversation with a human.
Given the following extracted parts of a long document and a question, create a final answer with references ("SOURCES").
If you don't know the answer, just say that you don't know. Don't try to make up an answer.
ALWAYS return a "SOURCES" part in your answer and try to provide the links.

{{contexts}}

user:
{{question}}
Please indicate in start of the response whether you are using the additional knowledge provided in the prompt.

An LLM connection is needed here as well, so specify your Azure OpenAI connection along with the deployment and model.
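The `{{contexts}}` and `{{question}}` placeholders in the prompt are filled in before it is sent to the model. As a minimal sketch of that substitution, the snippet below uses a small regex-based replacement. Prompt flow renders its templates with Jinja2, so this is a simplified stand-in rather than the actual rendering path, and `render_prompt` is a hypothetical helper.

```python
import re

PROMPT_TEMPLATE = """system:
You are a chatbot having a conversation with a human.
Given the following extracted parts of a long document and a question, create a final answer with references ("SOURCES").
If you don't know the answer, just say that you don't know. Don't try to make up an answer.
ALWAYS return a "SOURCES" part in your answer and try to provide the links.

{{contexts}}

user:
{{question}}
Please indicate in start of the response whether you are using the additional knowledge provided in the prompt.
"""


def render_prompt(template: str, **values: str) -> str:
    """Replace each {{name}} placeholder with the matching keyword argument."""
    # A single regex pass means placeholder-like text inside a value is not re-expanded.
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: values[m.group(1)], template)


prompt = render_prompt(
    PROMPT_TEMPLATE,
    contexts="Content: example chunk\nSource: https://learn.microsoft.com/en-us/azure/machine-learning/concept-v2",
    question="What's AML SDK V2? Should I use V1 or V2?",
)
print(prompt)
```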

Results

Now click Run and see what the flow returns for the user question "What's AML SDK V2? Should I use V1 or V2?".

Output

[
  0:{
    "system_metrics":{
      "completion_tokens":349
      "duration":15.265254
      "prompt_tokens":843
      "total_tokens":1192
    }
    "output":"Based on the information provided in the prompt, Azure Machine Learning (AML) SDK v2 refers to the second version of the Azure Machine Learning Python SDK. It offers new features such as reusable components and managed inferencing, and provides improved usability for building workflows and pipelines using Python functions. Whether you should use AML SDK v1 or v2 depends on your specific requirements and circumstances. If you have significant investments in AML SDK v1 and do not need any new features offered by SDK v2, you can continue to use SDK v1. However, if you want to take advantage of the new features and improved usability of SDK v2, or if you are starting a new workflow or pipeline, it is recommended to use SDK v2. It's important to note that the Azure Machine Learning CLI v1 has been deprecated, and it is recommended to use CLI v2 if you were a CLI v1 user or if you want to use new features like reusable components and managed inferencing. CLI v2 also allows you to use YAML with scripts in various programming languages and supports command line-based automation and deployments. For more detailed information and guidance on upgrading from v1 to v2, as well as getting started with CLI v2 and SDK v2, you can refer to the following sources: - Upgrade from v1 to v2: [SOURCES](https://learn.microsoft.com/en-us/azure/machine-learning/concept-v2) - Get started with CLI v2: [SOURCES](https://learn.microsoft.com/en-us/azure/machine-learning/concept-v2) - Get started with SDK v2: [SOURCES](https://learn.microsoft.com/en-us/azure/machine-learning/concept-v2)"
  }
]

The answer was generated from the document content most relevant to the question.
