
[AzureML Prompt Flow] Web Classification

Last updated at 2023-09-22 / Posted at 2023-09-22

Introduction

This article tries out prompt flow in AzureML. Several sample flows are available; this time we try Web Classification.

Web Classification

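Web Classification is a flow that classifies a given URL into one of a fixed set of web categories.
For example, a page about a movie is classified as "Movie", and a page about an application is classified as "App".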

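Connecting to Azure OpenAI

Running a prompt flow requires an LLM connection, so we connect to an Azure OpenAI model that has already been deployed.

From "Create" on the "Connections" tab, fill in the required fields.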

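Creating a Runtime

Next, a runtime needs to be created.

First click "Prompt flow" in the left-hand menu, then click the "Runtime" tab.
Clicking "Create" lets you choose between "Managed online endpoint deployment" and "Compute instance".

Managed online endpoint deployment looks like the better fit when the runtime has to be shared across a team, but we have no particular requirement this time, so we create the runtime on the recommended "Compute instance".

Click "Create Azure ML compute instance", then choose the "Compute name" and the virtual machine size.

The compute instance is created after a few minutes. Once it is ready, enter a "Runtime name" and create the runtime.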

Processing flow

Given a URL, the Web Classification flow outputs the category and evidence for that web page.

The default "Inputs" already contain
"https://www.microsoft.com/en-us/store/collections/xboxseriessconsoles?icid=CNav_Xbox_Series_S"
which is a Microsoft product page, so we use it as is and look at the output.

The Web Classification flow is made up of the following nodes.
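
As a rough sketch of how the nodes depend on each other, the wiring described in this article can be written down as a dependency graph and sorted into a runnable order. The dictionary and the `execution_order` helper are illustrative names, not part of prompt flow, and the edges are taken from the node descriptions in this article rather than from the flow definition file.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes whose output it consumes.
# The flow input `url` is not a node, so it is not listed here.
FLOW_DEPENDENCIES = {
    "fetch_text_content_from_url": set(),
    "summarize_text_content": {"fetch_text_content_from_url"},
    "prepare_examples": set(),
    "classify_with_llm": {"summarize_text_content", "prepare_examples"},
    "convert_to_dict": {"classify_with_llm"},
}


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one valid order in which the nodes can run."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(FLOW_DEPENDENCIES)
print(order)
```

Several orders are valid because `prepare_examples` takes no input, but `convert_to_dict` always comes last since it depends, directly or indirectly, on every other node.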

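Let's go through the nodes one at a time and look at what each one does and what goes in and out.

・fetch_text_content_from_url
Fetches the text from the input URL.

The HTML is retrieved with requests and parsed with BeautifulSoup, and the first 2000 characters of the page text are extracted.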

fetch_text_content_from_url

from promptflow import tool
import requests
import bs4


@tool
def fetch_text_content_from_url(url: str):
    # Send a request to the URL
    try:
        headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                                 "Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.35"}
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            # Parse the HTML content using BeautifulSoup
            soup = bs4.BeautifulSoup(response.text, 'html.parser')
            soup.prettify()
            return soup.get_text()[:2000]
        else:
            msg = f"Get url failed with status code {response.status_code}.\nURL: {url}\nResponse: " \
                  f"{response.text[:100]}"
            print(msg)
            return "No available content"
    except Exception as e:
        print("Get url failed with error: {}".format(e))
        return "No available content"

In this case, for the following input,

Input

{
"url":"https://www.microsoft.com/en-us/store/collections/xboxseriessconsoles?icid=CNav_Xbox_Series_S"
}

the following output is returned.

Output

"output":" Xbox Series S - Microsoft Store Translate to English You are shopping Microsoft Store in: {0} Are you looking for Microsoft Store in: {0}? Stay in {0} Go to {0} Questions? Talk to an expert Can we help you? Chat now Book a shopping appointment No thanks Skip to main content Microsoft Gaming Gaming Gaming Home Xbox Consoles Consoles All Xbox consoles Xbox Series X Xbox Series S Xbox Certified Refurbished Accessories Accessories All Xbox accessories Xbox controllers Xbox headsets Xbox repair parts Game Pass Game Pass Compare all plans Xbox Game Pass Ultimate PC Game Pass Games Games Xbox games PC games Gift Card Gift Card Xbox Gift Card PC gaming PC games Accessories VR & Mixed Reality Xbox deals More All Microsoft Global Microsoft 365 Teams Windows Surface Xbox Deals Small Business Support Software Software Windows Apps AI Outlook OneDrive Microsoft Teams OneNote Microsoft Edge Skype PCs & Devices PCs & Devices Computers Shop Xbox Accessories VR & mixed reality Certified Refurbished Trade-in for "

The text of the page at the submitted URL was returned.
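
To make the "extract the text, keep the first 2000 characters" step easy to try without network access, here is a minimal sketch that uses only the standard library's `html.parser` instead of BeautifulSoup. It is a stand-in, not the node's implementation, so its output can differ from `soup.get_text()` on real pages. The names `extract_text` and `limit` are hypothetical.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML document in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def extract_text(html: str, limit: int = 2000) -> str:
    """Return the concatenated text of `html`, cut to `limit` characters."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()  # flush any text the parser is still buffering
    return "".join(collector.parts)[:limit]


sample = "<html><body><h1>Xbox Series S</h1><p>Shop consoles.</p></body></html>"
print(extract_text(sample))
print(len(extract_text("<p>" + "a" * 5000 + "</p>")))
```

The slice is applied to the extracted text, not to the raw HTML, so markup does not count toward the 2000-character budget.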

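・summarize_text_content
This node instructs the LLM to condense the text from the previous step into a one-paragraph summary of about 100 words.

It needs an LLM connection, so select the deployed Azure OpenAI model.
Here we selected "gpt-3.5-turbo" to produce the summary.

The prompt is shown below.
The {{text}} placeholder is filled with the text output by the previous step.

Prompt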

system:
Please summarize the following text in one paragraph. 100 words.
Do not add any information that is not in the text.

user:
Text: {{text}}
Summary:
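
As far as I understand, prompt flow renders this template with Jinja and turns the `system:` and `user:` sections into separate chat messages. The sketch below imitates that with plain string handling so the resulting message structure is visible. It is an approximation under that assumption, not prompt flow's actual code, and `build_messages` is a hypothetical helper.

```python
import re

SUMMARIZE_PROMPT = """system:
Please summarize the following text in one paragraph. 100 words.
Do not add any information that is not in the text.

user:
Text: {{text}}
Summary:
"""

# A line consisting only of a role name followed by a colon starts a new message.
ROLE_LINE = re.compile(r"^(system|user|assistant):\s*$")


def build_messages(template: str, text: str) -> list[dict[str, str]]:
    """Split the template by role marker, then fill the {{text}} placeholder."""
    messages: list[dict[str, str]] = []
    for line in template.splitlines():
        match = ROLE_LINE.match(line)
        if match:
            messages.append({"role": match.group(1), "content": ""})
        elif messages:
            messages[-1]["content"] += line + "\n"
    for message in messages:
        # Fill the placeholder only after splitting, so that role-like lines
        # inside the fetched page text cannot create extra messages.
        message["content"] = message["content"].strip().replace("{{text}}", text)
    return messages


for message in build_messages(SUMMARIZE_PROMPT, "Xbox Series S - Microsoft Store"):
    print(message)
```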

Here is the resulting summary.

Output

[
  0:{
    "system_metrics":{
      "completion_tokens":62
      "duration":4.084723
      "prompt_tokens":375
      "total_tokens":437
    }
    "output":"The text appears to be a webpage from the Microsoft Store, specifically for the Xbox Series S. The page includes options to shop for Xbox consoles, accessories, Game Pass, games, and gift cards. It also provides links to other sections of the Microsoft Store, such as PC gaming, deals, and support."
  }
]

・prepare_examples
This node returns the sample answers (few-shot examples) that are passed to the LLM.

Each example consists of url, text_content, category, and evidence.

prepare_examples

from promptflow import tool

@tool
def prepare_examples():
    return [
        {
            "url": "https://play.google.com/store/apps/details?id=com.spotify.music",
            "text_content": "Spotify is a free music and podcast streaming app with millions of songs, albums, and "
                            "original podcasts. It also offers audiobooks, so users can enjoy thousands of stories. "
                            "It has a variety of features such as creating and sharing music playlists, discovering "
                            "new music, and listening to popular and exclusive podcasts. It also has a Premium "
                            "subscription option which allows users to download and listen offline, and access "
                            "ad-free music. It is available on all devices and has a variety of genres and artists "
                            "to choose from.",
            "category": "App",
            "evidence": "Both"
        },
        {
            "url": "https://www.youtube.com/channel/UC_x5XG1OV2P6uZZ5FSM9Ttw",
            "text_content": "NFL Sunday Ticket is a service offered by Google LLC that allows users to watch NFL "
                            "games on YouTube. It is available in 2023 and is subject to the terms and privacy policy "
                            "of Google LLC. It is also subject to YouTube's terms of use and any applicable laws.",
            "category": "Channel",
            "evidence": "URL"
        },
        {
            "url": "https://arxiv.org/abs/2303.04671",
            "text_content": "Visual ChatGPT is a system that enables users to interact with ChatGPT by sending and "
                            "receiving not only languages but also images, providing complex visual questions or "
                            "visual editing instructions, and providing feedback and asking for corrected results. "
                            "It incorporates different Visual Foundation Models and is publicly available. Experiments "
                            "show that Visual ChatGPT opens the door to investigating the visual roles of ChatGPT with "
                            "the help of Visual Foundation Models.",
            "category": "Academic",
            "evidence": "Text content"
        },
        {
            "url": "https://ab.politiaromana.ro/",
            "text_content": "There is no content available for this text.",
            "category": "None",
            "evidence": "Both"
        }
    ]

・classify_with_llm
The original input URL, the summarized text, and the examples are inserted into the prompt, and the node outputs category and evidence.

The prompt requires category to be one of "Movie", "App", "Academic", "Channel", "Profile", "PDF", or "None", and evidence to be one of URL, Text content, or Both.

Prompt

system:
Your task is to classify a given url into one of the following categories:
Movie, App, Academic, Channel, Profile, PDF or None based on the text content information.
The classification will be based on the url, the webpage text content summary, or both.

user:
The selection range of the value of "category" must be within "Movie", "App", "Academic", "Channel", "Profile", "PDF" and "None".
The selection range of the value of "evidence" must be within "Url", "Text content", and "Both".
Here are a few examples:
{% for ex in examples %}
URL: {{ex.url}}
Text content: {{ex.text_content}}
OUTPUT:
{"category": "{{ex.category}}", "evidence": "{{ex.evidence}}"}

{% endfor %}

For a given URL and text content, classify the url to complete the category and indicate evidence:
URL: {{url}}
Text content: {{text_content}}.
OUTPUT:
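
To see what the `{% for ex in examples %}` loop contributes to the final prompt, the few-shot section can be reproduced with ordinary Python string formatting. This is a sketch that stands in for the Jinja rendering, with `render_examples` as a hypothetical helper. The sample entry is the last item returned by prepare_examples.

```python
import json


def render_examples(examples: list[dict[str, str]]) -> str:
    """Lay out few-shot examples the way the template's for-loop does."""
    blocks = []
    for ex in examples:
        # json.dumps yields the same {"category": ..., "evidence": ...} shape
        # that the template spells out by hand.
        output = json.dumps({"category": ex["category"], "evidence": ex["evidence"]})
        blocks.append(
            f"URL: {ex['url']}\n"
            f"Text content: {ex['text_content']}\n"
            f"OUTPUT:\n"
            f"{output}\n"
        )
    # A blank line separates consecutive examples.
    return "\n".join(blocks)


example = {
    "url": "https://ab.politiaromana.ro/",
    "text_content": "There is no content available for this text.",
    "category": "None",
    "evidence": "Both",
}
print(render_examples([example]))
```

Each example ends with a complete `OUTPUT:` line pair, while the final query ends with a bare `OUTPUT:`, which is what prompts the model to answer in the same JSON shape.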

This node also needs an LLM connection, so don't forget to select one.

For reference, the output at this point was as follows.

Output

"output":"{"category": "App", "evidence": "URL"}"

・convert_to_dict
Parses the previous output, which is a JSON string, into a dictionary and returns it.

from promptflow import tool
import json


@tool
def convert_to_dict(input_str: str):
    try:
        return json.loads(input_str)
    except Exception as e:
        print("input is not valid, error: {}".format(e))
        return {
            "category": "None",
            "evidence": "None"
        }
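
The fallback branch is easy to overlook, so here is a small usage sketch. It repeats the node's logic without the `@tool` decorator (an adaptation so that it runs without promptflow installed) and feeds it one well-formed and one malformed model reply. The malformed reply is an invented example of a model adding chatter around the JSON.

```python
import json


def convert_to_dict(input_str: str):
    # Same logic as the node above, minus the @tool decorator.
    try:
        return json.loads(input_str)
    except Exception as e:
        print("input is not valid, error: {}".format(e))
        return {"category": "None", "evidence": "None"}


# A reply that is pure JSON is parsed as is.
good = convert_to_dict('{"category": "App", "evidence": "URL"}')
print(good)

# Any extra text around the JSON makes json.loads fail, so the default is returned.
bad = convert_to_dict('Sure! {"category": "App", "evidence": "URL"}')
print(bad)
```

Note that `json.loads` accepts any valid JSON value, so a reply such as `"42"` would come back as an integer rather than a dictionary.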

Result

Now click "Run" and see which category the specified URL falls into!

Output

[
  0:{
    "system_metrics":{
      "duration":0.000597
    }
    "output":{
      "category":"App"
      "evidence":"URL"
    }
  }
]

The result came back with category "App" and evidence "URL"!
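
Since the prompt restricts both fields to fixed value lists, a downstream check could confirm that the model stayed within them. The sketch below is one possible way to do that and is not part of the sample flow. `is_valid_result` is a hypothetical helper. Evidence is compared case-insensitively because the prompt spells the value "Url" while the examples and the actual output use "URL".

```python
ALLOWED_CATEGORIES = {"Movie", "App", "Academic", "Channel", "Profile", "PDF", "None"}
ALLOWED_EVIDENCE = {"url", "text content", "both"}  # compared in lower case


def is_valid_result(result: object) -> bool:
    """Check that a parsed result uses only the values the prompt allows."""
    if not isinstance(result, dict):
        return False
    category = result.get("category")
    evidence = result.get("evidence")
    if not isinstance(category, str) or not isinstance(evidence, str):
        return False
    return category in ALLOWED_CATEGORIES and evidence.lower() in ALLOWED_EVIDENCE


print(is_valid_result({"category": "App", "evidence": "URL"}))
print(is_valid_result({"category": "None", "evidence": "None"}))
```

The second call uses the default that convert_to_dict returns on a parse failure. Its evidence value "None" is outside the allowed list, so this check also flags runs where parsing fell back to the default.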
