More than 1 year has passed since last update.

Podcast Copilotから学ぶLLM中心アプリ開発

Posted at 2023-06-19

はじめに

Microsoft Build 2023でCTOのKevin Scott氏が紹介していたPodcast Copilotを試しながらLLM中心アプリ開発に必要な技術をキャッチアップしていきます。

Podcast Copilotとは

Podcastの音声ファイルが与えられたときに、ポッドキャストの新しいエピソードを宣伝するソーシャルメディアの投稿を簡単に作成するものです。
様々な機械学習モデル、APIをLangChainでオーケストレーションすることで実現しています。コードは以下から確認できます。

Podcast Copilotで使われるモデル・APIと役割は以下です。

モデル・API	役割
Whisper mode	音声ファイルからトランスクリプト生成
Dolly 2.0 model	トランスクリプトからゲスト名抽出
Bing Search Grounding API	ゲストの経歴の取得
GPT-4 model	SNS投稿の宣伝文作成
DALL-E model	SNS投稿の画像生成
LinkedIn Plugin	LinkedInへの投稿

今回はAzure OpenAI ServiceとGPT-4、DALL-Eのモデルがそれぞれ申請が必要なため、OpenAIのGPT-3.5、DALL-Eのモデルで代替します。
LinkedIn Pluginはまだ提供されていないため、SNSへの投稿前までを確認していきます。

Whisperモデルを使って音声ファイルからトランスクリプト生成する

サンプルでは音声ファイルを-40 dBFSより小さな沈黙が1秒続いたらチャンク分割しています。
split_on_silence(sound_file, min_silence_len=1000, silence_thresh=-40 )
WhisperのBaseモデルを呼び出し、文字の書き起こしをします。以下のページに説明がありますがモデルは5サイズ用意されているようです。言語により精度の違いは出そうですが、日本語でもそこそこの精度が期待できそうに見えます。サンプルの英語はかなりの精度で書き起こしできていました。

取得したトランスクリプト

Dolly 2.0 modelでゲスト名を抽出する

Databricks社が独自のデータセットを用意して生成することで商用利用を可能にしたオープンソース、指示追従LLMのDolly 2.0を使って先ほど生成したトランスクリプトからゲスト名を抽出します。
ここはGPT-4でも抽出可能ではないかと思うのですが、なぜDolly 2.0を採用しているのかはわかりませんでした。

今回はLangChainの1機能であるLLMChainを使ってプロンプトをフォーマットします。

PromptTemplateでトランスクリプトからゲスト名を抽出するプロンプトを与えてフォーマットし、LLMに渡すChainを作成します。

dolly2_prompt = PromptTemplate(
    input_variables=["transcript"],
    template="Extract the guest name on the Beyond the Tech podcast from the following transcript.  Beyond the Tech is hosted by Kevin Scott and Christina Warren, so they will never be the guests.  \n\n Transcript: {transcript}\n\n Host name: Kevin Scott\n\n Guest name: "
)

extract_llm_chain = LLMChain(llm=hf_pipeline, prompt=dolly2_prompt, output_key="guest")

トランスクリプトから抽出したゲスト

Bing Searchでゲストの経歴をGrounding

AzureでBing Searchリソースを作成します。

Bing Web Search API 7.0を使って紹介文を取得しています。

一番目のsnippetをそのまま抜き出しています。

bio = search_results["webPages"]["value"][0]["snippet"]

Bing検索の一番目

取得したBio

GPT-3.5モデルでSNS投稿宣伝文を作成

まずプロンプトを作成します。

system_template="You are a helpful large language model that can create a LinkedIn promo blurb for episodes of the podcast Behind the Tech, when given transcripts of the podcasts.  The Behind the Tech podcast is hosted by Kevin Scott.\n"
system_message_prompt = SystemMessagePromptTemplate.from_template(system_template)

user_prompt=PromptTemplate(
    template="Create a short summary of this podcast episode that would be appropriate to post on LinkedIn to promote the podcast episode.  The post should be from the first-person perspective of Kevin Scott, who hosts the podcast.\n" +
            "Here is the transcript of the podcast episode: {transcript} \n" +
            "Here is the bio of the guest: {bio} \n",
    input_variables=["transcript", "bio"],
)
human_message_prompt = HumanMessagePromptTemplate(prompt=user_prompt)
chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])

# Get formatted messages for the chat completion
blurb_messages = chat_prompt.format_prompt(transcript={transcript}, bio={bio}).to_messages()

system_message_promptでシステムとしての振る舞い方を与え、human_message_promptで具体的なタスクやトランスクリプト、Bioを与えます。

Azure OpenAI GPT-4が使える場合はそのままコードが使えるはずです。今回はOpenAI GPT-3.5を使うようにしています。

gpt3 = ChatOpenAI(
    openai_api_key=gpt_api_key,
)

OpenAI GPT-3.5で生成した宣伝文

DALL-EモデルでSNS投稿の画像生成

GPT-3.5で生成した宣伝文から画像生成するためのプロンプトを作成します。やることは宣伝文を作成したときと大体同じです。

system_template="You are a helpful large language model that generates DALL-E prompts, that when given to the DALL-E model can generate beautiful high-quality images to use in social media posts about a podcast on technology.  Good DALL-E prompts will contain mention of related objects, and will not contain people or words.  Good DALL-E prompts should include a reference to podcasting along with items from the domain of the podcast guest.\n"
system_message_prompt = SystemMessagePromptTemplate.from_template(system_template)

user_prompt=PromptTemplate(
    template="Create a DALL-E prompt to create an image to post along with this social media text: {social_media_copy}",
    input_variables=["social_media_copy"],
)
human_message_prompt = HumanMessagePromptTemplate(prompt=user_prompt)
chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])

# Get formatted messages for the chat completion
dalle_messages = chat_prompt.format_prompt(social_media_copy={social_media_copy}).to_messages()

print("Calling GPT-3.5 model on Azure OpenAI Service to get a DALL-E prompt...\n")
gpt3 = ChatOpenAI(
    openai_api_key=gpt_api_key,
)
#print(gpt3)   #shows parameters

output = gpt3(dalle_messages)
dalle_prompt = output.content

dalle_prompt_chain = LLMChain(llm=gpt3, prompt=chat_prompt, output_key="dalle_prompt")

生成した画像生成用のプロンプト

今回はAzureOpenAIではなくOpenAIのDALL-Eモデルを使ってプロンプトから画像を生成します。

openai.api_key = dalle_api_key
response = openai.Image.create(
    prompt=dalle_prompt,
    n=1,
    size="256x256",
)
imageURL = response["data"][0]["url"]

生成された画像

まとめ

Podcast Copilotを実際に動かすことでLLM中心アプリ開発を体験しました。LangChainなどのライブラリを活用してオーケストレーターを構築し、様々なLLMモデルを用途に合わせてうまく活用しながらタスクをこなせるようにしていくことがLLM中心アプリ開発だと実感しています。
このサンプルを動かせるようになると、LangChain、各種LLMモデルとBing Search APIでのGroundingのなんとなくの扱い方を理解できるので、LLM中心アプリ開発の基礎を学ぶ際におすすめです。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up