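- This is the third post in a three-part series.
- It integrates what the previous posts prepared and finally turns it into a situational English conversation practice app.
- The previous posts are here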
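Designing the tasks given to crewAI
To generate the situation setting for an English conversation, I broke the work for the AI agents down into the following tasks.
- Create an avatar image related to the given keyword
- Derive the avatar's persona from the generated avatar image
- From the avatar's persona, write the first question that opens the conversation with the user
These tasks run on crewAI, and the generated situation setting then drives an app in which the user talks with an AI English conversation avatar powered by Llama 3.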
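The three tasks form a simple chain in which each step consumes the previous step's output. The following is a minimal sketch of that data flow, with plain callables standing in for the crewAI agents. The `Situation` class, `build_situation`, and the stub steps are hypothetical names used only for illustration; they are not part of crewAI.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Situation:
    """Everything the chat app needs to start a conversation."""
    location: str
    filepath: str
    avatar_profile: str
    first_question: str


def build_situation(
    location: str,
    make_image: Callable[[str], str],
    make_profile: Callable[[str], str],
    make_question: Callable[[str], str],
) -> Situation:
    """Run the three steps in order, feeding each result into the next."""
    filepath = make_image(location)      # keyword -> avatar image file
    profile = make_profile(filepath)     # image -> persona
    question = make_question(profile)    # persona -> opening question
    return Situation(location, filepath, profile, question)


if __name__ == "__main__":
    # Stub steps; the real ones would call DALL-E, Gemini Pro Vision, and GPT-4.
    demo = build_situation(
        "Hawaii",
        make_image=lambda loc: f"avatar-{loc}.png",
        make_profile=lambda path: f"A person pictured in {path}",
        make_question=lambda prof: f"{prof} asks: how are you today?",
    )
    print(demo)
```

Passing the steps in as callables keeps the chain testable without any API calls.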
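Having DALL-E generate the English conversation avatar image
As the first task, let's have DALL-E create the AI avatar image for the conversation situation.
For example, given "New York", the agent first comes up with an image prompt for the conversation partner, such as "a young Asian woman standing in an office", and DALL-E then generates the image from that prompt.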
# This program requires the packages below:
# pip install 'crewai[tools]'
from crewai import Agent, Task, Crew, Process
from langchain_openai import ChatOpenAI
from langchain.tools import BaseTool, StructuredTool, tool
from langchain.pydantic_v1 import BaseModel, Field
import openai
client = openai.OpenAI()

class ImageInput(BaseModel):
    prompt: str = Field(description='''
        The prompt for the image generation like
        "A young asian female person standing at an office".''')

def generate_image(prompt: str) -> str:
    '''Generate an image for the prompt, and return the filename of the image.'''
    print("Generating image ... for the prompt: ", prompt)
    # Only the prompt is passed here; set model, size, etc. as needed.
    response = client.images.generate(
        prompt=prompt,
    )
    image_url = response.data[0].url
    # download the image
    import requests
    import shutil
    from datetime import datetime
    response = requests.get(image_url, stream=True)
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    filename = f'output-{timestamp}.png'
    with open(filename, 'wb') as out_file:
        shutil.copyfileobj(response.raw, out_file)
    print(f"Image saved to {filename}")
    return filename

# Wrap the function as a tool that the agent can call.
image_generation = StructuredTool.from_function(
    func=generate_image,
    description='''generate an image for the prompt
        and return the filename of the image.''',
    args_schema=ImageInput,
)

gpt4 = ChatOpenAI(model="gpt-4")
# Ask the place to go.
location = input('''Where do you want to go today?
(like Hawaii, Tokyo, or Silicon Valley, etc.) ''')

avator_maker_agent = Agent(
    role = 'Avator Maker',
    goal = '''Create a prompt to generate an image of a person who
        talks with the user in English.
        The prompt must include the age of the person like "young" or "old",
        the region of the person like "asian", "african", "european", and "indian", etc.,
        the gender of the person like "female" or "male", etc.,
        the behaviour of the person like "standing", "smiling", "angry", "sad", and "surprised", etc.,
        and the scene of the image like "an office", "a beach", "a city", "a forest" and "a mountain", etc.
        For example, "A young asian female person standing at an office.".
        And the prompt should be appropriate for the location provided by the user.
        The agent must generate the image for the prompt.''',
    backstory = 'You are an avator maker who creates an image of a person who talks with the user in English.',
    allow_delegation = False,
    verbose = True,
    llm = gpt4,
    tools = [image_generation],
)

image_generation_task = Task(
    description = f'Create an image of a person at {location}.',
    expected_output = 'A filename of the image.',
    agent = avator_maker_agent,
    human_input = False,
)

crew = Crew(
    agents = [avator_maker_agent],
    tasks = [image_generation_task],
    process = 'sequential',
    verbose = 2
)

result = crew.kickoff()
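Depending on how the agent phrases its final answer, the result of `crew.kickoff()` may contain more than the bare filename. The sketch below pulls the filename back out of such text, assuming the result can be converted to a string and that the file follows the `f'output-{timestamp}.png'` naming used above. `extract_filename` and `FILENAME_PATTERN` are hypothetical helpers, not part of crewAI.

```python
import re
from typing import Optional

# Matches names produced by the f'output-{timestamp}.png' pattern,
# where the timestamp has the form YYYYMMDD-HHMMSS.
FILENAME_PATTERN = re.compile(r"output-\d{8}-\d{6}\.png")


def extract_filename(result_text: str) -> Optional[str]:
    """Return the first generated-image filename found in the text, if any."""
    match = FILENAME_PATTERN.search(result_text)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(extract_filename("The image was saved as output-20240501-093000.png."))
    print(extract_filename("no file here"))
```

A caller could fall back to asking the user for a path when the function returns `None`.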
Run example 1:
- Location given:
Silicon Valley
- Prompt chosen by GPT-4:
"A young asian male person standing and smiling in Silicon Valley."
- Image generated by DALL-E:
Run example 2:
- Location given:
New York
- Prompt chosen by GPT-4:
"A young European male standing and smiling in a city like NY."
- Image generated by DALL-E:
Having Gemini Pro Vision write the avatar's backstory
For this step, this post uses Google Gemini Pro Vision, a multimodal model that also handles images and video.
Getting an API key from Google AI Studio
Using Gemini Pro Vision requires an API key, so create one in Google AI Studio.
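Rather than pasting the key into the source file, it can be read from an environment variable. This is a minimal sketch; the variable name `GOOGLE_API_KEY` and the helper `load_api_key` are assumptions made for illustration, not something the SDK requires.

```python
import os
from typing import Mapping


def load_api_key(name: str = "GOOGLE_API_KEY",
                 env: Mapping[str, str] = os.environ) -> str:
    """Return the API key stored under `name`, or fail with a clear message."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first.")
    return key
```

The returned value could then be passed to `genai.configure(api_key=...)` before creating the model.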
Defining the crewAI tasks and agents
# This program requires the packages below:
# pip install 'crewai[tools]'
# pip install -q -U google-generativeai
# pip install Pillow
# https://ai.google.dev/gemini-api/docs/get-started/python?hl=ja
from crewai import Agent, Task, Crew, Process
from langchain_openai import ChatOpenAI
from langchain.tools import BaseTool, StructuredTool, tool
from langchain.pydantic_v1 import BaseModel, Field
import google.generativeai as genai
from PIL import Image

GOOGLE_API_KEY = 'your key'
# Register the key with the SDK before creating the model.
genai.configure(api_key=GOOGLE_API_KEY)
model = genai.GenerativeModel('gemini-pro-vision')

def analyze_image(filepath: str, prompt: str) -> str:
    '''Analyze the image at the filepath, and return the analysis
    of the image.'''
    print("Analyzing the image ... : ", filepath)
    img = Image.open(filepath)
    response = model.generate_content([prompt, img])
    analyzed_text = response.text
    return analyzed_text

class ProfileCreationInput(BaseModel):
    filepath: str = Field(description='''The file path of the image
        to be analyzed''')

def create_profile(filepath: str) -> str:
    '''Analyze the image at the filepath, and return the profile of
    the person in the image.'''
    prompt = '''Create a background story for the person in the picture
        to introduce the person to the user.
        It should include the name of the person, the job, the gender,
        the age, the nation, etc. based on the situation of the picture.'''
    return analyze_image(filepath, prompt)

# Wrap the function as a tool that the agent can call.
profile_creation = StructuredTool.from_function(
    func=create_profile,
    description='''Analyze the image at the filepath, and return
        the profile of the person in the image.''',
    args_schema=ProfileCreationInput,
)

gpt4 = ChatOpenAI(model="gpt-4")
filepath = 'avator.png'

scenario_writer_agent = Agent(
    role = 'Scenario Writer',
    goal = '''Create an attractive story of the person in the picture
        who talks with the user in English,
        based on the specification of the task.''',
    backstory = '''You are a creative scenario writer who creates
        an attractive story of a person from the picture to start
        the conversation with the user.''',
    allow_delegation = False,
    verbose = True,
    llm = gpt4,
)

profile_creation_task = Task(
    description = f'''
        Create a profile of the person in the picture to introduce
        the person to the user.
        The picture is saved at the filepath: '{filepath}'.
        ''',
    expected_output = 'the profile of the person',
    agent = scenario_writer_agent,
    tools = [profile_creation],
    human_input = False,
)

question_creation_task = Task(
    description = f'''
        Create a question from the person in the picture to start
        the conversation with the user,
        based on the situation of the picture and the profile of
        the person generated by the previous task.
        ''',
    expected_output = '''the self introduction of the person in
        the picture, and the first question''',
    agent = scenario_writer_agent,
    human_input = False,
)

crew = Crew(
    agents = [scenario_writer_agent],
    tasks = [profile_creation_task, question_creation_task],
    process = 'sequential',
    verbose = 2
)

result = crew.kickoff()

# Print the output of each task.
print('The Avator Profile:')
print(profile_creation_task.output.raw_output)
print('The First Question:')
print(question_creation_task.output.raw_output)
First, here is the profile that Gemini Pro Vision came up with after analyzing the avatar image.
The Avator Profile:
My name is Jessica Tan. I am 30 years old and I am the CEO
of a successful tech company. I am originally from Singapore,
but I have lived in the United States for most of my life.
I am a hard worker and I am always looking for new challenges.
I am passionate about technology and I believe that
it has the power to change the world. I am also a strong
advocate for diversity and inclusion in the tech industry.
I believe that everyone should have the opportunity
to succeed, regardless of their background.
I am a role model for many young women who are interested in
pursuing careers in technology. I am proof that it is possible
to achieve your dreams if you work hard and never give up.
I am also a strong advocate for diversity and inclusion
in the tech industry. I believe that everyone should have
the opportunity to succeed, regardless of their background.
Next is the first question that GPT-4 wrote from that profile as an opener for the English conversation with the user.
The First Question:
Hi there, I'm Jessica Tan, a 30-year-old tech CEO
who's originally from Singapore but spent most of
my life in the United States. I have a passion for
technology and I believe in its power to bring about
significant change in the world. As an advocate for
diversity and inclusion in the tech industry,
I strive to create opportunities for everyone to succeed,
regardless of their background. I'm proud to be a role model
for young women interested in technology.
I stand as proof that with hard work and perseverance,
achieving your dreams is possible. Now, I'd love to know
your thoughts. How do you think we can further promote
diversity and inclusion in the tech industry?
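Finally, integrating the agents and tasks and adding a chat-style UI
The first program generates the situation setting. It saves the avatar image, the persona, the first question, and so on, produced by the combination of GPT-4, DALL-E, and Gemini Pro described in this post, to a situation setting file.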
# This program requires the following packages:
# $ pip install 'crewai[tools]'
# $ pip install google-generativeai
# $ pip install Pillow
# $ pip install pyyaml
from crewai import Agent, Task, Crew
from langchain_openai import ChatOpenAI
from langchain.tools import StructuredTool
from langchain.pydantic_v1 import BaseModel, Field
import openai
import google.generativeai as genai
from PIL import Image
import yaml
client = openai.OpenAI()
gpt4 = ChatOpenAI(model="gpt-4")
GOOGLE_API_KEY = 'your key'
# Register the key with the SDK before creating the model.
genai.configure(api_key=GOOGLE_API_KEY)
model = genai.GenerativeModel('gemini-pro-vision')
# Ask the place to go.
location = input('''Where do you want to go today?
(like Hawaii, Tokyo, or Silicon Valley, etc.) ''')

class ImageInput(BaseModel):
    prompt: str = Field(description='''The prompt for the image
        generation like "A young asian female person standing at an office".''')

def generate_image(prompt: str) -> str:
    '''Generate an image for the prompt, and return the filename of the image.'''
    print("Generating image ... for the prompt: ", prompt)
    # Only the prompt is passed here; set model, size, etc. as needed.
    response = client.images.generate(
        prompt=prompt,
    )
    image_url = response.data[0].url
    # download the image
    import requests
    import shutil
    from datetime import datetime
    response = requests.get(image_url, stream=True)
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    filename = f'output-{timestamp}.png'
    with open(filename, 'wb') as out_file:
        shutil.copyfileobj(response.raw, out_file)
    print(f"Image saved to {filename}")
    return filename

image_generation = StructuredTool.from_function(
    func=generate_image,
    description='''generate an image for the prompt and return the
        filename of the image.''',
    args_schema=ImageInput,
)

def analyze_image(filepath: str, prompt: str) -> str:
    '''Analyze the image at the filepath, and return the analysis of the image.'''
    print("Analyzing the image ... : ", filepath)
    img = Image.open(filepath)
    response = model.generate_content([prompt, img])
    analyzed_text = response.text
    return analyzed_text

class ProfileCreationInput(BaseModel):
    filepath: str = Field(description='''The file path of the image
        to be analyzed''')

def create_profile(filepath: str) -> str:
    '''Analyze the image at the filepath, and return the profile of
    the person in the image.'''
    prompt = '''Create a background story for the person in the picture
        to introduce the person to the user.
        It should include the name of the person, the job, the gender,
        the age, the nation, etc. based on the situation of the picture.'''
    return analyze_image(filepath, prompt)

profile_creation = StructuredTool.from_function(
    func=create_profile,
    description='''Analyze the image at the filepath,
        and return the profile of the person in the image.''',
    args_schema=ProfileCreationInput,
)

# Step 1: generate the avatar image.
avator_maker_agent = Agent(
    role = 'Avator Maker',
    goal = '''Create a prompt to generate an image of a person who
        talks with the user in English.
        The prompt must include the age of the person like "young" or "old",
        the region of the person like "asian", "african", "european", and "indian", etc.,
        the gender of the person like "female" or "male", etc.,
        the behaviour of the person like "standing", "smiling", "angry", "sad", and "surprised", etc.,
        and the scene of the image like "an office", "a beach", "a city", "a forest" and "a mountain", etc.
        For example, "A young asian female person standing at an office.".
        And the prompt should be appropriate for the location provided by the user.
        The agent must generate the image for the prompt.''',
    backstory = '''You are an avator maker who creates an image of a
        person who talks with the user in English.''',
    allow_delegation = False,
    verbose = True,
    llm = gpt4,
    tools = [image_generation],
)

image_generation_task = Task(
    description = f'Create an image of a person at {location}.',
    expected_output = 'A filename of the image.',
    agent = avator_maker_agent,
    human_input = False,
)

crew = Crew(
    agents = [avator_maker_agent],
    tasks = [image_generation_task],
    process = 'sequential',
    verbose = 2
)

filepath = crew.kickoff()

# Step 2: create the profile and the first question from the image.
scenario_writer_agent = Agent(
    role = 'Scenario Writer',
    goal = '''Create an attractive story of the person in the picture
        who talks with the user in English,
        based on the specification of the task.''',
    backstory = '''You are a creative scenario writer who creates
        an attractive story of a person from the picture to start
        the conversation with the user.''',
    allow_delegation = False,
    verbose = True,
    llm = gpt4,
)

profile_creation_task = Task(
    description = f'''
        Create a profile of the person in the picture to introduce
        the person to the user.
        The picture is saved at the filepath: '{filepath}'.
        ''',
    expected_output = 'the profile of the person',
    agent = scenario_writer_agent,
    tools = [profile_creation],
    human_input = False,
)

question_creation_task = Task(
    description = f'''
        Create a question from the person in the picture to start
        the conversation with the user,
        based on the situation of the picture and the profile of
        the person generated by the previous task.
        ''',
    expected_output = '''the self introduction of the person in
        the picture, and the first question''',
    agent = scenario_writer_agent,
    human_input = False,
)

crew = Crew(
    agents = [scenario_writer_agent],
    tasks = [profile_creation_task, question_creation_task],
    process = 'sequential',
    verbose = 2
)

result = crew.kickoff()

avator_profile = profile_creation_task.output.raw_output
first_question = question_creation_task.output.raw_output
print('The Avator Profile:')
print(avator_profile)
print('The First Question:')
print(first_question)

# Save the situation (location, filepath, avator_profile, first_question) to the file
situation = {
    "location": location,
    "filepath": filepath,
    "avator_profile": avator_profile,
    "first_question": first_question
}
with open('situation.yaml', 'w') as file:
    yaml.safe_dump(situation, file)
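The situation file carries four fields, and the chat program depends on all of them. The sketch below checks a situation dictionary before it is saved or after it is loaded. `validate_situation` and `REQUIRED_KEYS` are hypothetical names; the key names are the ones used in the program above.

```python
from typing import Any, Mapping

# Keys written to situation.yaml by the generator program.
REQUIRED_KEYS = ("location", "filepath", "avator_profile", "first_question")


def validate_situation(situation: Mapping[str, Any]) -> dict:
    """Check that every required key is present and holds a non-empty string."""
    problems = [
        key for key in REQUIRED_KEYS
        if not isinstance(situation.get(key), str) or not situation[key].strip()
    ]
    if problems:
        raise ValueError(
            f"Missing or empty situation fields: {', '.join(problems)}"
        )
    # Return only the known keys, as a plain dict that is safe to serialize.
    return {key: situation[key] for key in REQUIRED_KEYS}
```

Requiring plain strings also catches the case where a value such as the crew result is an object rather than text, which a YAML dump might not serialize as intended.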
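The second program loads the situation setting file and launches the chat UI for the AI English conversation avatar according to that setting.
It is almost the same as what the first article in the series covered: running Llama 3 locally with Ollama and adding a chat-style UI with Streamlit.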
# This program requires the following packages:
# $ pip install ollama
# $ pip install streamlit
# $ pip install pyyaml
# To run this program, type the command below:
# $ streamlit run situation-chat.py
import ollama
import streamlit as st
import yaml
with open("situation.yaml", "r") as file:
    situation = yaml.safe_load(file)
location = situation["location"]
filepath = situation["filepath"]
avator_profile = situation["avator_profile"]
first_question = situation["first_question"]
st.title("Your Personal English Coach")
# Add a header image
header_image = filepath
st.image(header_image, caption='Situation', use_column_width=True)
if "messages" not in st.session_state:
    # Keep the persona as the system prompt and open with the avatar's question.
    st.session_state["messages"] = [
        {"role": "system", "content": avator_profile},
        {"role": "assistant", "content": first_question},
    ]
### Write Message History
for msg in st.session_state.messages:
    if msg["role"] == "user":
        st.chat_message(msg["role"], avatar="🧑‍💻").write(msg["content"])
    elif msg["role"] == "assistant":
        # The system message is not rendered in the chat.
        st.chat_message(msg["role"], avatar="😃").write(msg["content"])
## Generator for Streaming Tokens
def generate_response():
    response = ollama.chat(model='llama3', stream=True, messages=st.session_state.messages)
    for partial_resp in response:
        token = partial_resp["message"]["content"]
        # Accumulate the streamed tokens so the full reply can be stored afterwards.
        st.session_state["full_message"] += token
        yield token

if prompt := st.chat_input():
    st.session_state.messages.append({"role": "user", "content": prompt})
    st.chat_message("user", avatar="🧑‍💻").write(prompt)
    st.session_state["full_message"] = ""
    st.chat_message("assistant", avatar="😃").write_stream(generate_response)
    st.session_state.messages.append({"role": "assistant", "content": st.session_state["full_message"]})
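The message list sent to `ollama.chat` can carry the persona as a system message ahead of the avatar's first question, while the UI shows only the user and assistant turns. The following is a minimal sketch of that separation as plain functions, independent of Streamlit. `initial_messages` and `visible_messages` are hypothetical helper names.

```python
from typing import Dict, List

Message = Dict[str, str]


def initial_messages(avatar_profile: str, first_question: str) -> List[Message]:
    """Seed the history: persona as system prompt, then the avatar's opener."""
    return [
        {"role": "system", "content": avatar_profile},
        {"role": "assistant", "content": first_question},
    ]


def visible_messages(messages: List[Message]) -> List[Message]:
    """Drop system messages so the persona prompt is not rendered in the chat."""
    return [m for m in messages if m["role"] != "system"]


if __name__ == "__main__":
    history = initial_messages("I am Sarah, a model from California.",
                               "What is the most adventurous thing you have done?")
    history.append({"role": "user", "content": "I went skydiving once."})
    for message in visible_messages(history):
        print(f'{message["role"]}: {message["content"]}')
```

Keeping the full history for the model and a filtered view for display means the persona keeps shaping the replies without appearing as a chat bubble.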
Below is the persona generated by the AI agents.
Hi, I am Sarah. I am 25 years old. I am a model.
I am from California, USA. I love to travel and
I have been to many countries. I am always up for
an adventure and I love to meet new people.
I am a very outgoing and friendly person and I love to have fun.
I am always looking for new things to do and
I am always up for a challenge.
I am a very determined person and I never give up on my dreams.
I am always looking for ways to improve myself and
I am always striving to be the best that I can be.
And below is the first question generated by the AI agents.
"Hi there, I'm Sarah, a model from sunny California.
I love traveling and meeting new people -
it's one of the ways I challenge myself to keep growing.
I'm curious, what's the most adventurous thing
you've ever done or the most amazing place you've ever been?
Maybe I can add it to my bucket list!"
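The initial situation generation used GPT-4, DALL-E, and Gemini Pro Vision, but from there on you can enjoy situational English conversation with the Llama 3 avatar for free, as much as you like.
What kind of AI application did this make you want to build?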