More than 1 year has passed since last update.

リンクアンドモチベーションAdvent Calendar 2023

Langchainを、使う (LLMでAgentを作れるライブラリ4種類を比べてみた)

Last updated at 2024-02-05Posted at 2023-12-16

本記事はリンクアンドモチベーション Advent Calendar 2023の16日目の記事になります。

はじめに

今年はChatGPTのAPIの公開を始めとしてまさに怒涛の1年でした。
その中で今後業務や作業を圧倒的に効率化してくれるであろう、自律的にタスクを解決するために設計された「Agent」を作成することが出来る、数多くのライブラリやフレームワークが発展してきました。
どのライブラリも類似するAgentを作成することは可能ですが、それぞれの抽象化、方針により実装方法に少しずつ違いがあり、そのまま使う場合は得意なもの・不得意なものがそれぞれ異なる状態です。
本記事では、各ライブラリで、よく使われるであろう似たような形のAgentのデモコードを列挙し、その実装方法の違いを見ていきます。
これにより、場面によって適切にライブラリを選択する上で材料になれば幸いです。

対象

比較対象は下記4つとします。

Assistant API(OpenAI)
llama index(LlamaIndex)
langchain(Langchain)
Autogen(Microsoft)

対象のタスクは以下2つ

RAG(Retrieval Augmented Generation)を使ったAgent
複数Agent(もしくはtool)を使ったAgent

Assistant API

OpenAIは2023/12/15時点では現状、beta版ではありますがAgent的な動作をさせることが出来るAssistant APIを提供しています。

RAG

この後出てくるAutogenに関する論文の内容に基づき回答してくれるAgentの実装は下記の様になります。

実装詳細

import base64
import requests

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

from openai import OpenAI
client = OpenAI()

# Fileのアップロード
uploaded_file = client.files.create(
    file=open("/content/autogen.pdf", "rb"),
    purpose="assistants",
)
file_id = uploaded_file.id
print(file_id)

# Assistantの作成
my_assistant = client.beta.assistants.create(
    name="pdf bot",
    description="pdf等のドキュメントに基づいて回答してくれるアシスタント",
    model="gpt-4-1106-preview",
    instructions="あなたは、ドキュメントの内容を元に回答を行うアシスタントです。",
    tools=[{"type": "retrieval"}, {"type": "code_interpreter"}],
    file_ids=[file_id],
)
assistant_id = my_assistant.id
print(assistant_id)


# Threadの作成
empty_thread = client.beta.threads.create()
thread_id = empty_thread.id
print(thread_id)

# Messageの作成
client.beta.threads.messages.create(
    thread_id=thread_id,
    role="user",
    content="概要を日本語で教えて",
)

# Assistantの実行
run = client.beta.threads.runs.create(
    thread_id=thread_id,
    assistant_id=assistant_id,
)
run_id = run.id
print(run_id)

# Runのステータスの確認
run_retrieve = client.beta.threads.runs.retrieve(
    thread_id=thread_id,
    run_id=run_id,
)
print(run_retrieve.status)

messages = client.beta.threads.messages.list(
    thread_id=thread_id
)
print(messages.data)

for message in messages.data:
    display(message)

実行結果

ドキュメントは、「AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation」と題された論文であり、2023年10月3日にarXivで公開されたものであると示されています。著者には、Microsoft Research、ペンシルバニア州立大学、ワシントン大学、西安電子科技大学からの研究者が含まれています。

この論文の要約を以下に示します。
**概要と紹介**:
AutoGenはオープンソースのフレームワークで、開発者が任務を達成するために相互に対話できる複数のエージェントを用いて、LLMアプリケーションを構築できるようにするものです。AutoGenエージェントはカスタマイズ可能であり、LLM、人間の入力、ツールの組み合わせなどで動作することができ、さまざまなモードで活用できます。このフレームワークを使用して、天然言語やコンピューターコードを用いて、柔軟な対話パターンをプログラムすることができます。様々な複雑さやLLMのキャパシティを持つアプリケーションを構築するための汎用的な枠組みとして機能します。実証研究により、数学、コーディング、クエスチョンアンサリング、オペレーションリサーチ、オンライン意思決定、エンターテインメントなど、多岐にわたるドメインのアプリケーションでのフレームワークの有効性が示されています。
**AutoGenフレームワークの概念**:
AutoGenは、エージェントが自分の能力を最大限に生かして彼らのメッセージを処理し、応答するような方法でエージェントの柔軟性を提供しています。エージェントは自社の内部コンテキストを維持し、様々な能力を設定することができます。これらには、LLMによる役割演技、暗示的状態推論、会話履歴に基づく進捗、フィードバックの提供、フィードバックからの適応、コーディングなどが含まれます。次に、エージェントが対話を通じて互いに行動する方法（会話プログラミング）について述べられています。\n\nドキュメント内には他のセクションも存在し、関連する研究、将来の研究領域およびエージェントチャットの例などについても議論されていると推測されますが、より詳しい内容を確認するためにはさらに文書を調査する必要があります。

特徴

threadを前提にしていて、memory(会話の履歴)を気にせずずっと会話を続けられる
その引き換えに、maxのtokenまで行ってしまうとずっとmax token分OpenAIのAPIを叩いてしまうため、APIの利用料金が大変なことになる

LlamaIndex

RAGを用いたAgent

複数の都市に関するウィキペディアの情報を元に回答してくれるAgentの実装は下記のようになります。
デモから持ってきたが、ちょっと長い

実装詳細

from llama_index import (
    VectorStoreIndex,
    SummaryIndex,
    SimpleKeywordTableIndex,
    SimpleDirectoryReader,
    ServiceContext,
)
from llama_index.schema import IndexNode
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.llms import OpenAI

wiki_titles = [
    "Toronto",
    "Seattle",
    "Chicago",
    "Boston",
    "Houston",
    "Tokyo",
    "Berlin",
    "Lisbon",
    "Paris",
    "London",
    "Atlanta",
    "Munich",
    "Shanghai",
    "Beijing",
    "Copenhagen",
    "Moscow",
    "Cairo",
    "Karachi",
]
from pathlib import Path

import requests

for title in wiki_titles:
    response = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "format": "json",
            "titles": title,
            "prop": "extracts",
            # 'exintro': True,
            "explaintext": True,
        },
    ).json()
    page = next(iter(response["query"]["pages"].values()))
    wiki_text = page["extract"]

    data_path = Path("data")
    if not data_path.exists():
        Path.mkdir(data_path)

    with open(data_path / f"{title}.txt", "w") as fp:
        fp.write(wiki_text)

# Load all wiki documents
city_docs = {}
for wiki_title in wiki_titles:
    city_docs[wiki_title] = SimpleDirectoryReader(
        input_files=[f"data/{wiki_title}.txt"]
    ).load_data()

llm = OpenAI(temperature=0, model="gpt-4")
service_context = ServiceContext.from_defaults(llm=llm)

from llama_index.agent import OpenAIAgent
from llama_index import load_index_from_storage, StorageContext
from llama_index.node_parser import SentenceSplitter
import os

node_parser = SentenceSplitter()

# Build agents dictionary
agents = {}
query_engines = {}

# this is for the baseline
all_nodes = []

for idx, wiki_title in enumerate(wiki_titles):
    nodes = node_parser.get_nodes_from_documents(city_docs[wiki_title])
    all_nodes.extend(nodes)

    if not os.path.exists(f"./data/{wiki_title}"):
        # build vector index
        vector_index = VectorStoreIndex(nodes, service_context=service_context)
        vector_index.storage_context.persist(
            persist_dir=f"./data/{wiki_title}"
        )
    else:
        vector_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=f"./data/{wiki_title}"),
            service_context=service_context,
        )

    # build summary index
    summary_index = SummaryIndex(nodes, service_context=service_context)
    # define query engines
    vector_query_engine = vector_index.as_query_engine()
    summary_query_engine = summary_index.as_query_engine()

    # define tools
    query_engine_tools = [
        QueryEngineTool(
            query_engine=vector_query_engine,
            metadata=ToolMetadata(
                name="vector_tool",
                description=(
                    "Useful for questions related to specific aspects of"
                    f" {wiki_title} (e.g. the history, arts and culture,"
                    " sports, demographics, or more)."
                ),
            ),
        ),
        QueryEngineTool(
            query_engine=summary_query_engine,
            metadata=ToolMetadata(
                name="summary_tool",
                description=(
                    "Useful for any requests that require a holistic summary"
                    f" of EVERYTHING about {wiki_title}. For questions about"
                    " more specific sections, please use the vector_tool."
                ),
            ),
        ),
    ]

    # build agent
    function_llm = OpenAI(model="gpt-4")
    agent = OpenAIAgent.from_tools(
        query_engine_tools,
        llm=function_llm,
        verbose=True,
        system_prompt=f"""\
You are a specialized agent designed to answer queries about {wiki_title}.
You must ALWAYS use at least one of the tools provided when answering a question; do NOT rely on prior knowledge.\
""",
    )

    agents[wiki_title] = agent
    query_engines[wiki_title] = vector_index.as_query_engine(
        similarity_top_k=2
    )

# define tool for each document agent
all_tools = []
for wiki_title in wiki_titles:
    wiki_summary = (
        f"This content contains Wikipedia articles about {wiki_title}. Use"
        f" this tool if you want to answer any questions about {wiki_title}.\n"
    )
    doc_tool = QueryEngineTool(
        query_engine=agents[wiki_title],
        metadata=ToolMetadata(
            name=f"tool_{wiki_title}",
            description=wiki_summary,
        ),
    )
    all_tools.append(doc_tool)

# define an "object" index and retriever over these tools
from llama_index import VectorStoreIndex
from llama_index.objects import ObjectIndex, SimpleToolNodeMapping

tool_mapping = SimpleToolNodeMapping.from_objects(all_tools)
obj_index = ObjectIndex.from_objects(
    all_tools,
    tool_mapping,
    VectorStoreIndex,
)

from llama_index.agent import FnRetrieverOpenAIAgent

top_agent = FnRetrieverOpenAIAgent.from_retriever(
    obj_index.as_retriever(similarity_top_k=3),
    system_prompt=""" \
You are an agent designed to answer queries about a set of given cities.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.\

""",
    verbose=True,
)

base_index = VectorStoreIndex(all_nodes)
base_query_engine = base_index.as_query_engine(similarity_top_k=4)

# should use Boston agent -> vector tool
response = top_agent.query("Tell me about the arts and culture in Tokyo in japanese")

複数toolを持つAgent

足し算と掛け算が出来るtoolを持つAgentの実装は下記の様になります
関数定義→それぞれの関数をToolとして定義しlist型に格納→OpenAIAgent.from_toolsでToolを場面によって使い分けるAgentを作成という流れでAgentを作成します。
作成されたAgentのインスタンスに対して、chat関数を実行することで、toolのmetadataをpromptに入れて実行し、ツールの選択、実行を行います。
※metadataの具体例はこの様な形

ToolMetadata(description='multiply(a: int, b: int) -> int\nMultiple two integers and returns the result integer', name='multiply', fn_schema=<class 'pydantic.main.multiply'>)

そのため、ちゃんとtoolを使い分けるためにはdocstringをちゃんと書く必要があります

実装詳細

from llama_index.agent import OpenAIAgent
from llama_index.llms import OpenAI, ChatMessage
from llama_index.tools import BaseTool, FunctionTool

def multiply(a: int, b: int) -> int:
    """Multiple two integers and returns the result integer"""
    return a * b


def add(a: int, b: int) -> int:
    """Add two integers and returns the result integer"""
    return a + b

# それぞれの関数をFunctionToolに
multiply_tool = FunctionTool.from_defaults(fn=multiply)
add_tool = FunctionTool.from_defaults(fn=add)
tools = [multiply_tool, add_tool]

# OpenAIAgentを定義
llm = OpenAI(model="gpt-3.5-turbo-0613")
agent = OpenAIAgent.from_tools(
    tools, llm=llm, verbose=True
)

# 結果を出力
response = agent.chat(
    "121 * 2はいくつでしょうか?"
)

print(response.response)

実行結果


STARTING TURN 1
---------------

=== Calling Function ===
Calling function: multiply with args: {
  "a": 121,
  "b": 2
}
Got output: 242
========================

STARTING TURN 2
---------------

121 * 2は242です。

特徴

短い実装で簡単に書ける
Assistant APIだけだと関数の実行までは行ってくれないが、関数の実行までよしなにやってくれる
LlamaHubで自作のデータローダー、Agent Toolsなどが多くあるので、拡張を色々試しやすい

Langchain

langchainは個人的にはプロダクトの機能開発で実際に使っていたこともあり、最も馴染みがあり使いやすい気がしています。
llamaindexだけで出来ることは大体全て出来るイメージです

RAG

google検索を含むRAGを使ったAgentの実装は下記の通り

実装詳細

from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain.utilities import SerpAPIWrapper

# Initialize the language model
# You can add your own OpenAI API key by adding openai_api_key="<your_api_key>"
llm = ChatOpenAI(temperature=0, model="gpt-4")

# Initialize the SerpAPIWrapper for search functionality
# Replace <your_api_key> in openai_api_key="<your_api_key>" with your actual SerpAPI key.
search = SerpAPIWrapper()

# Define a list of tools offered by the agent
tools = [
    Tool(
        name="Search",
        func=search.run,
        coroutine=search.arun,
        description="Useful when you need to answer questions about current events. You should ask targeted questions.",
    ),
]

functions_agent = initialize_agent(
    tools, llm, agent=AgentType.OPENAI_MULTI_FUNCTIONS, verbose=False
)

実行

functions_agent.run('ボッチザ・ロックって何?')

実行結果

「ボッチザ・ロック」は、はまじあきによる日本の4コマ漫画で、そのアニメ化作品のことを指します。物語は、極度の人見知りでギターを愛する高校一年生の後藤ひとり（通称「ぼっちちゃん」）が主人公。彼女は、伊地知虹夏が率いる「結束バンド」に加入し、バンド活動を通じて成長していきます。アニメの初回放送は2022年10月9日でした。

実行結果詳細


[chain/start] [1:chain:AgentExecutor] Entering Chain run with input:
{
  "input": "ボッチザ・ロックって何?"
}
[llm/start] [1:chain:AgentExecutor > 2:llm:ChatOpenAI] Entering LLM run with input:
{
  "prompts": [
    "System: You are a helpful AI assistant.\nHuman: ボッチザ・ロックって何?"
  ]
}
[llm/end] [1:chain:AgentExecutor > 2:llm:ChatOpenAI] [4.79s] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "",
        "generation_info": {
          "finish_reason": "function_call"
        },
        "type": "ChatGeneration",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain",
            "schema",
            "messages",
            "AIMessage"
          ],
          "kwargs": {
            "content": "",
            "additional_kwargs": {
              "function_call": {
                "arguments": "{\n  \"actions\": [\n    {\n      \"action_name\": \"Search\",\n      \"action\": {\n        \"tool_input\": \"ボッチザ・ロック\"\n      }\n    }\n  ]\n}",
                "name": "tool_selection"
              }
            }
          }
        }
      }
    ]
  ],
  "llm_output": {
    "token_usage": {
      "completion_tokens": 50,
      "prompt_tokens": 84,
      "total_tokens": 134
    },
    "model_name": "gpt-4",
    "system_fingerprint": null
  },
  "run": null
}
[tool/start] [1:chain:AgentExecutor > 3:tool:Search] Entering Tool run with input:
"{'tool_input': 'ボッチザ・ロック'}"
[tool/end] [1:chain:AgentExecutor > 3:tool:Search] [1.71s] Exiting Tool run with output:
"['Hitori "Bocchi-chan" GotÅ_x008d_ is a lonely high school girl whose heart lies in her guitar; she meets Nijika Ijichi and joins "Kessoku Band."', 'Bocchi The Rock! (ぼっち・ざ・ろっく!) main_tab_text: Overview.', 'Bocchi The Rock! (ぼっち・ざ・ろっく!) kgmid: /g/11jsjgrhqh.', 'Bocchi The Rock! (ぼっち・ざ・ろっく!) first_episode_date: October 9, 2022 (Japan).', 'BOCCHI THE ROCK! · MOVIE · NEWS · アルバム『結束バンド』が 2023年 年間Billboard JAPAN “Download Albums” で首位獲得！ · アルバム『結束バンド』がGOLD ALBUMに認定！', '極度の人見知りで陰キャな高校一年生。結束バンドのリードギター担当。陰キャでも輝けそうなバンド活動に憧れギターを始める。腕前は本物だが、バンドや人前でうまく ...', '『ぼっち・ざ・ろっく！』(BOCCHI THE ROCK!）は、はまじあきによる日本の4コマ漫画。『まんがタイムきららMAX』（芳文社）にて、2018年2月号から4月号までゲスト連載 ...', 'TVアニメ「ぼっち・ざ・ろっく！」公式アカウントThank you! 原作:はまじあき（芳文社「まんがタイムきららMAX」連載中)／ 監督：斎藤圭一郎／シリーズ構成・脚本： ...', '「ぼっちちゃん」こと後藤ひとりは、ギターを愛する孤独な少女。 家で一人寂しく弾くだけの毎日でしたが、ひょんなことから伊地知虹夏が率いる「結束バンド」に加入 ...', 'Bocchi the Rock! Ending 4 Full - (転がる岩、君に朝が降る) - Korogaru Iwa, Kimi ni Asa ga Furu by Kessoku Band. Rumi•575K views · 3:51.', '人前での演奏に不慣れな後藤は、立派なバンドマンになれるのか――！？ 全国のぼっちな少年少女に届ける、いま最高にアツい音楽漫画！！ 陰キャならロックをやれ！！！', '... 3:25 · Go to channel. Bocchi the Rock!! Opening Full - Seishun Complex by Kessoku Band. Rumi•740K views · 1:31 · Go to channel. TVアニメ ...', '極度の人見知りで陰キャな少女、後藤ひとり。バンド活動に憧れギターを始めるも友達が出来ず一人で練習する毎日。ある日“結束バンド”というバンドでドラムをやっている ...', '「ぼっちちゃん」こと後藤ひとりは、ギターを愛する孤独な少女。 家で一人寂しく弾くだけの毎日だったが、ひょんなことから 伊地知虹夏が率いる「結束バンド」に加入 ...']"
[llm/start] [1:chain:AgentExecutor > 4:llm:ChatOpenAI] Entering LLM run with input:
{
  "prompts": [
    "System: You are a helpful AI assistant.\nHuman: ボッチザ・ロックって何?\nAI: {'arguments': '{\\n  \"actions\": [\\n    {\\n      \"action_name\": \"Search\",\\n      \"action\": {\\n        \"tool_input\": \"ボッチザ・ロック\"\\n      }\\n    }\\n  ]\\n}', 'name': 'tool_selection'}\nFunction: ['Hitori \"Bocchi-chan\" GotÅ_x008d_ is a lonely high school girl whose heart lies in her guitar; she meets Nijika Ijichi and joins \"Kessoku Band.\"', 'Bocchi The Rock! (ぼっち・ざ・ろっく!) main_tab_text: Overview.', 'Bocchi The Rock! (ぼっち・ざ・ろっく!) kgmid: /g/11jsjgrhqh.', 'Bocchi The Rock! (ぼっち・ざ・ろっく!) first_episode_date: October 9, 2022 (Japan).', 'BOCCHI THE ROCK! · MOVIE · NEWS · アルバム『結束バンド』が 2023年 年間Billboard JAPAN “Download Albums” で首位獲得！ · アルバム『結束バンド』がGOLD ALBUMに認定！', '極度の人見知りで陰キャな高校一年生。結束バンドのリードギター担当。陰キャでも輝けそうなバンド活動に憧れギターを始める。腕前は本物だが、バンドや人前でうまく ...', '『ぼっち・ざ・ろっく！』(BOCCHI THE ROCK!）は、はまじあきによる日本の4コマ漫画。『まんがタイムきららMAX』（芳文社）にて、2018年2月号から4月号までゲスト連載 ...', 'TVアニメ「ぼっち・ざ・ろっく！」公式アカウントThank you! 原作:はまじあき（芳文社「まんがタイムきららMAX」連載中)／ 監督：斎藤圭一郎／シリーズ構成・脚本： ...', '「ぼっちちゃん」こと後藤ひとりは、ギターを愛する孤独な少女。 家で一人寂しく弾くだけの毎日でしたが、ひょんなことから伊地知虹夏が率いる「結束バンド」に加入 ...', 'Bocchi the Rock! Ending 4 Full - (転がる岩、君に朝が降る) - Korogaru Iwa, Kimi ni Asa ga Furu by Kessoku Band. Rumi•575K views · 3:51.', '人前での演奏に不慣れな後藤は、立派なバンドマンになれるのか――！？ 全国のぼっちな少年少女に届ける、いま最高にアツい音楽漫画！！ 陰キャならロックをやれ！！！', '... 3:25 · Go to channel. Bocchi the Rock!! Opening Full - Seishun Complex by Kessoku Band. Rumi•740K views · 1:31 · Go to channel. TVアニメ ...', '極度の人見知りで陰キャな少女、後藤ひとり。バンド活動に憧れギターを始めるも友達が出来ず一人で練習する毎日。ある日“結束バンド”というバンドでドラムをやっている ...', '「ぼっちちゃん」こと後藤ひとりは、ギターを愛する孤独な少女。 家で一人寂しく弾くだけの毎日だったが、ひょんなことから 伊地知虹夏が率いる「結束バンド」に加入 ...']"
  ]
}
[llm/end] [1:chain:AgentExecutor > 4:llm:ChatOpenAI] [13.97s] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "「ボッチザ・ロック」は、はまじあきによる日本の4コマ漫画で、そのアニメ化作品のことを指します。物語は、極度の人見知りでギターを愛する高校一年生の後藤ひとり（通称「ぼっちちゃん」）が主人公。彼女は、伊地知虹夏が率いる「結束バンド」に加入し、バンド活動を通じて成長していきます。アニメの初回放送は2022年10月9日でした。",
        "generation_info": {
          "finish_reason": "stop"
        },
        "type": "ChatGeneration",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain",
            "schema",
            "messages",
            "AIMessage"
          ],
          "kwargs": {
            "content": "「ボッチザ・ロック」は、はまじあきによる日本の4コマ漫画で、そのアニメ化作品のことを指します。物語は、極度の人見知りでギターを愛する高校一年生の後藤ひとり（通称「ぼっちちゃん」）が主人公。彼女は、伊地知虹夏が率いる「結束バンド」に加入し、バンド活動を通じて成長していきます。アニメの初回放送は2022年10月9日でした。",
            "additional_kwargs": {}
          }
        }
      }
    ]
  ],
  "llm_output": {
    "token_usage": {
      "completion_tokens": 169,
      "prompt_tokens": 1118,
      "total_tokens": 1287
    },
    "model_name": "gpt-4",
    "system_fingerprint": null
  },
  "run": null
}
[chain/end] [1:chain:AgentExecutor] [20.48s] Exiting Chain run with output:
{
  "output": "「ボッチザ・ロック」は、はまじあきによる日本の4コマ漫画で、そのアニメ化作品のことを指します。物語は、極度の人見知りでギターを愛する高校一年生の後藤ひとり（通称「ぼっちちゃん」）が主人公。彼女は、伊地知虹夏が率いる「結束バンド」に加入し、バンド活動を通じて成長していきます。アニメの初回放送は2022年10月9日でした。"
}

複数toolを持つAgent

リンクアンドモチベーションのウィキペディアページと、LLMの自己学習に関する論文をデータソースとしてRAGを行うAgentの実装は下記

実装詳細



# 2. 複数Documentを使うagent

from langchain.document_loaders import TextLoader
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.document_loaders import ArxivLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.document_loaders import UnstructuredURLLoader
from langchain.agents.agent_toolkits import create_retriever_tool
embeddings = OpenAIEmbeddings()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

# DeepMindの研究者らが有効性を検証した、LLMに自ら高品質な訓練データを生成させる「自己学習」
docs1 = ArxivLoader(query="2312.06585", load_max_docs=2).load()
print(len(docs1))
display(docs1[0].metadata)
texts1 = text_splitter.split_documents(docs1)
db1 = FAISS.from_documents(texts1, embeddings)
retriever1 = db1.as_retriever()

urls=['https://ja.wikipedia.org/wiki/%E3%83%AA%E3%83%B3%E3%82%AF%E3%82%A2%E3%83%B3%E3%83%89%E3%83%A2%E3%83%81%E3%83%99%E3%83%BC%E3%82%B7%E3%83%A7%E3%83%B3']
loader = UnstructuredURLLoader(urls=urls)
docs2 = loader.load()
print(len(docs2))
display(docs2)
texts2 = text_splitter.split_documents(docs2)
db2 = FAISS.from_documents(texts2, embeddings)
retriever2 = db2.as_retriever()

tool_paper = create_retriever_tool(
    retriever = retriever1,
    name = 'paper_about_llm_self_training',
    description = '''
「Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models」という題名の論文の内容に関して有用。
特にLLMの自己学習(self training)に付いて知りたい場合に使う。''',
)

tool_lm = create_retriever_tool(
    retriever = retriever2,
    name = 'paper_about_link_and_motivation',
    description = '''リンクアンドモチベーションという会社に関する情報について調べる際に有用''',
)

tools = [tool_paper, tool_lm]

from langchain.agents.agent_toolkits import create_conversational_retrieval_agent
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0, model='gpt-4-1106-preview')
agent_executor = create_conversational_retrieval_agent(llm, tools, verbose=True)

実行


result = agent_executor({"input": "リンクアンドモチベーションってどういう会社?"})
result['output']

result = agent_executor({"input": "LLMの自己学習について日本語で教えて"})
result['output']

実行結果①



リンクアンドモチベーションは、東京都中央区に本社を置く経営コンサルティング会社です。2000年3月27日に設立され、東証プライムに上場しています（証券コード2170）。経営に関するコンサルティング業を主な事業内容としており、代表者は小笹芳央（代表取締役会長）と坂下英樹（代表取締役社長）です。

この会社は、経営学、社会システム論、行動経済学、心理学を統合して考案された「モチベーションエンジニアリング」という独自の基幹技術を開発し、それを中心に様々なサービスを提供しています。事業は大きく分けて、組織を対象にした「組織開発Division」、個人を対象にした「個人開発Division」、組織と個人をマッチングする「マッチングDivision」、そして「ベンチャーインキュベーション事業」で構成されています。

「組織開発Division」では、コンサル・クラウド事業やイベント・メディア事業を通じて、社員のモチベーションを成長エンジンとする企業を支援しています。一方、「個人開発Division」では、キャリアスクール事業や学習塾事業を展開し、自立的・主体的にキャリアを切り拓く人材の育成を行っています。

また、2016年には国内初の組織改善クラウドサービス「モチベーションクラウド」をリリースし、人工知能（AI）を活用した組織データの有効活用を行っています。

社名の「リンクアンドモチベーション」には、事業の中心テーマである「モチベーション」と、個々人のモチベーションを組織の全体戦略に「リンク」させるという意味が込められています。会社のロゴは、「L」と「M」を表し、「One for All, All for One」の思想を表現しています。

詳細な情報は、リンクアンドモチベーションの公式ウェブサイト（[https://www.lmi.ne.jp/](https://www.lmi.ne.jp/)）で確認できます。

実行結果②


LLM（Large Language Model、大規模言語モデル）の自己学習について説明します。

自己学習は、人間が生成したデータに依存せずに、LLMの性能を向上させるための手法です。この手法では、モデル自身が生成したデータを使用して、モデルをさらに訓練します。これにより、モデルはより多様なシナリオや問題に対応できるようになります。

「Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models」という論文では、特に数学問題の正誤を判断するようなスカラーのフィードバックが利用可能なタスクにおいて、LLMの自己学習をどのように行うかについて提案しています。この手法はReST^EM（Reinforced Self-Training with Expectation-Maximization）と呼ばれており、以下の3つのステップで構成されています。

1. モデルからサンプルを生成し、バイナリフィードバックを使用してフィルタリングします。
2. これらのフィルタリングされたサンプルでモデルをファインチューニングします。
3. このプロセスを数回繰り返します。

この方法をPaLM-2モデルを使用して、高度な数学推論やプログラミングのベンチマークでテストした結果、ReST^EMはモデルサイズに応じて有利にスケールし、人間のデータのみにファインチューニングするよりも大幅に性能が向上することが示されました。

この論文の主な発見は、フィードバックを用いた自己学習が、特にスカラーフィードバックから利益を得ることができる複雑な問題解決タスクにおいて、言語モデルの能力を強化するための有望なアプローチであることを示しています。ReST^EM手法により、モデル自体から合成データを生成し、そのデータを使用してモデルをファインチューニングすることができます。これにより、人間が生成したデータの量や多様性に制限されることなく、性能の向上が期待できます。

特徴

簡単なアプリケーションを作る分には結構使いやすい(個人的感想)
- 頑張ってコード読めば結構分かりやすい
とにかく進化(OpenAIの新機能リリースに対する追随等)が速い
注意点としては上記の裏返しで進化が速すぎるので、プロダクションレベルで何も考えずに使ってそのままにしていると、いつの間にか使えなくなってるなど技術負債になり得る
- こちらのABEJAさんのテックブログでもそのあたりに触れられていたと思います
実験的な実装群をまとめたlangchain experimental, 実用的なプロンプト、チェーン、エージェントなどを多くまとめているLangchainHubなど周辺領域のサポートがありLlamaindex同様に拡張を色々試しやすい

Autogen

AutoGenは、Microsoftが提供していてタスクを解決するために互いに会話できる複数のエージェントを使用して、LLMアプリケーションの開発を可能にするフレームワークです。
AutoGenエージェントは、カスタマイズ可能で、会話可能で、シームレスに人間の参加を可能にします。LLM、人間の入力、ツールを組み合わせた様々なモードで動作することができます。
下記の画像のように、複数のAgentを複雑に組み合わせたAgentを組みやすいことが特徴です

※autogenのgithubより転記

RAG

複数toolを持つAgent

人間と、プロダクトマネージャー役のAgentと、コードを書くエンジニア役のAgentを組み合わせて、必要に応じてコードを書かせるAgentの実装は下記の通り

実装詳細



import autogen

config_list = [
    {
        'model': 'gpt-4',
        'api_key': f"{os.environ['OPENAI_API_KEY']}",
    },
]

llm_config = {"config_list": config_list, "cache_seed": 42}
user_proxy = autogen.UserProxyAgent(
   name="User_proxy",
   system_message="A human admin.",
   code_execution_config={"last_n_messages": 2, "work_dir": "groupchat"},
   human_input_mode="TERMINATE"
)
coder = autogen.AssistantAgent(
    name="Coder",
    llm_config=llm_config,
)
pm = autogen.AssistantAgent(
    name="Product_manager",
    system_message="Creative in software product ideas.",
    llm_config=llm_config,
)
groupchat = autogen.GroupChat(agents=[user_proxy, coder, pm], messages=[], max_round=12)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)

実行(arXivでGPT-4に関する最新の論文を探し、ソフトウェアでのその潜在的な応用を調べる)


user_proxy.initiate_chat(manager, message="Find a latest paper about gpt-4 on arxiv and find its potential applications in software.")

実行結果詳細


User_proxy (to chat_manager):

Find a latest paper about gpt-4 on arxiv and find its potential applications in software.

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
User_proxy (to chat_manager):



--------------------------------------------------------------------------------
Coder (to chat_manager):

To find a latest paper about GPT-4 on Arxiv, we can use Arxiv's API to search for articles. The details of a paper-like authors, subjects, summary, etc can be found using their API. This code uses Python programming language and you need to install 'feedparser' package to run this code. You can install the 'feedparser' package by running 'pip install feedparser' command in your shell.

Here's the Python code to download and print the title, summary, and url of the most recent articles on gpt-4 from Arxiv.
```python
# Python
import feedparser

def search_papers(search_query = "gpt-4", max_results=10):
    base_url = 'http://export.arxiv.org/api/query'
    query = '?search_query=all:{}&start=0&max_results={}'.format(search_query, max_results)
    url = base_url + query
    response = feedparser.parse(url)
    for entry in response.entries:
        print('Title: ', entry.title)
        print('Link: ', entry.link)
        print('Summary: ', entry.summary)
        print('-----------------------------------------')

search_papers()

Regarding the potential applications of GPT-4 in software, the exact applications can be obtained by reading through the full paper and identifying relevant points. Considering the functionality improvement pattern from previous versions (GPT-1, GPT-2, and GPT-3), it can be inferred that GPT-4 will be more powerful in terms of understanding context, generating human-like text, and extrapolating from given information. These capabilities will likely facilitate numerous applications in software development such as:

Code Autocompletion: GPT-4 could be used to suggest code completions to developers as they type, potentially improving productivity.
Code Bug Detection: By understanding the context better, GPT-4 can potentially help detect bugs in the code that other tools may miss.
Software Documentation: GPT-4 could be used to generate documentation for software based on the code and comments.
Natural Language Interface: GPT-4 can be used to develop a more sophisticated natural language interface for software applications.
Improved Chatbots: GPT-4 can make chatbots more conversational and understand the user's intent better.

This is speculative, as without more details about GPT-4's capabilities from the recent paper, it's hard to provide a more precise answer.

Please run the above python code to fetch the latest papers. After reading the latest papers, we can provide more precise potential applications in the software field.

USING AUTO REPLY...

EXECUTING CODE BLOCK 0 (inferred language is python)...
WARNING:autogen.code_utils:execute_code was called without specifying a value for use_docker. Since the python docker package is not available, code will be run natively. Note: this fallback behavior is subject to change
User_proxy (to chat_manager):

exitcode: 1 (execution failed)
Code output:
Traceback (most recent call last):
File "", line 2, in
import feedparser
ModuleNotFoundError: No module named 'feedparser'

Coder (to chat_manager):

It seems the 'feedparser' module isn't installed in your Python environment. You can install it by running the command pip install feedparser.
Here is the command:

pip install feedparser

After running this command, please execute the initial Python code block again to find the most recent articles about GPT-4 from Arxiv.

USING AUTO REPLY...

EXECUTING CODE BLOCK 0 (inferred language is sh)...
WARNING:autogen.code_utils:execute_code was called without specifying a value for use_docker. Since the python docker package is not available, code will be run natively. Note: this fallback behavior is subject to change
User_proxy (to chat_manager):

exitcode: 0 (execution succeeded)
Code output:
Collecting feedparser
Downloading feedparser-6.0.11-py3-none-any.whl (81 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 81.3/81.3 kB 1.9 MB/s eta 0:00:00
Collecting sgmllib3k (from feedparser)
Downloading sgmllib3k-1.0.0.tar.gz (5.8 kB)
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: sgmllib3k
Building wheel for sgmllib3k (setup.py): started
Building wheel for sgmllib3k (setup.py): finished with status 'done'
Created wheel for sgmllib3k: filename=sgmllib3k-1.0.0-py3-none-any.whl size=6048 sha256=e93bac0207547be94677310dfd7bfe5432c55f104966d0282b8842dba2c81b16
Stored in directory: /root/.cache/pip/wheels/f0/69/93/a47e9d621be168e9e33c7ce60524393c0b92ae83cf6c6e89c5
Successfully built sgmllib3k
Installing collected packages: sgmllib3k, feedparser
Successfully installed feedparser-6.0.11 sgmllib3k-1.0.0

USING AUTO REPLY...

EXECUTING CODE BLOCK 0 (inferred language is sh)...
WARNING:autogen.code_utils:execute_code was called without specifying a value for use_docker. Since the python docker package is not available, code will be run natively. Note: this fallback behavior is subject to change
User_proxy (to chat_manager):

exitcode: 0 (execution succeeded)
Code output:
Requirement already satisfied: feedparser in /usr/local/lib/python3.10/dist-packages (6.0.11)
Requirement already satisfied: sgmllib3k in /usr/local/lib/python3.10/dist-packages (from feedparser) (1.0.0)

USING AUTO REPLY...
User_proxy (to chat_manager):

Coder (to chat_manager):

Great! Now that the 'feedparser' module is installed, let's once again run the initial Python code block to find the most recent articles about GPT-4 from Arxiv. Here is the Python code:

# Python
import feedparser

def search_papers(search_query = "gpt-4", max_results=10):
    base_url = 'http://export.arxiv.org/api/query'
    query = '?search_query=all:{}&start=0&max_results={}'.format(search_query, max_results)
    url = base_url + query
    response = feedparser.parse(url)
    for entry in response.entries:
        print('Title: ', entry.title)
        print('Link: ', entry.link)
        print('Summary: ', entry.summary)
        print('-----------------------------------------')

search_papers()

This script will print out the title, link, and a summary of the 10 most recent articles about "gpt-4" on arxiv.

USING AUTO REPLY...

EXECUTING CODE BLOCK 0 (inferred language is python)...
WARNING:autogen.code_utils:execute_code was called without specifying a value for use_docker. Since the python docker package is not available, code will be run natively. Note: this fallback behavior is subject to change
User_proxy (to chat_manager):

exitcode: 0 (execution succeeded)
Code output:
Title: Can LLMs like GPT-4 outperform traditional AI tools in dementia
diagnosis? Maybe, but not today
Link: http://arxiv.org/abs/2306.01499v1
Summary: Recent investigations show that large language models (LLMs), specifically
GPT-4, not only have remarkable capabilities in common Natural Language
Processing (NLP) tasks but also exhibit human-level performance on various
professional and academic benchmarks. However, whether GPT-4 can be directly
used in practical applications and replace traditional artificial intelligence
(AI) tools in specialized domains requires further experimental validation. In
this paper, we explore the potential of LLMs such as GPT-4 to outperform
traditional AI tools in dementia diagnosis. Comprehensive comparisons between
GPT-4 and traditional AI tools are conducted to examine their diagnostic
accuracy in a clinical setting. Experimental results on two real clinical
datasets show that, although LLMs like GPT-4 demonstrate potential for future
advancements in dementia diagnosis, they currently do not surpass the
performance of traditional AI tools. The interpretability and faithfulness of
GPT-4 are also evaluated by comparison with real doctors. We discuss the
limitations of GPT-4 in its current state and propose future research
directions to enhance GPT-4 in dementia diagnosis.

Title: GPT-4 Can't Reason
Link: http://arxiv.org/abs/2308.03762v2
Summary: GPT-4 was released in March 2023 to wide acclaim, marking a very substantial
improvement across the board over GPT-3.5 (OpenAI's previously best model,
which had powered the initial release of ChatGPT). However, despite the
genuinely impressive improvement, there are good reasons to be highly skeptical
of GPT-4's ability to reason. This position paper discusses the nature of
reasoning; criticizes the current formulation of reasoning problems in the NLP
community, as well as the way in which LLM reasoning performance is currently
evaluated; introduces a small collection of 21 diverse reasoning problems; and
performs a detailed qualitative evaluation of GPT-4's performance on those
problems. Based on this analysis, the paper concludes that, despite its
occasional flashes of analytical brilliance, GPT-4 at present is utterly
incapable of reasoning.

Title: Question-Answering Approach to Evaluate Legal Summaries
Link: http://arxiv.org/abs/2309.15016v1
Summary: Traditional evaluation metrics like ROUGE compare lexical overlap between the
reference and generated summaries without taking argumentative structure into
account, which is important for legal summaries. In this paper, we propose a
novel legal summarization evaluation framework that utilizes GPT-4 to generate
a set of question-answer pairs that cover main points and information in the
reference summary. GPT-4 is then used to generate answers based on the
generated summary for the questions from the reference summary. Finally, GPT-4
grades the answers from the reference summary and the generated summary. We
examined the correlation between GPT-4 grading with human grading. The results
suggest that this question-answering approach with GPT-4 can be a useful tool
for gauging the quality of the summary.

Title: Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4
Link: http://arxiv.org/abs/2304.03439v3
Summary: Harnessing logical reasoning ability is a comprehensive natural language
understanding endeavor. With the release of Generative Pretrained Transformer 4
(GPT-4), highlighted as "advanced" at reasoning tasks, we are eager to learn
the GPT-4 performance on various logical reasoning tasks. This report analyses
multiple logical reasoning datasets, with popular benchmarks like LogiQA and
ReClor, and newly-released datasets like AR-LSAT. We test the multi-choice
reading comprehension and natural language inference tasks with benchmarks
requiring logical reasoning. We further construct a logical reasoning
out-of-distribution dataset to investigate the robustness of ChatGPT and GPT-4.
We also make a performance comparison between ChatGPT and GPT-4. Experiment
results show that ChatGPT performs significantly better than the RoBERTa
fine-tuning method on most logical reasoning benchmarks. With early access to
the GPT-4 API we are able to conduct intense experiments on the GPT-4 model.
The results show GPT-4 yields even higher performance on most logical reasoning
datasets. Among benchmarks, ChatGPT and GPT-4 do relatively well on well-known
datasets like LogiQA and ReClor. However, the performance drops significantly
when handling newly released and out-of-distribution datasets. Logical
reasoning remains challenging for ChatGPT and GPT-4, especially on
out-of-distribution and natural language inference datasets. We release the
prompt-style logical reasoning datasets as a benchmark suite and name it
LogiEval.

Title: How is ChatGPT's behavior changing over time?
Link: http://arxiv.org/abs/2307.09009v3
Summary: GPT-3.5 and GPT-4 are the two most widely used large language model (LLM)
services. However, when and how these models are updated over time is opaque.
Here, we evaluate the March 2023 and June 2023 versions of GPT-3.5 and GPT-4 on
several diverse tasks: 1) math problems, 2) sensitive/dangerous questions, 3)
opinion surveys, 4) multi-hop knowledge-intensive questions, 5) generating
code, 6) US Medical License tests, and 7) visual reasoning. We find that the
performance and behavior of both GPT-3.5 and GPT-4 can vary greatly over time.
For example, GPT-4 (March 2023) was reasonable at identifying prime vs.
composite numbers (84% accuracy) but GPT-4 (June 2023) was poor on these same
questions (51% accuracy). This is partly explained by a drop in GPT-4's amenity
to follow chain-of-thought prompting. Interestingly, GPT-3.5 was much better in
June than in March in this task. GPT-4 became less willing to answer sensitive
questions and opinion survey questions in June than in March. GPT-4 performed
better at multi-hop questions in June than in March, while GPT-3.5's
performance dropped on this task. Both GPT-4 and GPT-3.5 had more formatting
mistakes in code generation in June than in March. We provide evidence that
GPT-4's ability to follow user instructions has decreased over time, which is
one common factor behind the many behavior drifts. Overall, our findings show
that the behavior of the "same" LLM service can change substantially in a
relatively short amount of time, highlighting the need for continuous
monitoring of LLMs.

Title: Gpt-4: A Review on Advancements and Opportunities in Natural Language
Processing
Link: http://arxiv.org/abs/2305.03195v1
Summary: Generative Pre-trained Transformer 4 (GPT-4) is the fourth-generation
language model in the GPT series, developed by OpenAI, which promises
significant advancements in the field of natural language processing (NLP). In
this research article, we have discussed the features of GPT-4, its potential
applications, and the challenges that it might face. We have also compared
GPT-4 with its predecessor, GPT-3. GPT-4 has a larger model size (more than one
trillion), better multilingual capabilities, improved contextual understanding,
and reasoning capabilities than GPT-3. Some of the potential applications of
GPT-4 include chatbots, personal assistants, language translation, text
summarization, and question-answering. However, GPT-4 poses several challenges
and limitations such as computational requirements, data requirements, and
ethical concerns.

Title: Is GPT-4 a Good Data Analyst?
Link: http://arxiv.org/abs/2305.15038v2
Summary: As large language models (LLMs) have demonstrated their powerful capabilities
in plenty of domains and tasks, including context understanding, code
generation, language generation, data storytelling, etc., many data analysts
may raise concerns if their jobs will be replaced by artificial intelligence
(AI). This controversial topic has drawn great attention in public. However, we
are still at a stage of divergent opinions without any definitive conclusion.
Motivated by this, we raise the research question of "is GPT-4 a good data
analyst?" in this work and aim to answer it by conducting head-to-head
comparative studies. In detail, we regard GPT-4 as a data analyst to perform
end-to-end data analysis with databases from a wide range of domains. We
propose a framework to tackle the problems by carefully designing the prompts
for GPT-4 to conduct experiments. We also design several task-specific
evaluation metrics to systematically compare the performance between several
professional human data analysts and GPT-4. Experimental results show that
GPT-4 can achieve comparable performance to humans. We also provide in-depth
discussions about our results to shed light on further studies before reaching
the conclusion that GPT-4 can replace data analysts.

Title: Graph Neural Architecture Search with GPT-4
Link: http://arxiv.org/abs/2310.01436v1
Summary: Graph Neural Architecture Search (GNAS) has shown promising results in
automatically designing graph neural networks. However, GNAS still requires
intensive human labor with rich domain knowledge to design the search space and
search strategy. In this paper, we integrate GPT-4 into GNAS and propose a new
GPT-4 based Graph Neural Architecture Search method (GPT4GNAS for short). The
basic idea of our method is to design a new class of prompts for GPT-4 to guide
GPT-4 toward the generative task of graph neural architectures. The prompts
consist of descriptions of the search space, search strategy, and search
feedback of GNAS. By iteratively running GPT-4 with the prompts, GPT4GNAS
generates more accurate graph neural networks with fast convergence.
Experimental results show that embedding GPT-4 into GNAS outperforms the
state-of-the-art GNAS methods.

Title: Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with
Code-based Self-Verification
Link: http://arxiv.org/abs/2308.07921v1
Summary: Recent progress in large language models (LLMs) like GPT-4 and PaLM-2 has
brought significant advancements in addressing math reasoning problems. In
particular, OpenAI's latest version of GPT-4, known as GPT-4 Code Interpreter,
shows remarkable performance on challenging math datasets. In this paper, we
explore the effect of code on enhancing LLMs' reasoning capability by
introducing different constraints on the \textit{Code Usage Frequency} of GPT-4
Code Interpreter. We found that its success can be largely attributed to its
powerful skills in generating and executing code, evaluating the output of code
execution, and rectifying its solution when receiving unreasonable outputs.
Based on this insight, we propose a novel and effective prompting method,
explicit \uline{c}ode-based \uline{s}elf-\uline{v}erification~(CSV), to further
boost the mathematical reasoning potential of GPT-4 Code Interpreter. This
method employs a zero-shot prompt on GPT-4 Code Interpreter to encourage it to
use code to self-verify its answers. In instances where the verification state
registers as ``False'', the model shall automatically amend its solution,
analogous to our approach of rectifying errors during a mathematics
examination. Furthermore, we recognize that the states of the verification
result indicate the confidence of a solution, which can improve the
effectiveness of majority voting. With GPT-4 Code Interpreter and CSV, we
achieve an impressive zero-shot accuracy on MATH dataset \textbf{(53.9% $\to$
84.3%)}.

Title: OpenAI Cribbed Our Tax Example, But Can GPT-4 Really Do Tax?
Link: http://arxiv.org/abs/2309.09992v1
Summary: The authors explain where OpenAI got the tax law example in its livestream
demonstration of GPT-4, why GPT-4 got the wrong answer, and how it fails to
reliably calculate taxes.

Product_manager (to chat_manager):

Based on the titles and summaries of the recent arxiv papers, areas of applications for GPT-4 in software might include:

Medical Diagnostics Software: Paper titled "Can LLMs like GPT-4 outperform traditional AI tools in dementia diagnosis?" suggests the potential application of GPT-4 in building diagnostic software tools.
Automated Code Generation: Based on the paper titled "How is ChatGPT's behavior changing over time?", GPT-4 might be used for generating code, though with varying performance over time.
Automating Data Analysis: The research paper titled "Is GPT-4 a Good Data Analyst?" suggests GPT-4's potential to automate tasks of data analysts.
Natural Language Interface for Legal Applications: The paper titled "Question-Answering Approach to Evaluate Legal Summaries" uses GPT-4 for generating and grading question-answer pairs for evaluating legal summaries, suggesting possible applications in legal software products requiring natural language interface or summarization capabilities.
Automatic design of Graph Neural Networks (GNN): The paper "Graph Neural Architecture Search with GPT-4" explores the use of GPT-4 in designing neural network architectures, hence can be used in software which depends upon GNN for its operations.

These are only a few potential applications and the actual applications of GPT-4 could be diverse and extensive given the capabilities of the model. To have a more precise list of potential applications in software, a thorough reading of each paper would be required.

USING AUTO REPLY...
User_proxy (to chat_manager):



Based on the titles and summaries of the recent arxiv papers, areas of applications for GPT-4 in software might include:

1. **Medical Diagnostics Software**: Paper titled "Can LLMs like GPT-4 outperform traditional AI tools in dementia diagnosis?" suggests the potential application of GPT-4 in building diagnostic software tools.

2. **Automated Code Generation**: Based on the paper titled "How is ChatGPT's behavior changing over time?", GPT-4 might be used for generating code, though with varying performance over time.

3. **Automating Data Analysis**: The research paper titled "Is GPT-4 a Good Data Analyst?" suggests GPT-4's potential to automate tasks of data analysts.

4. **Natural Language Interface for Legal Applications**: The paper titled "Question-Answering Approach to Evaluate Legal Summaries" uses GPT-4 for generating and grading question-answer pairs for evaluating legal summaries, suggesting possible applications in legal software products requiring natural language interface or summarization capabilities.

5. **Automatic design of Graph Neural Networks (GNN)**: The paper "Graph Neural Architecture Search with GPT-4" explores the use of GPT-4 in designing neural network architectures, hence can be used in software which depends upon GNN for its operations.

These are only a few potential applications and the actual applications of GPT-4 could be diverse and extensive given the capabilities of the model. To have a more precise list of potential applications in software, a thorough reading of each paper would be required.

このGroupChatが実行されたときの処理としては下記の様な流れになります。
1. 人間がuser_proxyに最初の命令を入力
2. user_proxyがGroupChatManagerに人間からの入力をリクエスト
3. GroupChatManagerが話者以外の各Agentにメッセージを配信
4. GroupChatManagerが、各Agentの中で誰に話させるか選択
5. 選択されたAgentが会話を生成、GroupChatManagerがその出力を取得
6. 再度各Agentに配信
7. 上記3〜6を繰り返す

特徴

Human in the loopを入れやすい
- =Agentにある程度自動で動いて欲しいものの、意図せぬ方向に進まないように一定人間が指示を途中で出来るようにしておける
複雑なAgentの組み合わせが実装しやすい
自力で必要なToolを自分で作るなど、複雑なことやりやすい
注意点としてはまだあまり活用例が多くない(個人の感想です)

まとめ

ライブラリの選定は場面に応じて使う必要がありますが、基本的な実装は個人的にはlangchinが最も使いやすいなという印象です。
ただ今後、複雑にAgentの組み合わせた処理を色々やることを考えるとAutogenをうまく使えるようになっていきたいなと思いました。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up