LangChainを使いＡＩエージェント(Webサーチ)をELYZA-japanese-Llama-2-7bで構築

Posted at 2025-07-03

LangChainを使いＡＩエージェント(Webサーチ)をELYZA-japanese-Llama-2-7bで構築の仕方

ＡＩエージェントとは？

ワークフローを設計し、利用可能なツールを活用すること、またユーザーに代わってタスクを自動的に実行できるシステムの事。

例えばAIエージェントがウェブサーチを使い、その結果を生成AI（ChatGPT等）に与える事でユーザーが欲しい情報を届ける事が出来る。
そのAIエージェントのライブラリーの１つがLangChain。

ＡＩエージェント（LangChain）が使える生成AI
LangChainがサポートしている生成AIは

OpenAI
Ollama
Anthropic
Google AI
https://python.langchain.com/docs/integrations/llms/

もちろん、HuggingFaceもサポートしてます。
なので、今回はELYZA-japanese-Llama-2-7bを使用したＡＩエージェントの作成をしたいと思います。
（ELYZA-japanese-Llama-2-7bならファインチューンもしやすい為）

必要なライブラリー

langchain-community langchain-huggingface langchain duckduckgo-search

duckduckgo-searchはウェブサーチ用

インポートするライブラリーは下記の通り

from langchain_community.tools import DuckDuckGoSearchResults
from langchain_community.utilities import DuckDuckGoSearchAPIWrapper
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

DuckDuckGoSearchAPIWrapperはサーチ場所を日本にする為（ディフォルトはUSエリア）

wrapper = DuckDuckGoSearchAPIWrapper(region="jp-jp")

これによりサーチエリアが日本に変更。

HuggingFaceからモデルとトークナイザーを読み込み、パイプラインに入れる
量子化はあってもなくてもOK

現在、LangChainを使う場合パイプラインのタスクはtext-generationのみサポートしてます。

hf_auth = HF_TOKENS  # ここにHuggingFaceのトークンを入れる
model_id = 'elyza/ELYZA-japanese-Llama-2-7b-instruct'

# 量子化
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,  # 4bitのQuantizationの有効化
    bnb_4bit_quant_type="nf4",  # 4bitのQuantizationのタイプ (fp4 or nf4)
    bnb_4bit_compute_dtype=torch.float16,  # 4bitのQuantizationのdtype (float16 or bfloat16)
    bnb_4bit_use_double_quant=False,  # 4bitのDouble-Quantizationの有効化
)

model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    token=hf_auth
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    quantization_config=quantization_config,
    device_map='auto',
    token=hf_auth,
    weights_only=True,
)

# Tokenizerの準備
tokenizer = AutoTokenizer.from_pretrained(
    model_id,  # モデル名
    use_fast=False,  # Fastトークナイザーの有効化
    add_eos_token=True,  # データへのEOSの追加を指示
    trust_remote_code=True,
    token=hf_auth,
)

pipe = pipeline("text-generation",
                model=model,
                tokenizer=tokenizer
            )

パイプラインをLangChainに組み込み

hf_llm = HuggingFacePipeline(pipeline=pipe)

検索エンジン用のファンクションの作成

def search_web(query: str) -> str:
    wrapper = DuckDuckGoSearchAPIWrapper(region="jp-jp", time="d", max_results=2)
    engine = DuckDuckGoSearchResults(api_wrapper=wrapper, backend="news")
    return engine.invoke(f"{query}")

確認方法

print(search_web(”トヨタについて”))

プロンプトの作成
検索結果とはsearch_webの結果の事
"[INST]", "[/INST]"がなくても大丈夫です（むしろ、ない方が良いかも）

prompt_template = ChatPromptTemplate([
    ("system", "あなたはAIアシスタントです。必要に応じて、検索結果を使用します。"),
    ("user", "{question}")
])

AIエージェント（チェインの作成）

chain = ({"question": lambda x: x["question"], "検索結果": lambda x: search_web(x["question"])} # <----search_webを使用
         | prompt_template
         | hf_llm
         | StrOutputParser()
         )

LangChainでチェインの作り方は　|　で分割します。
また、オーダーが違うとエラーが出るのでオーダーに気をつけてください。
オーダーは　ウェブ検索　ー＞　プロンプト　ー＞　LLM　ー＞　結果

昔のやり方では（こちらではエラーがでます。）

from langchain.chains import LLMChain
from langchain_core.tools import Tool

tool = Tool(
        name="検索結果",  # Name of the tool
        func=search_web,  # Function that the tool will execute
        # Description of the tool
        description="Webサーチ",
    )
chain = =LLMChain(llm=hf_llm, prompt=prompt_template, tools=[tool])

質問の作成

question = "トヨタとは？"
print(chain.invoke({"question": question}))

>>> 
System: あなたはAIアシスタントです。必要に応じて、検索結果を使用します。
Human: トヨタとは？
トヨタ自動車株式会社の略称である。トヨタ自動車株式会社は、日本を代表する自動車メーカーであり、世界最大手の自動車製造企業の一つである。愛知県を中心に生産、販売を行っている。また、関連会社を含めたトヨタグループは世界最大の自動車販売台数を誇っている。

- トヨタ自動車株式会社
- トヨタ
- トヨタ自動車
- トヨタグループ
- トヨタの歴史
- トヨタの車種
- トヨタの役員
- 関連会社
- トヨタ自動車の工場
- トヨタ自動車の沿革

ちなみにウェブサーチをなくして、ELYZA-japanese-Llama-2-7に聞いたら変な回答が帰ってきました。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up