Trying a local LLM, inspired by an article on LangChain's Spark DataFrame agent

Posted at 2023-11-30

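I am truly indebted, as always.

Introduction

I read the article linked here.

It made me wonder whether langchain agents can run on a local LLM, so I tried it, and this post is the result.
Please read the article above first, then continue below.

Step1. Walking through the article above partway

First, install the required packages.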

%pip install -U -qq "transformers==4.35.2" "accelerate==0.24.1" "exllamav2==0.0.9" langchain databricks-sql-connector langchain_experimental

dbutils.library.restartPython()

Skip the OpenAI API configuration and create the Spark DataFrame.

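# Kept from the original article; not used below, since the LLM is loaded locally in Step2.
from langchain.llms import OpenAI
# Old import path, left for reference; the function is imported from langchain_experimental instead.
#from langchain.agents import create_spark_dataframe_agent
from langchain_experimental.agents.agent_toolkits import create_spark_dataframe_agent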

df = spark.read.csv("/databricks-datasets/COVID/coronavirusdataset/Region.csv", header=True, inferSchema=True)
display(df)

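Up to this point, everything follows the original article.

Step2. Loading the local LLM

I load the LLM with a class for using ExLlamaV2 from langchain, which I wrote in this article.

This time I use the following model, which is Intel/neural-chat-7b-v3-1 converted to GPTQ.
Intel/neural-chat-7b-v3-1 is a 7B-scale LLM developed by Intel. It is licensed under Apache 2.0.

The model before conversion is here.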

from exllamav2_chat import ChatExllamaV2Model

model_path = "/Volumes/training/llm/model_snapshots/models--TheBloke--neural-chat-7B-v3-1-GPTQ"
template = """### System:\nYou are a helpful assistant.If you are asked about tool, you must just answer the tool name only.\n### User:\n{}\n### Assistant:\n"""

llm = ChatExllamaV2Model.from_model_dir(
    model_path,
    human_message_template=template,
    temperature=0.1,
    max_new_tokens=1024,
)

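I tweaked the prompt template slightly so that, when asked about a tool, the model returns only the tool name.
Without this, the agent did not work, presumably because the text the model writes after `Action:` has to match a tool name exactly.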
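To see what the model actually receives, the template can be filled in by hand. The sketch below assumes that `ChatExllamaV2Model` substitutes the user message into the `{}` placeholder with `str.format`; that is a guess about the wrapper class, whose code is not shown in this post, and `build_prompt` is a hypothetical helper.

```python
# Same template string as in Step2.
template = """### System:\nYou are a helpful assistant.If you are asked about tool, you must just answer the tool name only.\n### User:\n{}\n### Assistant:\n"""


def build_prompt(user_message: str) -> str:
    """Hypothetical helper: fill the single `{}` slot with the user message."""
    return template.format(user_message)


prompt = build_prompt("How many rows are there?")
print(prompt)
```

The system instruction sits in front of every message, so it also applies to the long agent prompt that langchain builds around the question.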

Step3. Creating the agent

Create an agent that uses the Spark DataFrame, as in the original article.
For llm, pass the LLM loaded above.

agent = create_spark_dataframe_agent(llm=llm, df=df, verbose=True, handle_parsing_errors=True)

Step4. Running it

So, let's try it.

agent.run("How many rows are there?")

> Entering new AgentExecutor chain...
 Action: python_repl_ast
Action Input: df.count()
Observation: 244
Thought: We need to find the total number of rows in the dataframe.
Action: python_repl_ast
Action Input: df.count()
Observation: 244
Thought: We already have the count from the previous step.
Final Answer: There are 244 rows in the dataframe.

> Finished chain.
'There are 244 rows in the dataframe.'
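The trace above follows a Thought / Action / Observation cycle: the model names a tool, the executor runs it, and the result is fed back until the model emits a `Final Answer:`. The following is a minimal sketch of that cycle, not LangChain's implementation. `FakeDF`, `scripted_replies`, and `run_agent` are hypothetical names, and the model is replaced by canned replies abridged from the trace.

```python
import re


class FakeDF:
    """Hypothetical stand-in for the Spark DataFrame; only count() is modeled."""

    def __init__(self, n_rows: int) -> None:
        self._n_rows = n_rows

    def count(self) -> int:
        return self._n_rows


df = FakeDF(244)

# One tool, named as in the trace. eval() is acceptable here only because
# the inputs are the canned strings below, never untrusted text.
tools = {"python_repl_ast": lambda code: str(eval(code, {"df": df}))}

# Canned model replies abridged from the trace, standing in for the real LLM.
scripted_replies = [
    " Action: python_repl_ast\nAction Input: df.count()",
    " We already have the count from the previous step.\n"
    "Final Answer: There are 244 rows in the dataframe.",
]


def run_agent(replies, tools):
    """Run tools until a reply contains 'Final Answer:'; return that answer."""
    observations = []
    for reply in replies:
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip(), observations
        action = re.search(r"Action:\s*(.+)", reply).group(1).strip()
        action_input = re.search(r"Action Input:\s*(.+)", reply).group(1).strip()
        # Look the tool up by the exact name the model wrote, then run it.
        observations.append(tools[action](action_input))
    raise RuntimeError("replies ran out before a final answer")


answer, observations = run_agent(scripted_replies, tools)
print(observations, answer)
```

The lookup `tools[action]` is where a chatty model breaks the loop: anything other than the bare tool name fails, which is consistent with the prompt tweak in Step2.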

It worked.

How about in Japanese? (The prompt below asks "How many rows are there?")

agent.run("何行ありますか？")

> Entering new AgentExecutor chain...
 Thought: Count the number of rows in the dataframe
Action: python_repl_ast
Action Input: df.count()
Observation: 244
Thought: 244 行があります。
Final Answer: 244 行

> Finished chain.
'244 行'

That works too.

Let's ask something else. (The prompt below asks "How many columns are there?")

agent.run("何列ありますか？")

> Entering new AgentExecutor chain...
 Thought: Consider the given dataframe and find out how many columns it has.
Action: python_repl_ast
Action Input: df.columns.size()
Observation: AttributeError: 'list' object has no attribute 'size'
Thought: It seems there might be an issue with the data type of the column count.
Action: python_repl_ast
Action Input: len(df.columns)
Observation: 12
Thought: Now we have found that the dataframe has 12 columns.
Final Answer: 12

> Finished chain.
'12'
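The first attempt fails because `df.columns` on a PySpark DataFrame is a plain Python list of column names, and lists have no `size` method; `len()` is the way to count them. The sketch below reproduces this with an ordinary list, using the twelve column names that appear in the query plan later in this post, so it runs without Spark.

```python
# Column names as listed in the Relation line of the query plan in this post.
columns = [
    "code", "province", "city", "latitude", "longitude",
    "elementary_school_count", "kindergarten_count", "university_count",
    "academy_ratio", "elderly_population_ratio", "elderly_alone_ratio",
    "nursing_home_count",
]

try:
    columns.size()  # what the agent tried first
except AttributeError as exc:
    first_error = str(exc)
    print(first_error)

n_columns = len(columns)  # what it fell back to
print(n_columns)
```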

Let's try something a bit harder. (The prompt below asks "How many distinct values of city are there in total?")

agent.run("cityは全部で何種類ありますか？")

> Entering new AgentExecutor chain...
 Thought: Count the number of unique cities in the dataframe
Action: python_repl_ast
Action Input: df['city'].distinct().count()
Observation: TypeError: 'Column' object is not callable
Thought: We need to access the distinct values from the 'city' column first.
Action: python_repl_ast
Action Input: df['city'].distinct().count()
Observation: TypeError: 'Column' object is not callable
Thought: We need to access the distinct values from the 'city' column first.
Action: python_repl_ast
Action Input: df['city'].distinct().count()
Observation: TypeError: 'Column' object is not callable
Thought: We need to access the distinct values from the 'city' column first.
Action: python_repl_ast
Action Input: df.select('DISTINCT city').count()
Observation: AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `DISTINCT city` cannot be resolved. Did you mean one of the following? [`city`, `code`, `province`, `academy_ratio`, `latitude`].;
'Project ['DISTINCT city]
+- Relation [code#622,province#623,city#624,latitude#625,longitude#626,elementary_school_count#627,kindergarten_count#628,university_count#629,academy_ratio#630,elderly_population_ratio#631,elderly_alone_ratio#632,nursing_home_count#633] csv

Thought: We need to use the correct syntax for selecting distinct values.
Action: python_repl_ast
Action Input: df.select("DISTINCT city").count()
Observation: AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `DISTINCT city` cannot be resolved. Did you mean one of the following? [`city`, `code`, `province`, `academy_ratio`, `latitude`].;
'Project ['DISTINCT city]
+- Relation [code#622,province#623,city#624,latitude#625,longitude#626,elementary_school_count#627,kindergarten_count#628,university_count#629,academy_ratio#630,elderly_population_ratio#631,elderly_alone_ratio#632,nursing_home_count#633] csv

Thought: We need to use the correct syntax for selecting distinct values and count them.
Action: python_repl_ast
Action Input: df.select("DISTINCT city").count()
Observation: AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `DISTINCT city` cannot be resolved. Did you mean one of the following? [`city`, `code`, `province`, `academy_ratio`, `latitude`].;
'Project ['DISTINCT city]
+- Relation [code#622,province#623,city#624,latitude#625,longitude#626,elementary_school_count#627,kindergarten_count#628,university_count#629,academy_ratio#630,elderly_population_ratio#631,elderly_alone_ratio#632,nursing_home_count#633] csv

Thought: We need to use the correct syntax for selecting distinct values and count them without specifying "DISTINCT".
Action: python_repl_ast
Action Input: df.select("city").distinct().count()
Observation: 222
Thought:

ValueError: An output parsing error occurred. In order to pass this error back to the agent and have it try again, pass `handle_parsing_errors=True` to the AgentExecutor. This is the error: Could not parse LLM output: ` Cityは全部で222種類あります。`

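The run ultimately ended in an error, but the agent did reach the answer (222).
Note that the error message asks for `handle_parsing_errors=True` on the AgentExecutor, which suggests that the flag passed to `create_spark_dataframe_agent` in Step3 did not reach the executor. Depending on the installed version, executor options may need to go through the `agent_executor_kwargs` argument instead.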
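The failure comes from the shape of the last reply: a ReAct-style agent expects either an `Action:` / `Action Input:` pair or a line containing `Final Answer:`, and the model's closing sentence has neither. The sketch below is a simplified stand-in for such a parser, not LangChain's actual one; `parse_reply` is a hypothetical name.

```python
import re


def parse_reply(text: str):
    """Return ('finish', answer) or ('action', tool, tool_input); else raise."""
    if "Final Answer:" in text:
        return ("finish", text.split("Final Answer:", 1)[1].strip())
    match = re.search(r"Action:\s*(.+?)\s*\n\s*Action Input:\s*(.+)", text)
    if match:
        return ("action", match.group(1), match.group(2).strip())
    raise ValueError(f"Could not parse LLM output: `{text}`")


# A reply in the expected format, taken from the Japanese row-count run.
ok = parse_reply(" 244 行があります。\nFinal Answer: 244 行")

# The reply that ended the last run: correct content, but no marker.
try:
    parse_reply(" Cityは全部で222種類あります。")
    failed = False
except ValueError as exc:
    failed = True
    print(exc)
```

The model answered in the language of the question but dropped the `Final Answer:` prefix, so a correct answer was still rejected.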

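Summary

I was surprised that a local LLM, rather than a proprietary one, has reached the level where this just about works. And at only 7B parameters.
A model with more parameters might run this at a practically useful level.
That said, I think using OpenAI, AWS Bedrock, or similar still takes less effort.

It is fun to see the options keep growing.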
