Understanding LangChain Expression Language (LCEL) through an analogy with scikit-learn

Last updated at 2024-02-22, posted at 2024-02-22

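Overview

This article introduces LangChain Expression Language (LCEL) and tries to make its basic concepts easier to grasp by drawing an analogy with scikit-learn.

The code in this article shows behavior in a specific environment and under specific conditions; it is not guaranteed to work the same way in every environment or case. Software updates and compatibility issues may also arise over time, so the code shown here may not be up to date. Use it at your own risk. Before running it, make sure you understand the code and adapt it to your environment as needed. Take backups wherever possible. If you have questions or suggestions for improving the content or code, please let me know in the comments or via social media.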

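Introduction

It has been about a year since LLMs took off. While feeling anxious about how fast the technology moves, I plan to write up what I have learned as the mood takes me.

This article is about LCEL.
Many readers may wonder what LCEL is. It stands for LangChain Expression Language, and it is one of the declarative ways of building things with langchain¹, a framework for developing applications powered by language models. With langchain you can easily build RAG pipelines and agents on top of well-known LLMs such as those from OpenAI or Gemini. In the words of the following quote: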

LCEL was designed from day 1 to support putting prototypes in production, with no code changes, from the simplest “prompt + LLM” chain to the most complex chains²

LCEL is thus provided so that everything from basic langchain components to complex chains is easy to implement.

To deepen our understanding of LCEL, this article looks at its basic components through an analogy with scikit-learn.

Target readers

People who have used langchain but do not really understand LCEL
People who have not used langchain yet and want to start

Execution environment

The Python examples in this article were run in the following environment:

Windows 11
Python 3.10.11
langchain==0.1.8
langchain-openai==0.0.6
python-dotenv==1.0.1
gpt-3.5-turbo (2024/02/21)

To run them, obtain an OpenAI API key and create a .env file with the following content:

OPENAI_API_KEY={YOUR_API_KEY}

Using the OpenAI API incurs costs. See https://openai.com/pricing for details.

How to use LCEL

The basic usage of LCEL is to connect Runnable objects with |.

For example, you can write:

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
load_dotenv()

prompt = ChatPromptTemplate.from_template("Explain {topic}")
llm = ChatOpenAI(model='gpt-3.5-turbo')
output_parser = StrOutputParser()

chain = prompt | llm | output_parser

To call this chain, use chain.invoke({"topic": "python"}):

Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from dotenv import load_dotenv
>>> from langchain_openai import ChatOpenAI
>>> from langchain_core.output_parsers import StrOutputParser
>>> from langchain_core.prompts import ChatPromptTemplate
>>> load_dotenv()
True
>>> prompt = ChatPromptTemplate.from_template("Explain {topic}")
>>> llm = ChatOpenAI(model='gpt-3.5-turbo')
>>> output_parser = StrOutputParser()
>>> chain = prompt | llm | output_parser
>>> chain.invoke({"topic": "python"})
"Python is a high-level, interpreted programming language that is known for its simplicity and readability. It was created by Guido van Rossum in the late 1980s and has since become one of the most popular programming languages in the world.\n\nPython is versatile and can be used for a wide range of applications, including web development, data analysis, artificial intelligence, and scientific computing. It has a large standard library that provides support for common tasks and functions, as well as a vibrant community of developers who contribute to the language's ecosystem.\n\nOne of the key features of Python is its emphasis on readability and simplicity, which makes it a great language for beginners to learn. Python uses indentation to define code blocks, rather than curly braces or keywords, which helps to reduce the amount of syntax required and makes the code easier to understand.\n\nOverall, Python is a powerful and flexible programming language that is widely used in various industries and has a strong community of developers who continue to support and enhance its capabilities."

This is essentially the same as writing:

prompt_ = prompt.format(topic='python')
response = llm.invoke(prompt_)
output_parser.invoke(response)

In other words, writing prompt | llm | output_parser links the required sequence of Runnable objects together like a chain (see ² for how much shorter the code can get).
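To get a feel for why | can link steps together, here is a minimal sketch of operator overloading in plain Python. The Step class and the fake_llm stand-in are hypothetical names made up for this illustration; this is a toy model and not how LangChain itself is implemented, and no API call is made.

```python
from typing import Any, Callable


class Step:
    """Hypothetical toy stand-in for a Runnable; not LangChain's class."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs a first, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Stand-ins for prompt, llm and output_parser (assumptions for illustration).
prompt = Step(lambda inputs: f"Explain {inputs['topic']}")
fake_llm = Step(lambda text: {"content": f"[answer to: {text}]"})
output_parser = Step(lambda message: message["content"])

chain = prompt | fake_llm | output_parser
result = chain.invoke({"topic": "python"})
print(result)  # [answer to: Explain python]
```

The point of the sketch is only that a | b returns another object with the same invoke method, which is why chains can be extended indefinitely.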

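The chain then exposes the following methods as its interface:

stream: stream back chunks of the response
invoke: call the chain on an input
batch: call the chain on a list of inputs

Asynchronous counterparts are also available:

astream: stream back chunks of the response asynchronously
ainvoke: call the chain on an input asynchronously
abatch: call the chain on a list of inputs asynchronously
astream_log: stream back intermediate steps as they happen, in addition to the final response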
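How these methods relate to one another can be modeled roughly as follows. ToyRunnable is a hypothetical class written only for this sketch; it shows one plausible way the sync and async variants can all be derived from invoke, and it should not be read as LangChain's actual code. In particular, real streaming yields model tokens, whereas this toy simply splits the result into words.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterator, List


class ToyRunnable:
    """Hypothetical model of the interface; not LangChain's implementation."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        # One input in, one output out.
        return self.func(value)

    def batch(self, values: List[Any]) -> List[Any]:
        # Many inputs in, outputs returned in the same order.
        with ThreadPoolExecutor() as pool:
            return list(pool.map(self.invoke, values))

    def stream(self, value: Any) -> Iterator[str]:
        # Yield the output piece by piece (here: word by word).
        for word in str(self.invoke(value)).split(" "):
            yield word

    async def ainvoke(self, value: Any) -> Any:
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.invoke, value)

    async def abatch(self, values: List[Any]) -> List[Any]:
        # Launch all calls concurrently and collect results in input order.
        return list(await asyncio.gather(*(self.ainvoke(v) for v in values)))


explain = ToyRunnable(lambda topic: f"{topic} is a topic")
single = explain.invoke("python")
many = explain.batch(["python", "rust"])
chunks = list(explain.stream("python"))
async_many = asyncio.run(explain.abatch(["python", "rust"]))
```

Note that asyncio.to_thread requires Python 3.9 or later, which matches the Python 3.10 environment used in this article.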

The article cited as ³ explains these in detail, so I will skip them here (it was a great reference for me, thank you!).

Beyond chaining steps in sequence, Runnable objects also come with classes for parallelization, conditional branching, and so on, which are essential for building complex chains:

setting

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableBranch, RunnableLambda, RunnablePassthrough, RunnableParallel
load_dotenv()
prompt1 = ChatPromptTemplate.from_template("Explain {topic}")
prompt2 = ChatPromptTemplate.from_template("What kind of {topic}")
prompt3 = ChatPromptTemplate.from_template("What does this sentence explain?\n{explanation}")
llm = ChatOpenAI(model='gpt-3.5-turbo')

RunnableParallel: runs chains in parallel

Without RunnableParallel

p1 = prompt1.format(topic="python")
p2 = prompt2.format(topic="python")
res1 = llm.invoke(p1)
res2 = llm.invoke(p2)
{"res1": res1, "res2": res2}

With RunnableParallel

chain = RunnableParallel(
    res1=prompt1 | llm,
    res2=prompt2 | llm,
)
chain.invoke({"topic": "python"})

RunnablePassthrough: passes the input to the next step unchanged

Without RunnablePassthrough

p1 = prompt1.format(topic="python")
llm.invoke(p1)

With RunnablePassthrough

chain = (
    RunnablePassthrough()
    | prompt1
    | llm
)
chain.invoke({"topic": "python"})

On its own this shows little benefit, but it comes into its own when combined with parallelization and the like.

RunnableLambda: applies a function to the input and passes the result to the next step

Without RunnableLambda

p1 = prompt1.format(topic="python")
res1 = llm.invoke(p1)
p3 = prompt3.format(explanation=res1.content)
llm.invoke(p3)

With RunnableLambda

chain = (
    prompt1
    | llm
    | {"explanation": RunnableLambda(lambda x: x.content)}
    | prompt3
    | llm
)
chain.invoke({"topic": "python"})

RunnableBranch: evaluates conditions on the input and runs the step that belongs to the first condition that holds

Without RunnableBranch

human_input = "python"
if " " not in human_input:
    p1 = prompt1.format(
        topic=human_input
    )
else:
    p1 = prompt3.format(
        explanation=human_input
    )

llm.invoke(p1)

With RunnableBranch

chain = RunnableBranch(
    (
        lambda x: " " not in x['topic'],
        prompt1,
    ),
    (
        lambda x: " " in x['topic'],
        {'explanation': lambda x: x['topic']}
        | prompt3
    ),
    prompt1  # Default
) | llm
chain.invoke({"topic": "python"})

(This was meant to be a simple illustration, but it ended up a little hard to read...)
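If the RunnableBranch example is hard to follow, its routing rule can be written out as an ordinary function: check the (condition, action) pairs in order, run the first action whose condition is true, and otherwise fall back to the default. The branch helper below is a hypothetical name for this sketch, and plain string formatting stands in for the prompts.

```python
from typing import Any, Callable, Tuple

Condition = Callable[[Any], bool]
Action = Callable[[Any], Any]


def branch(*pairs: Tuple[Condition, Action], default: Action) -> Action:
    """Hypothetical helper: return a function that runs the first matching action."""

    def run(value: Any) -> Any:
        for condition, action in pairs:
            if condition(value):
                return action(value)
        # No condition matched, so fall back to the default.
        return default(value)

    return run


route = branch(
    (lambda x: " " not in x["topic"], lambda x: f"Explain {x['topic']}"),
    (lambda x: " " in x["topic"], lambda x: f"What does this sentence explain?\n{x['topic']}"),
    default=lambda x: f"Explain {x['topic']}",
)

single_word = route({"topic": "python"})
sentence = route({"topic": "Python is a language."})
```

Because the two conditions here are complementary, as in the example above, the default is never actually reached; it is there because a fallback is always required.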

With these classes you can assemble fairly complex chains:

Setting

prompt1 = ChatPromptTemplate.from_template("What can you read from this statement?\n{human_input}")
prompt2 = ChatPromptTemplate.from_template("S1 has been extracted from S2. Are there any contradictions [Yes/No]?\nS1:{extracted}\nS2:{human_input}")
llm = ChatOpenAI(model='gpt-3.5-turbo')

Without LCEL

human_input = "Python is a programming language."
prompt1_ = prompt1.format(human_input=human_input)
extracted = llm.invoke(prompt1_)
prompt2_ = prompt2.format(human_input=human_input, extracted=extracted)
answer = llm.invoke(prompt2_)
answer.content.lower().strip()=='no'

With LCEL

chain = (
    {'human_input': RunnablePassthrough()}
    | RunnableParallel(
        human_input=lambda x: x['human_input'],
        extracted=prompt1 | llm
    )
    | prompt2
    | llm
    | RunnableLambda(lambda x: x.content.lower().strip()=='no')
)
chain.invoke("Python is a programming language.")

As you can see, the code becomes cleaner and easier to read.

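Analogy with scikit-learn

So far we have looked at the basic usage of LCEL (though I am not sure how clear it was...). In case it was hard to follow, this section turns to the topic in the title and looks at the similarity between LCEL's

|
RunnableParallel
RunnablePassthrough
RunnableLambda

and scikit-learn.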

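What is scikit-learn?

Just in case: scikit-learn⁴ is one of the most popular Python libraries for data analysis and building machine learning models.
With this library alone you can build a pipeline consisting of data preprocessing and a predictive model, and tune hyperparameters via cross-validation.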

What is similar?

So what exactly do LCEL and scikit-learn have in common?

The answer is that

RunnableLambda(f) | RunnableLambda(g) behaves as $x \mapsto g(f(x))$
RunnableParallel(hoge=f, fuga=g) behaves as $x \mapsto (f(x), g(x))$ (returned as a dict with the keys hoge and fuga)
RunnablePassthrough() behaves as $x \mapsto x$ (the identity map)
RunnableLambda(f) behaves as $x \mapsto f(x)$

(here $f$ and $g$ are ordinary functions such as lambda x: x.lower()).
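The four mappings above can be spelled out in plain Python without any library. The helper names pipe, parallel, and passthrough are hypothetical and exist only for this sketch; they mirror the behavior described in the list rather than reproduce LCEL.

```python
from typing import Any, Callable, Dict

Func = Callable[[Any], Any]


def pipe(f: Func, g: Func) -> Func:
    """Sequential composition: x -> g(f(x)), like RunnableLambda(f) | RunnableLambda(g)."""
    return lambda x: g(f(x))


def parallel(**funcs: Func) -> Callable[[Any], Dict[str, Any]]:
    """Fan-out: x -> {name: func(x)}, like RunnableParallel(hoge=f, fuga=g)."""
    return lambda x: {name: func(x) for name, func in funcs.items()}


def passthrough(x: Any) -> Any:
    """Identity: x -> x, like RunnablePassthrough()."""
    return x


f = str.lower
g = str.strip

composed = pipe(f, g)("  Python ")                # lower first, then strip
fanned = parallel(hoge=f, fuga=g)("  Python ")    # both applied to the same input
same = passthrough("  Python ")                   # returned untouched
```

Here composed is "python", fanned is {"hoge": "  python ", "fuga": "Python"}, and same is the original string.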

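scikit-learn's sklearn.pipeline⁵ and sklearn.preprocessing modules, in turn, provide

make_pipeline: a utility that connects processing steps in sequence
make_union: a utility that connects transformers in parallel
FunctionTransformer: a transformer that applies a function

which gives the following correspondence: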

LangChain Expression Language (LCEL)	scikit-learn
`\|`	`make_pipeline`
`RunnableParallel`	`make_union`
`RunnableLambda`	`FunctionTransformer`
`RunnablePassthrough`	special case of `FunctionTransformer`

That is the similarity this article set out to show.
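The scikit-learn side of the correspondence can be tried with a small array. This is a minimal sketch that assumes scikit-learn and NumPy are installed; the variable names and the toy functions (doubling and adding one) are made up for illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import FunctionTransformer

X = np.array([[1.0], [2.0], [3.0]])

double = FunctionTransformer(lambda a: a * 2)   # x -> f(x)
add_one = FunctionTransformer(lambda a: a + 1)  # x -> g(x)
identity = FunctionTransformer()                # func=None means x -> x

# Counterpart of `|`: apply double, then add_one.
sequential = make_pipeline(double, add_one).fit_transform(X)

# Counterpart of RunnableParallel: apply both to X and stack the columns.
side_by_side = make_union(double, add_one).fit_transform(X)

# Counterpart of RunnablePassthrough: the input comes back unchanged.
unchanged = identity.fit_transform(X)

print(sequential.tolist())    # [[3.0], [5.0], [7.0]]
print(side_by_side.tolist())  # [[2.0, 2.0], [4.0, 3.0], [6.0, 4.0]]
```

One difference worth keeping in mind: make_union concatenates the results column-wise into a single array, whereas RunnableParallel returns a dict keyed by name.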

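Closing

Thank you for reading this far despite my clumsy writing.

LCEL is a powerful tool for developing language-model applications more simply and efficiently.
The conclusion of this article is admittedly a modest one, but as the similarity with scikit-learn suggests, I think LCEL is a good notation not only within the realm of language models (langchain) but for expressing basic operations on functions in general.

Personally, I hope something like LCEL will be implemented in Python as a general-purpose feature.
I hope this article resonates with anyone who has stumbled over LCEL's unfamiliar syntax. That is all for now.

If you have any questions or corrections, I would be grateful for your comments.
If you know of clearer examples, please share them.

(If anyone knows LCEL's type hints well, please teach me. When I nest LCEL too deeply, it becomes painful to keep track of the schema partway through...)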
