
Why You Can Chain Processing with | (the Pipe Character) in LangChain's LCEL

Posted at 2024-01-08

Introduction

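In LangChain, writing chains in the LangChain Expression Language (LCEL), as shown below, has become the recommended style. Like a Linux pipe, LCEL uses "|" (the pipe character). In ordinary Python, however, "|" is a bitwise operator that returns the bitwise OR of its operands. I could not see how LCEL gives "|" its own behavior, so I looked into it.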

chain = prompt | model | outputparser  # this "|" is what I investigated
chain.invoke("Question.")
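For comparison, this is what "|" does on a few built-in types before any user-defined overloading comes into play. It is plain Python with no LangChain involved; the dict form assumes Python 3.9 or later. Each built-in type already supplies its own meaning for "|", so the operator does not mean the same thing everywhere even in ordinary Python.

```python
# Integers: bitwise OR, computed bit by bit (0b10 | 0b11 == 0b11)
int_result = 2 | 3
# Booleans: logical OR (bool is a subclass of int)
bool_result = True | False
# Sets: union
set_result = {1, 2} | {2, 3}
# Dicts (Python 3.9+): merge; the right-hand value wins on duplicate keys
dict_result = {"a": 1} | {"a": 2, "b": 3}

print(int_result)   # 3
print(bool_result)  # True
print(set_result)   # {1, 2, 3}
print(dict_result)  # {'a': 2, 'b': 3}
```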

[Background] Operator overloading

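In Python, you can replace what an operator does with your own behavior by defining special methods such as __eq__ in a class.

The special methods that define "|" are __or__ and __ror__.

__or__ : called on the object on the left of "|". For A|B, A's __or__ is called.
__ror__: called on the object on the right of "|". For A|B, B's __ror__ is called, but only if A has no __or__ or A's __or__ returns NotImplemented.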

Examples

Example 1: Checking the behavior of __or__

Declare __or__ in both class A and class B. When "|" is evaluated through A, "A's __or__ method is called" is printed; when it is evaluated through B, "B's __or__ method is called" is printed.

class A:
    def __init__(self, value):
        self.value = value
    def __or__(self, other):
        print("A's __or__ method is called")
        return self.value | other.value

class B:
    def __init__(self, value):
        self.value = value

    def __or__(self, other):
        print("B's __or__ method is called")
        return self.value | other.value

objA = A(2)
objB = B(3)
result = objA | objB

Output

A's __or__ method is called

This shows that A's "|" (its __or__) was executed.

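Incidentally, swap the order of objA and objB:

result = objB | objA
print(result)

The output is then

B's __or__ method is called
3

which shows that B's "|" was executed (the 3 is the printed result of 3 | 2). In other words, the __or__ of the object placed before "|" is executed.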

Example 2: Checking the behavior of __ror__

Declare __ror__ only in class B.

class A:
    def __init__(self, value):
        self.value = value

class B:
    def __init__(self, value):
        self.value = value

    def __ror__(self, other):
        print("B's __ror__ method is called")
        return self.value | other.value

objA = A(2)
objB = B(3)
result = objA | objB

Output

B's __ror__ method is called

The __ror__ of objB, which is placed after "|", was executed because A does not define __or__.

Example 3: Checking the precedence of __or__ and __ror__

Declare __or__ in class A and __ror__ in class B.

class A:
    def __init__(self, value):
        self.value = value
        
    def __or__(self, other):
        print("A's __or__ method is called")
        return self.value | other.value

class B:
    def __init__(self, value):
        self.value = value

    def __ror__(self, other):
        print("B's __ror__ method is called")
        return self.value | other.value

objA = A(2)
objB = B(3)
result = objA | objB

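Output

A's __or__ method is called

If the class before "|" declares __or__, that method is executed, and the __ror__ of the class after "|" is ignored (as long as __or__ does not return NotImplemented).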

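The rule in Example 3 has a caveat that is easiest to see in code: the left operand's __or__ wins only if it actually handles the operation. If it returns the special constant NotImplemented, Python falls back to the right operand's __ror__. The classes below are variations on the examples above; the calls list is an illustration-only addition that records which method ran.

```python
calls = []  # records which special method handled each "|"


class A:
    def __init__(self, value):
        self.value = value

    def __or__(self, other):
        # Only handle A | A; for anything else, tell Python to try other.__ror__
        if not isinstance(other, A):
            return NotImplemented
        calls.append("A.__or__")
        return self.value | other.value


class B:
    def __init__(self, value):
        self.value = value

    def __ror__(self, other):
        # Called as B.__ror__(objB, objA) when evaluating objA | objB
        calls.append("B.__ror__")
        return other.value | self.value


objA = A(2)
objB = B(3)
print(objA | objB)  # A.__or__ returns NotImplemented, so B.__ror__ runs: 3
print(objA | A(4))  # A.__or__ handles it: 2 | 4 = 6
print(calls)        # ['B.__ror__', 'A.__or__']
```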
[Main topic] Looking at operator overloading in the LangChain source code

The prompt, model, output parser, and other components used in LCEL all have the Runnable class as their base class, so I looked at Runnable's __or__ and __ror__ methods.

class Runnable(Generic[Input, Output], ABC):  # excerpt
    def __or__(
        self,
        other: Union[
            Runnable[Any, Other],
            Callable[[Any], Other],
            Callable[[Iterator[Any]], Iterator[Other]],
            Mapping[str, Union[Runnable[Any, Other], Callable[[Any], Other], Any]],
        ],
    ) -> RunnableSerializable[Input, Other]:
        """Compose this runnable with another object to create a RunnableSequence."""
        return RunnableSequence(self, coerce_to_runnable(other))

    def __ror__(
        self,
        other: Union[
            Runnable[Other, Any],
            Callable[[Other], Any],
            Callable[[Iterator[Other]], Iterator[Any]],
            Mapping[str, Union[Runnable[Other, Any], Callable[[Other], Any], Any]],
        ],
    ) -> RunnableSerializable[Other, Output]:
        """Compose this runnable with another object to create a RunnableSequence."""
        return RunnableSequence(coerce_to_runnable(other), self)

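In other words, evaluating self (a Runnable object) | other creates and returns a RunnableSequence object, RunnableSequence(self, coerce_to_runnable(other)).
other | self (a Runnable object) also works: in that case __ror__ returns RunnableSequence(coerce_to_runnable(other), self).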
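To make the mechanism concrete, here is a toy re-implementation of the same idea in plain Python. MiniRunnable, MiniSequence, and to_runnable are hypothetical names for this sketch only; they are not LangChain classes, and the real Runnable offers much more (batch, stream, async variants). The point is just the shape of the two methods: __or__ puts self first, __ror__ puts self second, and both wrap the other operand first.

```python
from typing import Any, Callable


class MiniRunnable:
    """Hypothetical stand-in for Runnable: wraps one function."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: Any) -> "MiniSequence":
        # self | other  ->  sequence(self, other)
        return MiniSequence(self, to_runnable(other))

    def __ror__(self, other: Any) -> "MiniSequence":
        # other | self, reached when other has no usable __or__
        return MiniSequence(to_runnable(other), self)


class MiniSequence(MiniRunnable):
    """Hypothetical stand-in for RunnableSequence: runs two steps in order."""

    def __init__(self, first: MiniRunnable, second: MiniRunnable):
        self.first = first
        self.second = second

    def invoke(self, value: Any) -> Any:
        # The output of the first step becomes the input of the second
        return self.second.invoke(self.first.invoke(value))


def to_runnable(thing: Any) -> MiniRunnable:
    # Simplified coercion: runnables pass through, plain callables get wrapped
    if isinstance(thing, MiniRunnable):
        return thing
    if callable(thing):
        return MiniRunnable(thing)
    raise TypeError(f"unsupported type: {type(thing)}")


# str.strip has no __or__, so MiniRunnable.__ror__ builds the first sequence;
# that sequence is itself a MiniRunnable, so "| len" goes through __or__
chain = str.strip | MiniRunnable(str.upper) | len
print(chain.invoke("  abc "))  # "  abc " -> "abc" -> "ABC" -> 3
```

Because MiniSequence is itself a MiniRunnable, a | b | c simply nests sequences, which is why a chain of any length can be built from a single pair of methods.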

The coerce_to_runnable function used here, shown below, converts standard Python objects that are not Runnables into Runnable-type objects. Only a Runnable, a callable, or a dict can be passed to coerce_to_runnable; anything else raises a TypeError.

def coerce_to_runnable(thing: RunnableLike) -> Runnable[Input, Output]:
    """Coerce a runnable-like object into a Runnable.

    Args:
        thing: A runnable-like object.

    Returns:
        A Runnable.
    """
    if isinstance(thing, Runnable):
        return thing
    elif inspect.isasyncgenfunction(thing) or inspect.isgeneratorfunction(thing):
        return RunnableGenerator(thing)
    elif callable(thing):
        return RunnableLambda(cast(Callable[[Input], Output], thing))
    elif isinstance(thing, dict):
        return cast(Runnable[Input, Output], RunnableParallel(thing))
    else:
        raise TypeError(
            f"Expected a Runnable, callable or dict."
            f"Instead got an unsupported type: {type(thing)}"
        )
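The branch order of coerce_to_runnable can be mirrored in a small plain-Python sketch, which also suggests why the values inside a dict matter. Step, Parallel, and coerce are hypothetical names for illustration only. The sketch assumes that the dict branch builds a parallel step whose constructor coerces every value in turn; that assumption is mine and is not taken from the source excerpt above.

```python
from typing import Any, Callable, Dict


class Step:
    """Hypothetical stand-in for RunnableLambda: wraps one function."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)


class Parallel(Step):
    """Hypothetical stand-in for RunnableParallel: feeds the same input to every value."""

    def __init__(self, steps: Dict[str, Any]):
        # Assumption: every dict value is coerced as well, so a bare str fails here
        self.steps = {key: coerce(value) for key, value in steps.items()}

    def invoke(self, value: Any) -> Dict[str, Any]:
        return {key: step.invoke(value) for key, step in self.steps.items()}


def coerce(thing: Any) -> Step:
    # Same branch order as coerce_to_runnable, minus the generator-function branch
    if isinstance(thing, Step):
        return thing
    elif callable(thing):
        return Step(thing)
    elif isinstance(thing, dict):
        return Parallel(thing)
    raise TypeError(f"Expected a Runnable, callable or dict. Got {type(thing)}")


ok = coerce({"length": len, "upper": str.upper})
print(ok.invoke("aa"))  # {'length': 2, 'upper': 'AA'}

try:
    coerce({"foo": "aa"})  # "aa" is not a Step, not callable, and not a dict
except TypeError as err:
    print(err)
```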

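So, because Runnable declares __or__ and __ror__, you can write Runnable object | other (where other is a Runnable object, a callable, or a dict) as well as other | Runnable object. invoke is then called on the RunnableSequence returned by this operation. Since that RunnableSequence is itself a Runnable, a longer chain such as prompt | model | outputparser is evaluated left to right as (prompt | model) | outputparser.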

Practice: Using "|" in LCEL

Practice 1:

Try (a callable or a dict) | (a Runnable object).

from langchain_core.runnables import RunnableLambda
from operator import itemgetter

# Returns the number of characters
def length_function(text):
    return len(text)

# This chain takes the value for key "foo" from the dict passed to it, hands it on, and runs length_function on it.
chain = itemgetter("foo") | RunnableLambda(length_function)

# Returns 2 (because "aa" has 2 characters)
chain.invoke({"foo":"aa"})

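Key points:

Use itemgetter, which is a callable.
Wrap length_function in RunnableLambda to make it a Runnable. At least one side of "|" must be a Runnable: itemgetter("foo") | length_function on its own would raise a TypeError, because neither operand defines __or__ or __ror__.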
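Stripped of LangChain, the chain above just calls the two steps in order, passing each output on as the next input. The sketch below shows that data flow in plain Python, plus what happens when neither operand of "|" is a Runnable. It illustrates the idea only; it is not how RunnableSequence is implemented internally.

```python
from operator import itemgetter


def length_function(text):
    # Same helper as in the chain: number of characters
    return len(text)


# What the chain does, step by step, without LangChain
get_foo = itemgetter("foo")     # a callable: get_foo(d) returns d["foo"]
step1 = get_foo({"foo": "aa"})  # "aa"
step2 = length_function(step1)  # 2
print(step1, step2)             # aa 2

# Neither operand defines __or__/__ror__, so Python has nothing to call for "|"
try:
    itemgetter("foo") | length_function
except TypeError as err:
    print("TypeError:", err)
```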

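Incidentally, the following raised an error.

# Error
chain = {"foo":"aa"} | RunnableLambda(length_function)
chain.invoke({"foo":"aa"})

In LCEL, it seems that the values of a dict must also be a Runnable, a callable, or a dict; here the value "aa" is a plain str. They are probably checked recursively.
If you really want to write a dict inside a chain, use the following:

chain = (lambda x: {"foo":"aa"}) | RunnableLambda(length_function)
chain.invoke({"foo":"aa"})

This way, the whole dict part becomes a callable, so there is no problem. (The lambda's argument x receives the argument passed to invoke, but since x is not used here, it is simply discarded. Note that length_function then receives the dict itself, so the result is len({"foo": "aa"}), which is 1.)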
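Why does the dict version reach LangChain's code at all? Since Python 3.9 a dict has its own __or__ (for merging), but it returns NotImplemented when the right operand is not a dict, so Python falls back to the right operand's __ror__, which for a Runnable passes the dict to coerce_to_runnable. The sketch below checks this fallback with a hypothetical Recorder class standing in for a Runnable; it assumes Python 3.9 or later.

```python
class Recorder:
    """Hypothetical right-hand operand that records what __ror__ receives."""

    def __init__(self):
        self.received = None

    def __ror__(self, other):
        # Reached only because dict.__or__ declined to handle a non-dict operand
        self.received = other
        return "handled by Recorder.__ror__"


d = {"foo": "aa"}
print(d.__or__(Recorder()))  # NotImplemented

rec = Recorder()
print(d | rec)               # handled by Recorder.__ror__
print(rec.received)          # {'foo': 'aa'}
```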
