
Trying out the LangChain support in MLflow 2.3

Posted at 2023-04-21

I'm going to try out the LangChain support in MLflow 2.3. This is my first time touching LangChain. I like the icon 🦜🔗

First, install langchain and mlflow 2.3.

%pip install langchain mlflow==2.3

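Logging the model

When an LLMChain accesses a repository through HuggingFaceHub, the Hugging Face API token has to be set in the environment variable HUGGINGFACEHUB_API_TOKEN.

The API token is also required when the model is loaded, so store it in a Databricks secret and make sure the environment variable is set on the cluster.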

databricks secrets put --scope demo-token-takaaki.yayoi --key huggingface_api_key

Configure the cluster so that it references the secret.

Here I use google/flan-t5-small.
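Before building the chain, it can help to confirm that the token is actually visible to the notebook. The following is a minimal sketch, not part of MLflow or LangChain: `require_hf_token` is a hypothetical helper, and the comment about the cluster setting assumes the Databricks `{{secrets/<scope>/<key>}}` reference syntax for cluster environment variables.

```python
import os

TOKEN_ENV_VAR = "HUGGINGFACEHUB_API_TOKEN"

# Assumed cluster setting (Spark environment variable referencing the secret):
#   HUGGINGFACEHUB_API_TOKEN={{secrets/demo-token-takaaki.yayoi/huggingface_api_key}}


def require_hf_token(env=None) -> str:
    """Return the Hugging Face API token, or fail early with a clear message.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(
            f"{TOKEN_ENV_VAR} is not set; check the cluster's environment variables."
        )
    return token
```

Calling `require_hf_token()` at the top of the notebook surfaces a missing token immediately, rather than as an error raised later from inside the chain.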

Python

import os
import mlflow

from langchain import PromptTemplate, HuggingFaceHub, LLMChain

# Set up the prompt
template = """Translate everything you see after this into French:

{input}"""

prompt = PromptTemplate(template=template, input_variables=["input"])

llm_chain = LLMChain(
    prompt=prompt,
    llm=HuggingFaceHub(
        repo_id="google/flan-t5-small",  # larger models fail with an error at load time
        model_kwargs={"temperature":0, "max_length":64}
    ),
)

# Log the chain
mlflow.langchain.log_model(
    lc_model=llm_chain,
    artifact_path="model",
    registered_model_name="english-to-french-chain-gpt-3-5-turbo-1"
)
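To see what text the chain sends to the model, the template can be rendered by hand. This is a rough sketch that uses plain `str.format` as a stand-in for LangChain's `PromptTemplate` formatting, which is an assumption that holds for a simple single-variable template like this one; `render_prompt` is a hypothetical helper name.

```python
# Same template as in the chain above.
template = """Translate everything you see after this into French:

{input}"""


def render_prompt(text: str) -> str:
    """Fill the {input} placeholder with the text to translate."""
    return template.format(input=text)


print(render_prompt("What is MLflow?"))
```

The rendered string is the instruction line, a blank line, and then the input text.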

Loading the model

The model is also registered in the model registry, so it can be referenced with a models:/ URI.

Python

import mlflow.pyfunc

english_to_french_udf = mlflow.pyfunc.spark_udf(
    spark=spark,
    model_uri="models:/english-to-french-chain-gpt-3-5-turbo-1/4",
    result_type="string"
)
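The `models:/` URI used here consists of the registered model name followed by a version number (a stage name such as `Production` is also accepted in MLflow 2.x). A small sketch of how such a URI is put together; `registry_model_uri` is a hypothetical helper, not an MLflow function.

```python
def registry_model_uri(name: str, version_or_stage) -> str:
    """Build a model registry URI such as models:/<name>/<version>."""
    return f"models:/{name}/{version_or_stage}"


# Version 4 of the registered chain, as used in this article.
uri = registry_model_uri("english-to-french-chain-gpt-3-5-turbo-1", 4)
print(uri)
```

Each call to `log_model` with the same `registered_model_name` creates a new version, so the version number in the URI depends on how many times the chain has been logged.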

Because the model is loaded as a Spark UDF, the LLM can be run in parallel even over large amounts of data.

Next, prepare the data to translate.
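Conceptually, a UDF like this receives the column in batches of rows and applies the model to each batch, and Spark spreads those batches across the executors. The sketch below imitates that flow in plain Python on a single machine. It is only an illustration, not MLflow's or Spark's implementation, and `fake_translate` is a hypothetical stand-in for the real chain.

```python
from typing import Callable, Iterator, List


def iter_batches(rows: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def fake_translate(batch: List[str]) -> List[str]:
    """Hypothetical stand-in for the model: tags each input instead of translating."""
    return [f"<fr>{text}</fr>" for text in batch]


def apply_in_batches(
    rows: List[str],
    model: Callable[[List[str]], List[str]],
    batch_size: int = 4,
) -> List[str]:
    """Apply the model batch by batch and keep the results in row order."""
    results: List[str] = []
    for batch in iter_batches(rows, batch_size):
        results.extend(model(batch))
    return results


print(apply_in_batches(["What is MLflow?", "Hello"], fake_translate, batch_size=1))
```

In Spark the batches are processed by different tasks at the same time, whereas this loop handles them one after another.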

Python

import pandas as pd

english_df = pd.DataFrame(
    {
        "content": [
            "What is MLflow?",
            "What are the key components of MLflow?",
            "How does MLflow enable reproducibility?",
            "What is MLflow tracking and how does it help?",
            "How can you compare different ML models using MLflow?",
            "How can you use MLflow to deploy ML models?",
            "What are the integrations of MLflow with popular ML libraries?",
            "How can you use MLflow to automate ML workflows?",
            "What security and compliance features does MLflow offer?",
            "Where does MLflow stand in the ML ecosystem?",
        ],
    }
)

english_sdf = spark.createDataFrame(english_df)
display(english_sdf)

Calling the LLM

The LLM can be applied to a Spark DataFrame in the same way as any ordinary Spark UDF.

Python

french_translated_df = english_sdf.withColumn(
    "french_text",
    english_to_french_udf("content")
)
display(french_translated_df)

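Unfortunately I don't understand French, but it appears to be working.

This feels a little different from tracking conventional machine learning models, but it looks like it will make trial-and-error experimentation with LLMs more convenient. I should study LangChain more.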

Databricks Quickstart Guide

Databricks free trial
