
Trying out Mixtral 8x7B on Databricks

Posted at 2024-01-04

I'll try running this sample.

The model is built into the Databricks environment, so it can be used right away.

Here is the sample.

import mlflow.deployments

client = mlflow.deployments.get_deploy_client("databricks")
inputs = {
    "messages": [
        {
            "role": "user",
            "content": "List 3 reasons why you should train an AI model on domain specific data sets? No explanations required."
        }
    ],
    "max_tokens": 64,
    "temperature": 0
}

response = client.predict(endpoint="databricks-mixtral-8x7b-instruct", inputs=inputs)
print(response["choices"][0]['message']['content'])

However, running this as-is without any preparation results in the following error.

MlflowException: No plugin found for managing model deployments to "databricks". In order to deploy models to "databricks", find and install an appropriate plugin from https://mlflow.org/docs/latest/plugins.html#community-plugins using your package manager (pip, conda etc).

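Come to think of it, at some point a deployment API showed up without my noticing.

The fix was found here.

Step 2: Install MLflow with external models support

To install an MLflow version that supports external models, use the following.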

%pip install mlflow[genai]>=2.9.0
dbutils.library.restartPython() # restart the kernel
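To confirm after the restart that the installed MLflow meets the `>=2.9.0` requirement from the install command, a check along these lines could be used. This is a minimal sketch: the helper names `parse_version` and `is_new_enough` are made up for illustration, and it assumes the error above is caused by an older MLflow rather than some other configuration issue.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_VERSION = (2, 9, 0)  # taken from the install command above


def parse_version(text):
    """Turn '2.9.2' or '2.10.0rc1' into a tuple of three leading integers."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_new_enough(text, minimum=MIN_VERSION):
    """Hypothetical helper: compare a version string against the minimum."""
    return parse_version(text) >= minimum


try:
    installed = version("mlflow")
    print(installed, "OK" if is_new_enough(installed) else "too old")
except PackageNotFoundError:
    print("mlflow is not installed")
```

Comparing integer tuples rather than raw strings avoids the pitfall where "2.10.0" sorts before "2.9.0" lexicographically.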

After running the install step, run the code from above again.

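import mlflow.deployments

client = mlflow.deployments.get_deploy_client("databricks")
inputs = {
    "messages": [
        {
            "role": "user",
            "content": "List 3 reasons why you should train an AI model on domain specific data sets? No explanations required."
        }
    ],
    "max_tokens": 64,
    "temperature": 0
}

response = client.predict(endpoint="databricks-mixtral-8x7b-instruct", inputs=inputs)
print(response["choices"][0]['message']['content'])
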
1. Improved accuracy: Training an AI model on domain-specific data sets can significantly improve its ability to make accurate predictions or classifications related to that specific domain.
2. Better contextual understanding: Using domain-specific data sets allows the AI model to understand the unique context, jargon, and concepts
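The output above stops partway through the second item, presumably because `max_tokens` is set to 64. Assuming the response follows the OpenAI-style chat format and carries a `finish_reason` field on each choice (this field is not shown in the original post, so treat it as an assumption), a small hypothetical helper such as `was_truncated` could flag this case:

```python
def was_truncated(response):
    """Return True if the first choice reports hitting the token limit.

    Assumes an OpenAI-style 'finish_reason' field; returns False if it is absent.
    """
    choices = response.get("choices") or []
    if not choices:
        return False
    return choices[0].get("finish_reason") == "length"


# Hypothetical response, shaped like the one indexed in the article.
sample = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "1. Improved accuracy ..."},
            "finish_reason": "length",
        }
    ]
}
print(was_truncated(sample))
```

If the check returns True, raising `max_tokens` and rerunning would be the natural next step.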

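How does it do in Japanese? The prompt below is the Japanese equivalent of the earlier one and instructs the model to answer in Japanese. The token limit is also increased.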

client = mlflow.deployments.get_deploy_client("databricks")
inputs = {
    "messages": [
        {
            "role": "user",
            "content": "ドメイン固有のデータセットでAIモデルをトレーニングすべき理由を日本語で3つ挙げてください。説明は不要です。"
        }
    ],
    "max_tokens": 256,
    "temperature": 0
}

response = client.predict(endpoint="databricks-mixtral-8x7b-instruct", inputs=inputs)
print(response["choices"][0]['message']['content'])

1. 適合度：ドメイン固有のデータセットを使用すると、特定の分野に特化したAIモデルをトレーニングできるため、適合度が高くなります。
2. 精度：ドメイン固有のデータセットでモデルをトレーニングすることで、より正確な予測や分析が可能になる場合があります。
3. 信頼性：ドメイン固有のデータセットを使用することで、モデルの信頼性が向上し、業界標準や法規遵守に堪能なAIモデルを開発できます。

Looks pretty good. The model returns three reasons in Japanese (fit, accuracy, and reliability).
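Since the two runs differ only in the prompt and `max_tokens`, the request could be wrapped in a small helper. The sketch below is one possible arrangement, not part of the original sample: `build_inputs` and `ask` are hypothetical names, and `client` is expected to be the object returned by `mlflow.deployments.get_deploy_client("databricks")`.

```python
def build_inputs(prompt, max_tokens=64, temperature=0):
    """Build the chat payload used in the article for a single user message."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def ask(client, prompt, endpoint="databricks-mixtral-8x7b-instruct", **kwargs):
    """Send one prompt and return the text of the first choice."""
    response = client.predict(endpoint=endpoint, inputs=build_inputs(prompt, **kwargs))
    return response["choices"][0]["message"]["content"]
```

With this in place, the Japanese run would reduce to something like `print(ask(client, prompt, max_tokens=256))`.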

Databricks Quickstart Guide

Databricks Free Trial
