Trying Qwen3 on Databricks

Posted at 2025-04-29, last updated at 2025-05-02

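Introduction

Alibaba has released Qwen3, the latest version of its Qwen series of large language models.

npaka promptly published a Japanese translation of the announcement.

Please read the announcement itself for the details; here is an excerpt of the overview.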

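Today (April 28, 2025), we released "Qwen3". "Qwen3-235B-A22B" achieves competitive results in benchmark evaluations of coding, math, and general capabilities when compared with other top-tier models such as "DeepSeek-R1", "o1", "o3-mini", "Grok-3", and "Gemini-2.5-Pro". In addition, the small MoE model "Qwen3-30B-A3B" outperforms "QwQ-32B", which has 10 times as many activated parameters, and even a small model like "Qwen3-4B" rivals the performance of "Qwen2.5-72B-Instruct".

We are open-weighting two MoE models. "Qwen3-235B-A22B" is a large model with 235 billion total parameters and 22 billion activated parameters. "Qwen3-30B-A3B" is a smaller MoE model with 30 billion total parameters and 3 billion activated parameters. In addition, six dense models, "Qwen3-32B", "Qwen3-14B", "Qwen3-8B", "Qwen3-4B", "Qwen3-1.7B", and "Qwen3-0.6B", are also open-weighted and available under the Apache 2.0 license.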

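These are astonishing benchmark results for an open-weight LLM.
Qwen3-235B-A22B, the model with the most parameters, comes close to Gemini 2.5 Pro, and even the dense 32B model matches or exceeds DeepSeek-R1 at that parameter size.
It is also remarkable that a model as small as 4B is on par with Qwen2.5 72B.

Qwen3 is also a frequent topic on Reddit, which shows how much attention it is getting.

So, as usual, this article tries running it on Databricks.
The verification used Databricks on AWS, a notebook attached to a GPU (g5.xlarge) cluster in dedicated access mode, and DBR 16.4 LTS ML (Beta).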

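Step1. Preparation and custom model definition

First, prepare the model files.
I wanted to see whether even a small model performs well, so I used the 4B-parameter model, Qwen3-4B. Download the model from Hugging Face and store it in a Unity Catalog volume. (The download code is omitted.)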
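Since the download code is omitted, here is a minimal sketch of what it might look like. It assumes the `huggingface_hub` package and its `snapshot_download` function; the helper names `volume_model_dir` and `download_model` are hypothetical, and the volume path simply mirrors the one used later in this article.

```python
import posixpath


def volume_model_dir(volume_root: str, repo_id: str) -> str:
    """Build a target directory under a Unity Catalog volume for a Hugging Face repo.

    The "models--<org>--<name>" naming mirrors the directory used later in this
    article; it is a convention chosen here, not a requirement.
    """
    return posixpath.join(volume_root, "models--" + repo_id.replace("/", "--"))


def download_model(repo_id: str, volume_root: str) -> str:
    """Download all files of `repo_id` into the volume and return the target path."""
    # Imported lazily so the path helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = volume_model_dir(volume_root, repo_id)
    snapshot_download(repo_id=repo_id, local_dir=target)
    return target


# Assumed usage on a cluster that can write to the volume:
# download_model("Qwen/Qwen3-4B", "/Volumes/training/llm/model_snapshots")
print(volume_model_dir("/Volumes/training/llm/model_snapshots", "Qwen/Qwen3-4B"))
```

Writing directly into the volume keeps the model files in one place for both the notebook test and the later MLflow logging step.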

Next, create a notebook and install the required packages.

%pip install -qq --no-deps flashinfer_python==0.2.5 -i https://flashinfer.ai/whl/cu124/torch2.6 --trusted-host flashinfer.ai
%pip install -qq "sglang[srt]==0.4.6.post2" "openai==1.76.0" "vllm==0.8.3" #"gguf==0.16.2" "vllm==0.7.2"
%pip install -qq "mlflow-skinny[databricks]>=2.22.0" loguru uv rich databricks-agents nest-asyncio

%restart_python

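With eventual deployment to Databricks Mosaic AI Model Serving in mind, define a custom model using MLflow's ChatModel interface.
SGLang is used as the inference engine.

The code is somewhat long and verbose, but here it is in full.
Most of it launches an SGLang server and obtains results through the OpenAI client.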

%%writefile sglang_chat_model.py

import os
from typing import Generator

import torch
import mlflow
import sglang as sgl
from sglang.utils import (
    launch_server_cmd,
    wait_for_server,
    print_highlight,
    terminate_process,
)

from mlflow.pyfunc import ChatModel
from mlflow.types.llm import (
    ChatMessage,
    ChatCompletionResponse,
    ChatChoice,
    ChatParams,
    ChatCompletionChunk,
)
from mlflow.models import set_model

from loguru import logger
import openai
import requests
import psutil

SERVER_WAIT_TIMEOUT = 60 * 10
ENV_SGLANG_CLI_ARGS = "SGLANG_CLI_ARGS"

mlflow.openai.autolog()


def terminate_all_sglang_processes():
    # Scan the running processes and collect every SGLang server process
    check_cmd = ["python", "-m", "sglang.launch_server"]
    sglang_processes = []
    for proc in psutil.process_iter():
        try:
            if proc.cmdline()[0:3] == check_cmd:
                sglang_processes.append(proc)
        except (psutil.AccessDenied, psutil.NoSuchProcess):
            pass  # e.g. no permission to inspect the process, or it already exited

    # Terminate every SGLang server process that was found
    for proc in sglang_processes:
        terminate_process(proc)

def launch_sglang_server(model_path, model_config):
    """
    Launch an SGLang server with the given model path and configuration.

    Args:
        model_path (str): Path of the model the server should load.
        model_config (dict): Configuration dictionary for the server.

    Returns:
        tuple: The server process and the port number.

    Raises:
        ValueError: If model_path is not provided.
    """
    # Check VRAM usage. If the first device lacks free memory, terminate all SGLang processes
    mem_free, mem_total = torch.cuda.mem_get_info(device=0)
    if mem_free / mem_total < 0.5:  # less than 50% free: terminate all SGLang processes
        logger.warning("Terminating all SGLang processes due to insufficient VRAM.")
        terminate_all_sglang_processes()

    if not model_path:
        raise ValueError("model_path is required")

    config = model_config.get("server", {})

    args_from_env = os.environ.get(
        ENV_SGLANG_CLI_ARGS, ""
    )  # read additional CLI arguments from the environment variable
    args = args_from_env + " " + " ".join(config.get("args", []))

    cli_args = (
        f"python -m sglang.launch_server "
        f"--model-path {model_path} "
        f"--host 0.0.0.0 "
        f"{args}"
    )

    logger.info(f"Launching server with args: {cli_args}")

    server_process, port = launch_server_cmd(cli_args)
    wait_for_server(f"http://localhost:{port}", timeout=SERVER_WAIT_TIMEOUT)

    return server_process, port


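def health_check(port) -> bool:
    """
    Run a health check against the server listening on the given port.

    Args:
        port (int): Port number of the server to check.

    Returns:
        bool: True if the server is healthy, False otherwise.
    """
    try:
        response = requests.get(f"http://localhost:{port}/health", timeout=10)
    except requests.RequestException:
        return False  # an unreachable server counts as unhealthy
    return response.ok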


def build_messages_for_llm(messages: list[ChatMessage]) -> list[dict]:
    """
    Convert a list of chat messages into a list of dictionaries for the SGLang server.

    Args:
        messages (list[ChatMessage]): List of chat messages.

    Returns:
        list[dict]: The messages converted to dictionaries for the SGLang server.
    """

    # Replace None content with an empty string (a constraint when using SGLang)
    for msg in messages:
        if msg.content is None:
            msg.content = ""

    # Convert the list[ChatMessage] input into dictionaries
    return [msg.to_dict() for msg in messages]


def build_parameters_for_llm(params: ChatParams, stream: bool = False) -> dict:
    """
    Build the inference parameters.

    Args:
        params (ChatParams): Prediction parameters.
        stream (bool): Whether to enable streaming.

    Returns:
        dict: Dictionary of the inference parameters that were built.
    """

    # Default parameters specific to this model
    default_params_dict = {"temperature": 0.7, "top_p": 0.95, "max_tokens": 5000, "n": 1}

    # Apply a default only when the request does not already contain that key
    dict_params = params.to_dict() if params else {}
    for key, value in default_params_dict.items():
        dict_params.setdefault(key, value)

    # logger.info(f"Sampling parameters: {dict_params}")

    # Parameter names that may be passed to the server
    valid_parameters = [
        "temperature",
        "max_tokens",
        "max_completion_tokens",
        "top_p",
        "n",
        "stop",
        "frequency_penalty",
        "presence_penalty",
        "tools",
        "response_format",
        "extra_body",
    ]

    # Merge Custom Inputs into the top-level parameters
    if "custom_inputs" in dict_params:
        custom_inputs = dict_params.pop("custom_inputs")
        dict_params.update(custom_inputs)

    # Keep only the parameters the server accepts
    dict_params = {k: v for k, v in dict_params.items() if k in valid_parameters}

    # Streaming setting
    dict_params["stream"] = stream

    return dict_params


class SGLangChatModel(ChatModel):
    """
    SGLangChatModel is an MLflow ChatModel class that serves a chat model through an SGLang server.

    Attributes:
        server_process (subprocess.Popen): The server process instance.
        port (int): Port number the server listens on.
        auto_shutdown (bool): Whether to shut the server down automatically when the object is destroyed.
        model_name (str): Name of the model to use.
        client (openai.Client): The OpenAI client instance.

    Methods:
        load_context(context): Load the model from the context.
        predict(messages, params): Run a prediction based on the messages.
        predict_stream(messages, params): Run a streaming prediction based on the messages.
        _shutdown(): Shut down the server process.
    """

    def __init__(
        self,
        server_process=None,
        port=None,
        model_name: str = "Unknown",
        auto_shutdown: bool = True,
    ):
        self.server_process = server_process
        self.port = port
        self.auto_shutdown = auto_shutdown
        self.model_name = model_name
        self.client = None

    def load_context(self, context):
        """
        Load the model from the context.

        Args:
            context (mlflow.pyfunc.PythonModelContext): MLflow Python model context.

        Raises:
            ValueError: If the model path is not provided.
        """

        if self.server_process is None:

            logger.info(f"Starting server...")

            model_path = context.artifacts["llm-model"]
            model_config = context.model_config or {}

            self.server_process, self.port = launch_sglang_server(
                model_path, model_config
            )
            self.model_name = model_config.get("model", self.model_name)

        self.client = openai.Client(
            base_url=f"http://localhost:{self.port}/v1", api_key="None"
        )

    def predict(self, messages: list[ChatMessage], params: ChatParams = None):
        """
        Run a prediction based on the messages.

        Args:
            messages (list[ChatMessage]): List of chat messages.
            params (ChatParams, optional): Prediction parameters.

        Returns:
            ChatCompletionResponse: The prediction response, or a placeholder
                message when the server is not reachable.
        """

        # Return a dummy message if the SGLang server process is not available.
        if self.client is None or not health_check(self.port):
            return ChatCompletionResponse(
                choices=[
                    {
                        "index": 0,
                        "message": {
                            "role": "assistant",
                            "content": "no response from server.",
                        },
                    }
                ]
            )

        # Convert the list[ChatMessage] input into dictionaries
        llm_messages = build_messages_for_llm(messages)

        # Build the inference parameters
        dict_params = build_parameters_for_llm(params)

        # Run the chat completion
        response = self.client.chat.completions.create(
            model=self.model_name,
            messages=llm_messages,
            **dict_params,
        )

        # Keep the reasoning content as a custom output
        dict_resp = response.to_dict()
        message = dict_resp["choices"][0]["message"]
        if message.get("content") is None:  # always include content
            message["content"] = ""
        if "reasoning_content" in message:
            dict_resp["custom_outputs"] = {
                "reasoning_content": message["reasoning_content"]
            }

        # Return the result
        return ChatCompletionResponse.from_dict(dict_resp)

    def predict_stream(
        self, messages: list[ChatMessage], params: ChatParams = None
    ) -> Generator[ChatCompletionChunk, None, None]:
        """
        Run a streaming prediction based on the messages.

        Args:
            messages (list[ChatMessage]): List of chat messages.
            params (ChatParams, optional): Prediction parameters.

        Yields:
            ChatCompletionChunk: Chunks of the streaming prediction result, or a
                single placeholder chunk when the server is not reachable.
        """

        # Yield a dummy chunk if the SGLang server process is not available.
        # (A generator must yield the chunk; a returned value would be discarded.)
        if self.client is None or not health_check(self.port):
            yield ChatCompletionChunk.from_dict(
                {
                    "choices": [
                        {
                            "index": 0,
                            "delta": {
                                "role": "assistant",
                                "content": "no response from server.",
                            },
                            "finish_reason": "stop",
                        }
                    ]
                }
            )
            return

        # Convert the list[ChatMessage] input into dictionaries
        llm_messages = build_messages_for_llm(messages)

        # Build the inference parameters
        dict_params = build_parameters_for_llm(params, stream=True)

        # Run the chat completion
        response = self.client.chat.completions.create(
            model=self.model_name,
            messages=llm_messages,
            **dict_params,
        )

        # Stream the chunks back
        for chunk in response:
            dict_chunk = chunk.to_dict()
            delta = dict_chunk["choices"][0]["delta"]
            if delta.get("content") is None:  # always include content
                delta["content"] = ""
            if "reasoning_content" in delta:
                dict_chunk["custom_outputs"] = {
                    "reasoning_content": delta["reasoning_content"]
                }

            yield ChatCompletionChunk.from_dict(dict_chunk)

    def _shutdown(self):
        """
        Shut down the server process.

        If a server process exists and auto_shutdown is True, this method terminates
        the server process, then sets server_process to None and auto_shutdown to False.
        """
        if self.server_process and self.auto_shutdown:
            logger.info("shutdown sglang server.")
            terminate_process(self.server_process)
            self.server_process = None
            self.auto_shutdown = False

    def __del__(self):
        self._shutdown()

model = SGLangChatModel()
set_model(model)
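
To make the parameter handling in `build_parameters_for_llm` easier to follow, here is a small standalone sketch of the same merge-and-filter idea using plain dictionaries. It is a simplified re-implementation for illustration (the name `merge_params` is hypothetical, and MLflow is not involved); it shows how `custom_inputs` entries such as `extra_body` end up as top-level arguments while unknown keys are dropped.

```python
# Defaults and the allow-list are copied from the listing above.
DEFAULTS = {"temperature": 0.7, "top_p": 0.95, "max_tokens": 5000, "n": 1}
ALLOWED = {
    "temperature", "max_tokens", "max_completion_tokens", "top_p", "n", "stop",
    "frequency_penalty", "presence_penalty", "tools", "response_format", "extra_body",
}


def merge_params(request_params: dict, stream: bool = False) -> dict:
    """Apply defaults, flatten custom_inputs, and keep only allowed keys."""
    merged = dict(request_params)  # copy so the caller's dict is not modified
    for key, value in DEFAULTS.items():
        merged.setdefault(key, value)  # a default never overrides a given value
    custom_inputs = merged.pop("custom_inputs", None) or {}
    merged.update(custom_inputs)  # e.g. extra_body moves to the top level
    merged = {k: v for k, v in merged.items() if k in ALLOWED}
    merged["stream"] = stream
    return merged


result = merge_params(
    {
        "temperature": 0.6,
        "top_k": 20,  # not in the allow-list, so it is dropped
        "custom_inputs": {
            "extra_body": {"chat_template_kwargs": {"enable_thinking": False}}
        },
    }
)
print(result)
```

Routing `extra_body` through `custom_inputs` is what later allows `enable_thinking` to be toggled per request.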

Step2. Test run

Using the class defined in Step1, run a simple inference with Qwen3 4B.

First, define the model file path and related settings.

import os
from sglang_chat_model import launch_sglang_server, build_parameters_for_llm
from rich import print

model_path = "/Volumes/training/llm/model_snapshots/models--Qwen--Qwen3-4B/"
model_config = {
    "model": os.path.basename(os.path.normpath(model_path)),
    "server": {
        "args": [
            "--disable-cuda-graph",
            "--mem-fraction-static 0.9",
            "--grammar-backend llguidance", 
            "--tool-call-parser qwen25",
            "--max-running-requests 1",
        ],
    },
}
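
For reference, the following sketch shows roughly which command line `launch_sglang_server` assembles from such a configuration, and how the `model` name is derived from the path. It is an illustrative re-implementation: `build_launch_command` is a hypothetical helper, it normalizes whitespace slightly differently from the original string concatenation, and it uses a shortened argument list.

```python
import os
from typing import Optional

ENV_SGLANG_CLI_ARGS = "SGLANG_CLI_ARGS"


def build_launch_command(model_path: str, model_config: dict, env: Optional[dict] = None) -> str:
    """Assemble the SGLang launch command from env-provided and configured arguments."""
    env = dict(os.environ) if env is None else env
    parts = [
        "python -m sglang.launch_server",
        f"--model-path {model_path}",
        "--host 0.0.0.0",
        env.get(ENV_SGLANG_CLI_ARGS, ""),  # extra arguments from the environment come first
        *model_config.get("server", {}).get("args", []),
    ]
    return " ".join(part for part in parts if part)  # skip empty pieces


example_path = "/Volumes/training/llm/model_snapshots/models--Qwen--Qwen3-4B/"
example_config = {
    # normpath strips the trailing slash so basename returns the directory name
    "model": os.path.basename(os.path.normpath(example_path)),
    "server": {"args": ["--disable-cuda-graph", "--mem-fraction-static 0.9"]},
}
command = build_launch_command(example_path, example_config, env={})
print(example_config["model"])
print(command)
```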

Launch the model with the settings above and ask an arbitrary question (the scone recipe question I have been using as a sample lately).

from sglang_chat_model import SGLangChatModel
from mlflow.types.llm import ChatMessage, ChatParams
from mlflow.pyfunc import PythonModelContext

chatmodel = SGLangChatModel()
context = PythonModelContext(artifacts={"llm-model":model_path}, model_config=model_config)
chatmodel.load_context(context=context)

result = chatmodel.predict(
    [
        ChatMessage(
            role="user",
            content="What is a good recipe for baking scones that doesn't require a lot of skill",
        ),
    ],
    ChatParams(
        temperature=0.6,
        top_p=0.8,
        presence_penalty=1.5,
        max_tokens=2000,
        custom_inputs={"extra_body":{"chat_template_kwargs": {"enable_thinking": True}}}
    ),
)

The call is recorded as an MLflow Tracing trace.
Qwen3 is a model with a hybrid thinking mode, and like DeepSeek-R1 it reasons inside <think> tags. (This thinking can also be turned off through settings.)
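
When the thinking arrives inline in the response text, a client may want to separate it from the final answer. The following is a minimal sketch of one way to do that with a regular expression; `split_thinking` is a hypothetical helper, and it assumes at most one `<think>...</think>` block per response.

```python
import re

# DOTALL lets "." match newlines, since the reasoning spans multiple lines.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) for a response that may contain a <think> block."""
    match = THINK_PATTERN.search(text)
    if match is None:
        return "", text.strip()  # no thinking block: everything is the answer
    reasoning = match.group(1).strip()
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer


reasoning, answer = split_thinking("<think>\nThe user wants an easy recipe.\n</think>\n\nTry drop scones.")
print(reasoning)
print(answer)
```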

Step3. Logging the model

Now that the behavior has been confirmed, log the model to MLflow.

import mlflow
import os

mlflow.set_registry_uri("databricks-uc")

extra_pip_requirements = [
    "torch==2.6.0 --index-url https://download.pytorch.org/whl/cu124",
    "https://github.com/flashinfer-ai/flashinfer/releases/download/v0.2.5/flashinfer_python-0.2.5+cu124torch2.6-cp38-abi3-linux_x86_64.whl",
    "sglang[srt]==0.4.6.post2",
    "openai==1.76.0",
    "vllm==0.8.3",
    "loguru",
    "psutil",
]
pip_requirements = mlflow.pyfunc.get_default_pip_requirements() + extra_pip_requirements

artifacts = {
    "llm-model": model_path,
}

input_example = {
    "messages": [
        {
            "role": "user",
            "content": "What is a good recipe for baking scones that doesn't require a lot of skill?",
        }
    ],
    "max_tokens":100,
    "temperature":0.6,
    "top_p":0.8,
    "presence_penalty":1.5,
    "custom_inputs":{"extra_body":{"chat_template_kwargs": {"enable_thinking": False}}}
}

with mlflow.start_run() as run:
    logged_model = mlflow.pyfunc.log_model(
        artifact_path="model",
        python_model="sglang_chat_model.py",
        artifacts=artifacts,
        model_config=model_config,
        input_example=input_example,        
        pip_requirements=pip_requirements,
        # metadata={"task": "llm/v1/chat"},
        await_registration_for=3000,
    )

Then register it in Unity Catalog as well.

mlflow.set_registry_uri("databricks-uc")

run_id = logged_model.run_id
model_uri = f"runs:/{run_id}/model"
registered_model_name = "training.llm.sglang_qwen3"

uc_registered_model_info = mlflow.register_model(
    model_uri=model_uri,
    name=registered_model_name,
    await_registration_for=3000,
)

Nothing in particular stands out here. If it completes successfully, the model is stored and can be reused.

Step4. Deploying

Deploy the stored model to Databricks Mosaic AI Model Serving.

An MLflow ChatModel can be deployed using the Databricks Mosaic AI Agent Framework.
This time, let's deploy it as an agent.

from databricks import agents
import mlflow
from mlflow import MlflowClient

# Endpoint name
endpoint_name = "chatmodel_" + registered_model_name.replace(".", "__")
print(endpoint_name)

# Get the versions of the model in Unity Catalog, newest first
client = MlflowClient()
versions = sorted(
    (int(mv.version) for mv in client.search_model_versions(f"name='{registered_model_name}'")),
    reverse=True,
)
print(versions)

deploy = True  # set to False to skip deployment when re-running the notebook
if deploy:
    agents.deploy(
        registered_model_name,
        versions[0],
        scale_to_zero=False,
        endpoint_name=endpoint_name,
    )

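However, as is, the endpoint is created on CPU compute.
So (it is a bit tedious) cancel the endpoint update once from the Serving menu UI.

Then, from the endpoint's "Edit" menu, change the compute type to GPU Medium.

After a while, the endpoint creation completes.

The endpoint serving Qwen3 is now ready.
Now let's actually use it a bit more.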
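
The next step uses the Playground, but the endpoint can also be called programmatically. The sketch below only assembles such a request with the standard library; the workspace URL and token are placeholders, `build_chat_request` is a hypothetical helper, and the payload shape is assumed to follow the `input_example` logged in Step3.

```python
import json
import urllib.request


def build_chat_request(workspace_url: str, endpoint_name: str, token: str,
                       question: str, enable_thinking: bool = True,
                       max_tokens: int = 1000) -> urllib.request.Request:
    """Build a POST request for a Model Serving endpoint's invocations URL."""
    payload = {
        "messages": [{"role": "user", "content": question}],
        "max_tokens": max_tokens,
        # Forwarded to the SGLang server through the custom model's custom_inputs handling
        "custom_inputs": {
            "extra_body": {"chat_template_kwargs": {"enable_thinking": enable_thinking}}
        },
    }
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "https://example.cloud.databricks.com",  # placeholder workspace URL
    "chatmodel_training__llm__sglang_qwen3",
    "<token>",  # placeholder personal access token
    "Tell me about Databricks",
)
# Sending needs a real workspace and token, so it is left commented out:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
print(request.full_url)
```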

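Step5. Trying it from the Playground

Let's try it from the Playground.
Because it was deployed through the Agent Framework, internally it is treated as an agent.

Now let's enter an arbitrary question.
As a test, I entered 「Databricksについて、詳しく教えて」 ("Tell me about Databricks in detail").

Here is what happened.

First, a "thinking" box is displayed, showing that the model is reasoning.

Once the thinking finishes, the result is streamed out.
This was a pleasant surprise: when the model is registered as an agent, predict_stream is used and the output streams token by token.
I had never figured out how to do this, so I'm glad.

During and after the output, you can also inspect the thought process.
The answer is in Japanese, but the thinking was apparently done in English.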
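
To illustrate how a client could consume such a stream, here is a small sketch that accumulates streamed chunks into a reasoning string and an answer string. It works on plain dictionaries shaped like the chunks that `predict_stream` emits (`choices[0].delta.content` plus an optional `custom_outputs.reasoning_content`); `collect_stream` and the sample chunks are hypothetical.

```python
from typing import Iterable


def collect_stream(chunks: Iterable[dict]) -> tuple[str, str]:
    """Join streamed chunks into (reasoning, answer)."""
    reasoning_parts, answer_parts = [], []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        answer_parts.append(delta.get("content") or "")
        # reasoning_content is only present while the model is thinking
        custom = chunk.get("custom_outputs") or {}
        reasoning_parts.append(custom.get("reasoning_content") or "")
    return "".join(reasoning_parts), "".join(answer_parts)


sample_chunks = [
    {"choices": [{"delta": {"content": ""}}], "custom_outputs": {"reasoning_content": "Let me "}},
    {"choices": [{"delta": {"content": ""}}], "custom_outputs": {"reasoning_content": "think."}},
    {"choices": [{"delta": {"content": "Data"}}]},
    {"choices": [{"delta": {"content": "bricks"}}]},
]
streamed_reasoning, streamed_answer = collect_stream(sample_chunks)
print(streamed_reasoning)
print(streamed_answer)
```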

The full output follows, reproduced as generated (the thinking in English, the answer in Japanese).

Output

<think>
Okay, the user asked for a detailed explanation about Databricks. Let me start by recalling what I know about Databricks. It's a cloud-based analytics platform that combines big data and machine learning. I should mention it's developed by the same team that created Apache Spark, which is a key point.

First, I need to outline the main features. Cloud infrastructure is important because it's available on AWS, Azure, and GCP. Then, the integration with Spark and other technologies like Apache Flink and Delta Lake. Maybe explain how it's designed for data engineering, data science, and machine learning tasks.

I should also talk about the Unified Analytics Engine, which allows for processing structured and semi-structured data. The collaboration features, like notebooks for different teams, are essential. Security and compliance aspects are crucial for enterprise use.

Use cases would help. For example, data teams can use it for ETL processes, data scientists for model training, and business analysts for querying data. Mentioning real-time analytics and the ability to handle large datasets is important.

I need to highlight the benefits: scalability, ease of use, integration with other tools, and cost-effective. Also, the ecosystem of tools like Databricks Lakehouse, Databricks Machine Learning, and Databricks Runtime.

Wait, maybe I should structure it into sections: what it is, core features, architecture, key components, use cases, benefits, and maybe a comparison with competitors. Also, note that it's a managed service, which reduces the operational overhead.

Make sure to explain terms like Delta Lake and how Databricks handles data lakes vs. data warehouses. Also, the difference between Databricks and other platforms like Google BigQuery or AWS Redshift.

I should check if there are any recent updates or features in Databricks 2023. Like the enhancements in the UI, new integrations, or improvements in machine learning pipelines.

Avoid technical jargon where possible, but still be precise. The user might be a data professional looking to understand the platform, so balance between detail and accessibility.

Also, mention the role of Databricks in the data ecosystem, how it's used in different stages of the data pipeline. Maybe touch on the business intelligence aspect with the BI capabilities.

Ensure that the explanation flows logically, starting from an overview, then delving into features, architecture, use cases, benefits, and maybe some examples. Conclude with a summary of why someone would choose Databricks.
</think>

Databricksは、データエンジニアリング、機械学習（ML）、ビジネスインテリギエンス（BI）の統合プラットフォームを提供するクラウドベースのサービスです。Apache Sparkの開発チームが開発したDatabricksにより、**データの処理、分析、学習、記述にわたる一貫した環境**が提供されています。以下に詳しく解説します。

---

### **1. ダタービックスとは？**
Databricksは、データの「統合分析」を目的としたクラウドベースのプラットフォームで、**AWS、Azure、GCP**などのクラウド_providerと統合されています。  
- **核心技術**：Apache Spark（資料処理）、Delta Lake（データ湖の統合）、Flink（リアルタイム処理）など。
- **目的**：データの「一貫した分析」を可能にし、データチーム（エンジニア、データサイエンティスト、ビジネスアナリスト）の協業を支援。

---

### **2. 核心特徴**
#### **（1）統合分析エンジン**
- **Databricks Unified Analytics Engine**：  
  - **構造化/非構造化データ**（データ湖、データウェアハウス、データベース）を一貫して処理。
  - **Sparkベース**で高速な処理を実現し、マシン学習とリアルタイム分析を容易に統合。
- **Delta Lakeの統合**：  
  - データのトラック、一元管理、ゲートキーパー機能により、データ湖の信頼性を高める。

#### **（2）クラウドインフラ**
- **クラウドパラメータ**：AWS、Azure、GCPのすべてのクラウドをサポート。
- **データの格納**：S3、Azure Data Lake、GCP Cloud Storageなどに保存される。
- **自動スケーラブル**：クラウドの柔軟なインフラに依存し、必要に応じてスケールが可能。

#### **（3）ユーザーインターフェースとツール**
- **Databricks Notebook**：  
  - データサイエンティストやエンジニアがコードを実行し、分析を直观します。
  - **統合的なコード**（Python、R、SQL、Scala）の利用が可能。
- **Data Engineering**：  
  - **ETL**（Extract, Transform, Load）の自動化。
  - **Apache Spark**を用いたシンプルなデータ処理。
- **Data Science & ML**：  
  - **MLflow**でMLパイプラインの管理。
  - **AutoML**や**Dataset/Modelの管理**が可能。
- **Business Intelligence**：  
  - **Databricks SQL**でBIのクエリを実行。
  - **ダッシュボード**の作成やデータの可視化。

#### **（4）セキュリティとコンプライアンス**
- **RBAC（Role-Based Access Control）**：  
  - ユーザー・roversの権限管理。
- **GDPRやHIPAA**などの**データガバナンスとトレース性**を提供。
- **セキュアなデータ転送**（SSL/TLS）と**クラウドのセキュラなインフラ**。

---

### **3. エコシステムと統合ツール**
- **Databricks Lakehouse**：  
  - データ湖とデータウェアハウスの統合。
  - **自動のデータ品質**と**データの統一**。
- **Databricks ML**：  
  - MLモデルの訓練、推論、共有。
  - **MLflow**によるモデルの管理。
- **Databricks Runtime**：  
  - Sparkの実行環境を最適化。
  - プロセスやパッケージを簡単に構築可能。
- **Integrations**：  
  - **AWS Glue**、**Azure Data Factory**、**Google Cloud Dataflow**などのクラウドサービスとの連携。
  - **Jira**や**Jupyter**、**GitHub**など、開発ツールとの統合。

---

### **4. 主な利用シーン**
- **データエンジニアリング**：  
  - ETL処理、データ湖の管理、データウェアハウスの構築。
- **デルタ湖の動作**：  
  - データ湖を信頼性高く管理し、モデルとデータの連携を可能に。
- **機械学習**：  
  - モデルの訓練、評価、デプロイに適している。
- **ビジネスインテリ**：  
  - SQLやPythonでのクエリ実行、ダッシュボード作成、可視化。

---

### **5. バenefits**
- **ユニークな統合**：  
  - データエンジニアリング、ML、BIを一体化し、協業を促進。
- **コスト効率**：  
  - クラウドの柔軟なスケーラブルにより、コストを効率的に管理。
- **高速処理**：  
  - Sparkベースで高速なデータ処理を可能に。
- **研究の実装**：  
  - AutoMLや今後の個別AIモデルの実装が容易。

---

### **6. Databricksとコンペティティブな比較**
- **AWS Redshift**や**Google BigQuery**では、データウェアハウスに特化したエンジンが用いられるが、Databricksは**データ湖とデータウェアハウスの統合**を強調。
- **Databricks vs. Snowflake**：  
  - Snowflakeはブロックベースのデータウェアハウスに特化し、Databricksはフレキシブルなデータ湖を軸に。
- **Databricks vs. Google BigQuery**：  
  - 実行処理は異なるが、**BIとMLの統合**がDatabricksが強み。

---

### **7. 今後の展望**
- **AI/MLのトレンド**：  
  - AutoML、LLM（大規模言語モデル）との統合が進む。
- **リアルタイム処理**：  
  - Delta LakeとFlinkによるリアルタイム分析の実装が強化。
- **クラウドの支援**：  
  - クラウドへの依存が強調され、グローバルなパラメータ管理が可能に。

---

### **まとめ**
Databricksは、**データ湖の統合、機械学習、BIの統合**が可能で、クラウドとデータ全ての処理を**シンプルで統一**するプラットフォームです。特に、データチームの協業や、クラウド上のインフラを管理する操作を減らすことで、**組織全体のデータの活用**を下げる適した選択肢となります。  

必要であれば、Databricksの導入方法や見どころの機能（例：Delta Lake、MLflowなど）も詳しく説明できます！

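The Japanese has a few oddities, but the content is mostly correct. Subjectively, the output feels like that of o3-mini.
So this is what a 4B model can now produce as a matter of course...

Next, ask the same question without thinking.
Thinking can be toggled on and off through settings, but it can apparently also be switched from the user input by including /no_think, so I specified that.

The result was output without any thinking. This makes it easy to switch between the two.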
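
As a small illustration of the per-message switch, the sketch below appends `/no_think` (or `/think`) to the most recent user message before it is sent. `with_thinking_switch` is a hypothetical helper operating on plain message dictionaries.

```python
def with_thinking_switch(messages: list[dict], think: bool) -> list[dict]:
    """Return a copy of messages with a /think or /no_think switch on the last user turn."""
    switch = "/think" if think else "/no_think"
    result = [dict(message) for message in messages]  # copy so the input is untouched
    for message in reversed(result):
        if message.get("role") == "user":
            message["content"] = f"{message['content']} {switch}"
            break  # only the latest user message carries the switch
    return result


original = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Tell me about Databricks in detail"},
]
switched = with_thinking_switch(original, think=False)
print(switched[-1]["content"])
```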

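Conclusion

I deployed the 4B-parameter Qwen3 model on Databricks Model Serving and tried it out.

I have been experimenting with it in the Playground, and it feels very capable.
This level of performance from so few parameters is impressive.

I would like to try the larger-parameter models next.
If I find something new, I may write another article.

Still, a few years ago I could not have imagined a world in which models this capable are released under the Apache 2.0 license. Progress really is fast.
It had felt as though things were settling down a little around LLMs, but now I have a feeling they will accelerate even further. I'm looking forward to what comes next.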
