Making Gemma 3 270m usable with flutter_gemma, Part 1

Posted at 2025-12-01

Introduction

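In "Trying out flutter_gemma" (flutter_gemmaを触ってみる), I found that I could run a local LLM on Android with Flutter, but it crashed on my Arrows We2, among other problems. I figured a smaller model might work, so I wanted to try Gemma 3 270m.
However, Gemma 3 270m is not available in a form that flutter_gemma can use.
So I decided to make it usable myself, following "Converting Hugging Face Safetensors to a MediaPipe Task" (Hugging Face Safetensors を MediaPipe Task に変換する)
and asking ChatGPT and Gemini along the way.
Incidentally, when I tried to do it with ChatGPT and Gemini alone, their information seemed to be out of date and it didn't work.
(The official documentation matters.)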

Building the environment with Docker

I set up the following layout:

\gemma3-task
├ \convert
│ ├ convert_step1.py
│ └ convert_step2.py
├ \model
├ \output
├ \tflite
├ .env
├ docker-compose.yml
├ Dockerfile
├ convert.sh
└ setup.sh

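The model directory holds the model downloaded from Hugging Face.
The tflite directory holds the result of converting the model in model to tflite.
The output directory holds the final converted model (the .task file).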

Environment variables

.env defines the environment variables.

HF_TOKEN=<your Hugging Face access token with READ permission>
HF_MODEL=google/gemma-3-270m-it
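Before starting the container, it can be worth checking that .env really defines both variables, because docker compose substitutes an empty string for an unset ${VAR} and only prints a warning. Below is a minimal Python sketch; parse_env, missing_keys and check_env_file are hypothetical helpers, and they only understand plain KEY=VALUE lines, not every syntax Compose accepts.

```python
# Hypothetical pre-flight check for .env (not part of the original setup).
# Only plain KEY=VALUE lines are handled; "#" lines are treated as comments.
from pathlib import Path

REQUIRED_KEYS = ("HF_TOKEN", "HF_MODEL")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines into a dict, skipping blanks and comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. HF_MODEL="google/gemma-3-270m-it"
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def check_env_file(path: Path) -> list[str]:
    """Read an .env file and return the required keys it lacks."""
    return missing_keys(parse_env(path.read_text(encoding="utf-8")))
```

For example, check_env_file(Path(".env")) returning [] means both keys are set. The check never prints the token value.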

Dockerfile

This is based on what ChatGPT produced, with some changes.

FROM python:3.11-slim

# Required packages
RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    libgl1 \
    libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# ---- pip install ----
RUN pip install --upgrade pip

docker-compose.yml

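services:
  gemma_converter:
    build:
      context: .
      dockerfile: Dockerfile
      # HF_TOKEN / HF_MODEL are passed at runtime via `environment` below.
      # The Dockerfile does not use them, and a token passed as a build arg
      # can end up in the image metadata.
    container_name: gemma3_converter
    volumes:
      # Mount this project directory (gemma3-task) as /convert
      - .:/convert
      - ./output:/output
      - ./model:/model
      - ./tflite:/tflite
    environment:
      - NUM_THREADS=16
      - HF_TOKEN=${HF_TOKEN}
      - HF_MODEL=${HF_MODEL}
    # Keep the container running so we can `exec` into it
    command: tail -f /dev/null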

setup.sh

The script to run after the container has started.

#!/bin/bash
# Stop at the first command that fails
set -e

# Only check that the token is set; don't print the secret itself
if [ -z "${HF_TOKEN}" ]; then
  echo "HF_TOKEN is not set" >&2
  exit 1
fi
echo HF_MODEL=${HF_MODEL}

echo "---- install huggingface_hub[cli] ----"
python -m venv hf
source hf/bin/activate
# Quote the extras spec so the shell never treats [cli] as a glob pattern
pip install "huggingface_hub[cli]"

echo ---- install ai-edge-torch ----
python -m venv ai-edge-torch
source ai-edge-torch/bin/activate
pip install "ai-edge-torch>=0.6.0"

echo ---- install mediapipe ----
python -m venv mediapipe
source mediapipe/bin/activate
pip install mediapipe-model-maker mediapipe

# Hugging Face CLI login (the hf venv already exists, so just re-activate it)
echo ---- Hugging Face CLI login ----
source hf/bin/activate
hf auth login --token "${HF_TOKEN}" --add-to-git-credential

# ---- Download the model automatically ----
echo ---- Download Model ----
hf download "${HF_MODEL}" --local-dir /model

# The conversion steps run in separate venvs: before each step, activate the
# matching one (source ai-edge-torch/bin/activate or source mediapipe/bin/activate).
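Since both conversion scripts read straight from /model, a quick check after hf download can catch an incomplete download early. A minimal sketch, assuming (as the scripts below suggest) that step 1 needs the .safetensors weights and step 2 needs tokenizer.model; missing_model_files is a hypothetical helper, not part of any library.

```python
# Hypothetical post-download check (not part of the original scripts).
# Assumption: convert_step1.py needs *.safetensors weights in /model and
# convert_step2.py needs /model/tokenizer.model.
from pathlib import Path


def missing_model_files(model_dir: Path) -> list[str]:
    """Return a label for each required artifact not found in model_dir."""
    problems: list[str] = []
    if not any(model_dir.glob("*.safetensors")):
        problems.append("*.safetensors (weights for step 1)")
    if not (model_dir / "tokenizer.model").is_file():
        problems.append("tokenizer.model (tokenizer for step 2)")
    return problems
```

Inside the container, missing_model_files(Path("/model")) should return an empty list once the download has finished.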

convert_step1.py

Conversion program, part 1.
Converts the model downloaded from Hugging Face to tflite format.
Based on the official code (almost verbatim).

from ai_edge_torch.generative.examples.gemma3 import gemma3
from ai_edge_torch.generative.utilities import converter
from ai_edge_torch.generative.utilities.export_config import ExportConfig
from ai_edge_torch.generative.layers import kv_cache

# Convert the model to LiteRT and quantize it
HF_MODEL_DIR = "/model"
TFLITE_DIR = "/tflite"

PREFILL_LEN = 2048         
KV_CACHE_MAX = 4096
QUANTIZE = "dynamic_int8"

pytorch_model = gemma3.build_model_270m(HF_MODEL_DIR)

# If you are using Gemma 3 1B
#pytorch_model = gemma3.build_model_1b("PATH_TO_HF_MODEL")

export_config = ExportConfig()
export_config.kvcache_layout = kv_cache.KV_LAYOUT_TRANSPOSED
export_config.mask_as_input = True

converter.convert_to_tflite(
    pytorch_model,
    output_path=TFLITE_DIR,
    output_name_prefix="gemma3_mobile",
    prefill_seq_len=PREFILL_LEN,
    kv_cache_max_len=KV_CACHE_MAX,
    quantize=QUANTIZE,
    export_config=export_config,
)
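Apparently convert_to_tflite does not write a file named simply gemma3_mobile.tflite; a suffix is added. Assuming the scheme is <prefix>_<quant tag>_ekv<kv_cache_max_len>.tflite (inferred from the name gemma3_mobile_q8_ekv2048.tflite used in this article, not checked against the ai-edge-torch source), the setting KV_CACHE_MAX = 4096 above would yield ..._ekv4096.tflite. The sketch below predicts the name and lists what is actually in /tflite so the two can be compared; expected_tflite_name, actual_tflite_files and QUANT_TAGS are hypothetical.

```python
# Sketch: predict the .tflite name and list what step 1 actually wrote.
# ASSUMPTION: names follow "<prefix>_<quant tag>_ekv<kv_cache_max_len>.tflite",
# inferred from gemma3_mobile_q8_ekv2048.tflite; not verified against ai-edge-torch.
from pathlib import Path

# Assumed mapping from the `quantize` argument to the tag in the file name.
QUANT_TAGS = {"dynamic_int8": "q8"}


def expected_tflite_name(prefix: str, quantize: str, kv_cache_max: int) -> str:
    """Build the file name the converter is assumed to produce."""
    tag = QUANT_TAGS.get(quantize, quantize)
    return f"{prefix}_{tag}_ekv{kv_cache_max}.tflite"


def actual_tflite_files(tflite_dir: Path, prefix: str) -> list[str]:
    """List the .tflite files in tflite_dir whose names start with prefix."""
    return sorted(p.name for p in tflite_dir.glob(f"{prefix}*.tflite"))
```

With the settings above, compare expected_tflite_name("gemma3_mobile", "dynamic_int8", 4096) with actual_tflite_files(Path("/tflite"), "gemma3_mobile"). If they differ, the naming assumption is wrong, and the real file name is the one to use in step 2.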

convert_step2.py

Conversion program, part 2.
Creates the .task file.
Based on the official code (almost verbatim).

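import glob
import os

from mediapipe.tasks.python.genai import bundler

# Input/output directories
HF_MODEL_DIR = "/model"
TFLITE_DIR = "/tflite"
OUTPUT_DIR = "/output"

# Create a Task Bundle from the LiteRT model and the tokenizer.
# The converter appears to add the quantization type and KV-cache size to the
# file name (e.g. gemma3_mobile_q8_ekv2048.tflite), so the name changes with the
# settings in convert_step1.py. Pick up whatever step 1 produced.
candidates = sorted(glob.glob(os.path.join(TFLITE_DIR, "gemma3_mobile*.tflite")))
if len(candidates) != 1:
    raise SystemExit(f"Expected exactly one gemma3_mobile*.tflite in {TFLITE_DIR}, found: {candidates}")
tflite_path = candidates[0]
tokenizer_path = os.path.join(HF_MODEL_DIR, "tokenizer.model")
task_path = os.path.join(OUTPUT_DIR, "gemma3_mobile.task")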

config = bundler.BundleConfig(
    tflite_model=tflite_path,
    tokenizer_model=tokenizer_path,
    start_token="<bos>",
    stop_tokens=["<eos>", "<end_of_turn>"],
    output_filename=task_path,
    prompt_prefix="<start_of_turn>user\n",
    prompt_suffix="<end_of_turn>\n<start_of_turn>model\n",
)
bundler.create_bundle(config)
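The prompt_prefix / prompt_suffix pair wraps each user message in Gemma's chat-turn markers, and stop_tokens mark where the model's reply ends. The sketch below only reproduces that string layout in plain Python to make it visible; it does not call MediaPipe, wrap_prompt and trim_at_stop are illustrative names, and exactly how the MediaPipe runtime (and flutter_gemma on top of it) applies these strings is not shown here.

```python
# Illustration only: the string layout implied by the BundleConfig above.
# The constants are copied from convert_step2.py; no MediaPipe calls are made.
PROMPT_PREFIX = "<start_of_turn>user\n"
PROMPT_SUFFIX = "<end_of_turn>\n<start_of_turn>model\n"
STOP_TOKENS = ("<eos>", "<end_of_turn>")


def wrap_prompt(user_text: str) -> str:
    """Return the text the model sees for one user turn."""
    return f"{PROMPT_PREFIX}{user_text}{PROMPT_SUFFIX}"


def trim_at_stop(generated: str) -> str:
    """Cut generated text at the earliest stop token, if any."""
    cut = len(generated)
    for token in STOP_TOKENS:
        index = generated.find(token)
        if index != -1:
            cut = min(cut, index)
    return generated[:cut]
```

Note that if a reply starts with a stop token, trimming leaves an empty string.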

Creating and setting up the container

Run the following commands:

> docker compose build
> docker compose up -d
> docker compose exec gemma_converter bash
# cd /convert/
# ./setup.sh

Running the conversion

# ./setup.sh
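The two steps need different virtual environments (ai-edge-torch for step 1, mediapipe for step 2). The directory tree lists a convert.sh, but its contents are not shown, so the following is only a hypothetical Python driver (saved under a made-up name such as run_all.py) that chains both steps. It calls each venv's own interpreter (<venv>/bin/python) directly, which uses that venv's packages without activating it; the paths assume the /convert layout and the venv names created by setup.sh.

```python
# Hypothetical driver (not the article's convert.sh): run both conversion
# steps, each with the Python interpreter of its own virtual environment.
import subprocess
import sys
from pathlib import Path

# (venv directory, script) pairs in execution order, relative to /convert.
STEPS = [
    ("ai-edge-torch", "convert/convert_step1.py"),  # safetensors -> .tflite
    ("mediapipe", "convert/convert_step2.py"),      # .tflite + tokenizer -> .task
]


def build_commands(root: Path) -> list[list[str]]:
    """Return one argv list per step, using <root>/<venv>/bin/python."""
    return [
        [str(root / venv / "bin" / "python"), str(root / script)]
        for venv, script in STEPS
    ]


def main(root: Path) -> int:
    for argv in build_commands(root):
        print("running:", " ".join(argv), flush=True)
        # check=True raises CalledProcessError, so step 2 never runs if step 1 fails.
        subprocess.run(argv, check=True, cwd=root)
    return 0


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python run_all.py /convert", file=sys.stderr)
    else:
        sys.exit(main(Path(sys.argv[1])))
```

Calling the venv interpreters directly avoids having to source a different activate script between the steps.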

Run only the conversion to tflite

# python -m venv ai-edge-torch
# source ai-edge-torch/bin/activate
# python convert/convert_step1.py

You'll see logs like this:

# python convert/convert_step1.py
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1764605524.662880     536 cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
/convert/ai-edge-torch/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:351: UserWarning: Device capability of jax unspecified, assuming `cpu` and `cuda`. Please specify it via the `devices` argument of `register_backend`.
  warnings.warn(
W0000 00:00:1764605860.475682     536 tf_tfl_flatbuffer_helpers.cc:364] Ignored output_format.
W0000 00:00:1764605860.476440     536 tf_tfl_flatbuffer_helpers.cc:367] Ignored drop_control_dependency.
I0000 00:00:1764605860.483221     536 reader.cc:83] Reading SavedModel from: /tmp/tmpjt1poydm
I0000 00:00:1764605860.496682     536 reader.cc:52] Reading meta graph with tags { serve }
I0000 00:00:1764605860.496787     536 reader.cc:147] Reading SavedModel debug info (if present) from: /tmp/tmpjt1poydm
I0000 00:00:1764605860.565027     536 mlir_graph_optimization_pass.cc:437] MLIR V1 optimization pass is not enabled
I0000 00:00:1764605860.576942     536 loader.cc:236] Restoring SavedModel bundle.
I0000 00:00:1764605861.547697     536 loader.cc:220] Running initialization op on SavedModel bundle at path: /tmp/tmpjt1poydm
I0000 00:00:1764605861.626596     536 loader.cc:471] SavedModel load for tags { serve }; Status: success: OK. Took 1143407 microseconds.
I0000 00:00:1764605861.752973     536 dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
I0000 00:00:1764605937.021839     536 flatbuffer_export.cc:4160] Estimated count of arithmetic ops: 430.245 G  ops, equivalently 215.122 G  MACs

The following warning can apparently be ignored safely:

/convert/ai-edge-torch/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:351: UserWarning: Device capability of jax unspecified, assuming `cpu` and `cuda`. Please specify it via the `devices` argument of `register_backend`.
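If you would rather not see it at all, Python's standard warnings module can filter it by message. A minimal sketch; silence_jax_backend_warning is a hypothetical helper. Since the warning shows up while the libraries are being loaded, the filter should be installed before the ai_edge_torch imports in convert_step1.py.

```python
# Optional: hide the torch.distributed "Device capability of jax unspecified"
# UserWarning. Call this before importing ai_edge_torch.
import warnings


def silence_jax_backend_warning() -> None:
    """Ignore UserWarnings whose message starts with the text below."""
    # `message` is a regex matched against the start of the warning text.
    warnings.filterwarnings(
        "ignore",
        message=r"Device capability of jax unspecified",
        category=UserWarning,
    )
```

Other warnings are left untouched, so anything new still gets reported.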

Run only the conversion to .task

# python -m venv mediapipe
# source mediapipe/bin/activate
# python convert/convert_step2.py

You'll see logs like this:

# python convert/convert_step2.py
2025-12-01 16:52:11.968375: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-12-01 16:52:12.295959: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-12-01 16:52:12.296299: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-12-01 16:52:12.366604: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-12-01 16:52:12.539165: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-12-01 16:52:12.541896: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-12-01 16:52:23.733994: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
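To confirm that step 2 really produced a bundle, you can look inside it. The sketch below assumes the .task file is a ZIP archive (an assumption about how the MediaPipe bundler packages the model and tokenizer; if zipfile.is_zipfile says otherwise, the helper raises instead of guessing). describe_bundle is a hypothetical helper and does not assume any particular entry names.

```python
# Sketch: list the entries of the generated .task bundle, ASSUMING it is a
# ZIP archive. Useful to check that both the model and tokenizer made it in.
import zipfile
from pathlib import Path


def describe_bundle(task_path: Path) -> list[tuple[str, int]]:
    """Return (entry name, uncompressed size in bytes) for each entry."""
    if not zipfile.is_zipfile(task_path):
        raise ValueError(f"{task_path} is not a ZIP archive")
    with zipfile.ZipFile(task_path) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]
```

If the assumption holds, describe_bundle(Path("/output/gemma3_mobile.task")) should show one large entry for the model and a smaller one for the tokenizer.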

Results

Device	Result
POCO F6 Pro	Only empty strings come back, for some reason
ROG Phone (1st gen)	Crashed partway through
Arrows We2	Crashed partway through

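Looks like there's still a long way to go...
I'm curious what happens if I change the tflite conversion parameters, so I'll try that and write it up in another article.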
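If you try other conversion parameters, one that directly affects memory use is KV_CACHE_MAX, since the KV cache grows linearly with it. As a rough aid, the sketch below estimates KV-cache memory as 2 (K and V) × layers × KV heads × head dimension × sequence length × bytes per element. It works under stated assumptions: every layer keeps a full-length cache (sliding-window attention is ignored), elements are float32 (4 bytes), and the model shape is read from the Hugging Face config.json keys num_hidden_layers, num_key_value_heads and head_dim. Whether the converted model actually allocates memory this way is not verified; kv_cache_bytes and estimate_from_config are hypothetical helpers.

```python
# Rough KV-cache size estimate for different KV_CACHE_MAX values.
# ASSUMPTIONS: full-length cache in every layer (sliding window ignored),
# 4-byte float32 elements, model shape taken from Hugging Face config.json.
import json
from pathlib import Path


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 4) -> int:
    """2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


def estimate_from_config(config_path: Path, seq_len: int,
                         bytes_per_elem: int = 4) -> int:
    """Read the model shape from config.json and estimate the cache size."""
    cfg = json.loads(config_path.read_text(encoding="utf-8"))
    return kv_cache_bytes(
        num_layers=cfg["num_hidden_layers"],
        num_kv_heads=cfg["num_key_value_heads"],
        head_dim=cfg["head_dim"],
        seq_len=seq_len,
        bytes_per_elem=bytes_per_elem,
    )
```

For example, estimate_from_config(Path("/model/config.json"), 4096) / 2**20 gives the estimate in MiB for the current setting; halving KV_CACHE_MAX halves the figure.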

References
