Testing which Gemma 3 models run on a GeForce 4060

Posted at 2025-03-22

Gemma 3 was released recently, so I tested which of its models run on the GPU in my gaming PC.

Environment

GeForce 4060 (8 GB VRAM)
WSL2
Ubuntu 24.04
CUDA 12.8
Python 3.12

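Models tested and results

google/gemma-3-1b-it → worked
google/gemma-3-4b-it → worked only with bfloat16
google/gemma-3-12b-it → did not work
(google/gemma-3-27b-it was not tested)

(Each model was tested in two variants: the original, loaded without specifying torch_dtype, and a bfloat16 variant, loaded with torch_dtype=torch.bfloat16. Strictly speaking, bfloat16 is a reduced-precision floating-point format rather than quantization, so the variant is simply called "bfloat16" below.)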
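As a rough cross-check of these results, the memory needed for the weights alone can be estimated as parameter count times bytes per parameter. The sketch below is only a back-of-the-envelope calculation. It assumes the nominal parameter counts implied by the model names (the real checkpoints are somewhat larger, and the 4B and larger models also carry a vision encoder), and it assumes that loading without torch_dtype means float32. Activations, the KV cache, and the CUDA context are not included.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {'float32': 4, 'bfloat16': 2}

# Nominal parameter counts taken from the model names (an assumption;
# the actual checkpoints contain somewhat more parameters).
NOMINAL_PARAMS = {
    'gemma-3-1b-it': 1e9,
    'gemma-3-4b-it': 4e9,
    'gemma-3-12b-it': 12e9,
    'gemma-3-27b-it': 27e9,
}


def weight_gib(num_params, dtype):
    """Return the size of the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


for name, num_params in NOMINAL_PARAMS.items():
    sizes = ', '.join(
        f'{dtype}: {weight_gib(num_params, dtype):.1f} GiB'
        for dtype in BYTES_PER_PARAM
    )
    print(f'{name}: {sizes}')
```

Under these assumptions the 4B model in float32 is well beyond 8 GB, while in bfloat16 it sits right at the edge, which is consistent with it loading only in bfloat16.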

Installation

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
pip install git+https://github.com/huggingface/transformers@v4.49.0-Gemma-3
pip install accelerate
pip install -U transformers # <- Without this, you may get an error related to vocab_size.
pip install -U bitsandbytes
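After installing, it can help to confirm that PyTorch actually sees the GPU before loading any model. The following is a minimal sketch, not part of the original procedure. The function name describe_gpu is made up for illustration, and it returns a partial report instead of failing when torch or CUDA is missing.

```python
def describe_gpu():
    """Collect basic facts about the PyTorch/CUDA setup into a dict."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment.
        return {'torch_installed': False}

    info = {
        'torch_installed': True,
        'torch_version': torch.__version__,
        'cuda_available': torch.cuda.is_available(),
    }
    if info['cuda_available']:
        props = torch.cuda.get_device_properties(0)
        info['device_name'] = props.name
        info['total_vram_gib'] = round(props.total_memory / 1024**3, 2)
        # bfloat16 support matters for the torch_dtype=torch.bfloat16 runs.
        info['bf16_supported'] = torch.cuda.is_bf16_supported()
    return info


print(describe_gpu())
```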

Using Pipeline

gemma-3-1b-it (original)

from transformers import pipeline
import time

pipe = pipeline(
    'text-generation',
    model='google/gemma-3-1b-it',
    device='cuda',
)

messages = [
    {'role': 'system', 'content': 'あなたは与えられた命令に正確に回答する優秀なアシスタントです。'},
    {'role': 'user', 'content': '徳川幕府の歴代将軍を表形式で教えてください。'},
]
start_time = time.time()
output = pipe(messages, max_new_tokens=1024)
end_time = time.time()

print(output[0]['generated_text'][-1]['content'])
print(f'time = {end_time-start_time} sec.')

output

承知いたしました。徳川幕府の歴代将軍を表形式でまとめます。

| 将軍名       | 時代     | 役職                 |
|--------------|----------|-----------------------|
| 徳川家康    | 1603年 - 1615年 | 幕府初代                 |
| 徳川家康の次男 | 1615年 - 1623年 | 幕府2代                 |
| 徳川家康の次男 | 1623年 - 1651年 | 幕府3代                 |
| 徳川家康の三男 | 1651年 - 1665年 | 幕府4代                 |
| 徳川家康の四男 | 1665年 - 1686年 | 幕府5代                 |
| 徳川家康の五男 | 1686年 - 1694年 | 幕府6代                 |
| 徳川家康の六男 | 1694年 - 1709年 | 幕府7代                 |
| 徳川家康の七男 | 1709年 - 1727年 | 幕府8代                 |
| 徳川家康の八男 | 1727年 - 1767年 | 幕府9代                 |
| 徳川家康の九男 | 1767年 - 1816年 | 幕府10代                 |
| 徳川家康の十男 | 1816年 - 1841年 | 幕府11代                 |
| 徳川家康の十一男 | 1841年 - 1888年 | 幕府12代                 |
| 徳川家康の十二男 | 1888年 - 1941年 | 幕府13代                 |
| 徳川家康の十三男 | 1941年 - 1946年 | 幕府14代                 |
| 徳川家康の四teenth男 | 1946年 - 1950年 | 幕府15代                 |
| 徳川家康の十五男 | 1950年 - 1967年 | 幕府16代                 |
| 徳川家康の十六男 | 1967年 - 1989年 | 幕府17代                 |
| 徳川家康の十七男 | 1989年 - 1992年 | 幕府18代                 |
| 徳川家康の十八男 | 1992年 - 1999年 | 幕府19代                 |
| 徳川家康の十九男 | 1999年 - 2000年 | 幕府20代                 |
| 徳川家康の二十代男 | 2000年 - 現在 | 幕府21代                 |

**注記:**

*   幕府の時代区分は、徳川家康の統治期間を指します。
*   「三ノ家」は、徳川家康の三男の時代を指します。
*   「三伯族」は、徳川家康の三男の時代を指します。

ご希望に応じて、各将軍に関する詳細な情報や、それぞれの時代における主な出来事などもお答えできます。
time = 39.3218297958374 sec.

Well, that is about what you would expect. The speed is not bad either.

gemma-3-1b-it (bfloat16)

from transformers import pipeline
import torch
import time

pipe = pipeline(
    'text-generation',
    model='google/gemma-3-1b-it',
    device='cuda',
    torch_dtype=torch.bfloat16,
)

messages = [
    {'role': 'system', 'content': 'あなたは与えられた命令に正確に回答する優秀なアシスタントです。'},
    {'role': 'user', 'content': '徳川幕府の歴代将軍を表形式で教えてください。'},
]
start_time = time.time()
output = pipe(messages, max_new_tokens=1024)
end_time = time.time()

print(output[0]['generated_text'][-1]['content'])
print(f'time = {end_time-start_time} sec.')

output

徳川幕府の歴代将軍について、以下の表でまとめました。

| 将軍名     | 生年   | 役職             | 属する家系 | 備考                                   |
|------------|-------|------------------|-------------|----------------------------------------|
| 徳川家康  | 1605  | 徳川幕府初代     | 徳川家   | 幕府の始まり、将軍として初代を築いた。 |
| 徳川家康  | 1623  | 徳川家康         | 徳川家   | 異母将軍の三河合戦で、家臣の家光を立てた。 |
| 徳川家康  | 1624  | 徳川家康         | 徳川家   | 豊臣秀吉の側近で、結城秀隆を立てた。 |
| 徳川家康  | 1627  | 徳川家康         | 徳川家   | 豊臣秀吉の側近で、白首（西郷亮長）を立てた。 |
| 徳川家康  | 1631  | 徳川家康         | 徳川家   | 豊臣秀吉の側近で、石田三成を立てた。 |
| 徳川家康  | 1636  | 徳川家康         | 徳川家   | 豊臣秀吉の側近で、森田吉左衛ンを立てた。 |
| 徳川家康  | 1637  | 徳川家康         | 徳川家   | 豊臣秀吉の側近で、大久保利通を立てた。 |
| 徳川家康  | 1645  | 徳川家康         | 徳川家   | 豊臣秀吉の側近で、坂本龍を立てた。 |
| 徳川家康  | 1650  | 徳川家康         | 徳川家   | 豊臣秀吉の側近で、武田信玄を立てた。 |
| 徳川家康  | 1658  | 徳川家康         | 徳川家   | 豊臣秀吉の側近で、川中島を立てた。 |
| 徳川家康  | 1660  | 徳川家康         | 徳川家   | 豊臣秀吉の側近で、細川利家を立てた。 |
| 徳川家康  | 1663  | 徳川家康         | 徳川家   | 豊臣秀吉の側近で、石田三成を立てた。 |
| 徳川家康  | 1665  | 徳川家康         | 徳川家   | 豊臣秀吉の側近で、大久保利通を立てた。 |
| 徳川家康  | 1667  | 徳川家康         | 徳川家   | 豊臣秀吉の側近で、松本忠弘を立てた。 |
| 徳川家康  | 1670  | 徳川家康         | 徳川家   | 豊臣秀吉の側近で、林剛を立てた。 |
| 徳川家康  | 1674  | 徳川家康         | 徳川家   | 豊臣秀吉の側近で、毛利元服を立てた。 |
| 徳川家康  | 1688  | 徳川家康         | 徳川家   | 豊臣秀吉の側近で、坂本龍を立てた。 |
| 徳川家康  | 1695  | 徳川家康         | 徳川家   | 豊臣秀吉の側近で、大久保利通を立てた。 |
| 徳川家康  | 1709  | 徳川家康         | 徳川家   | 豊臣秀吉の側近で、大久保利通を立てた。 |
| 徳川家康  | 1715  | 徳川家康         | 徳川家   | 豊臣秀吉の側近で、松本忠弘を立てた。 |
time = 48.96211576461792 sec.

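So the bfloat16 run was slower (about 49.0 s versus 39.3 s). This is a single run each and the generated text differs between the two, so the comparison is only rough.
If the original runs, it seems better to use the original.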
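Since the two timings above come from single runs that produced different amounts of text, a fairer comparison would average several runs and normalize by the number of generated tokens. Here is a small sketch of such a helper. The names benchmark and fake_generate are hypothetical, and generate_fn is assumed to be a callable that performs one generation and returns how many tokens it produced.

```python
import time


def benchmark(generate_fn, runs=3, warmup=1):
    """Time generate_fn over several runs and report tokens per second.

    generate_fn must return the number of tokens it generated.
    """
    # Warm-up runs are executed but not measured.
    for _ in range(warmup):
        generate_fn()

    total_time = 0.0
    total_tokens = 0
    for _ in range(runs):
        start = time.perf_counter()
        n_tokens = generate_fn()
        total_time += time.perf_counter() - start
        total_tokens += n_tokens

    return {
        'mean_sec': total_time / runs,
        'tokens_per_sec': total_tokens / total_time,
    }


def fake_generate():
    """Stand-in for a real generation call, used only to demo the helper."""
    time.sleep(0.01)
    return 50


print(benchmark(fake_generate, runs=3, warmup=1))
```

In a real measurement, generate_fn would wrap the pipe(...) call and count the tokens in the reply, for example by tokenizing the generated text.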

gemma-3-4b-it (original)

from transformers import pipeline

pipe = pipeline(
    'text-generation',
    model='google/gemma-3-4b-it',
    device='cuda',
)

Unfortunately, it ran out of memory and did not run.

gemma-3-4b-it (bfloat16)

text-to-text

from transformers import pipeline
import torch
import time

pipe = pipeline(
    'text-generation',
    model='google/gemma-3-4b-it',
    device='cuda',
    torch_dtype=torch.bfloat16,
)

messages = [
    {'role': 'system', 'content': 'あなたは与えられた命令に正確に回答する優秀なアシスタントです。'},
    {'role': 'user', 'content': '徳川幕府の歴代将軍を表形式で教えてください。'},
]
start_time = time.time()
output = pipe(messages, max_new_tokens=1024)
end_time = time.time()

print(output[0]['generated_text'][-1]['content'])
print(f'time = {end_time-start_time} sec.')

output

承知いたしました。徳川幕府の歴代将軍を表形式で示します。

| 順位 | 将軍名          | 在任期間        | 備考                                                              |
|------|-----------------|-----------------|--------------------------------------------------------------------|
| 1    | 家康            | 1603年 - 1605年 | 幕政の基礎を確立。                                                     |
| 2    | 秀忠            | 1605年 - 1623年 | 禁中生活を重視。学問・文化奨励。                                             |
| 3    | 家光            | 1623年 - 1651年 | 弱体化した幕政を立て直し、関白位を reinstated。                                |
| 4    | 慶喜            | 1651年 - 1680年 | 幕藩体制を確立。                                                     |
| 5    | 政吉            | 1680年 - 1709年 | 経済政策を重視。                                                     |
| 6    | 和継            | 1709年 - 1716年 | 荒田論を唱え、農地拡大を推進。                                              |
| 7    | 安芸            | 1716年 - 1740年 | 幕府財政の安定化を図る。                                                  |
| 8    | 紀綱            | 1740年 - 1760年 | 政治腐敗を是正しようと努める。                                             |
| 9    | 常済            | 1760年 - 1767年 | 財政難の深刻化。                                                       |
| 10   | 宝永            | 1767年 - 1789年 | 禁教令・改科令など、保守的な政策を推進。                                      |
| 11   | 天保            | 1789年 - 1817年 | 経済政策の失敗から、幕府財政が逼迫。                                         |
| 12   | 弘文            | 1817年 - 1830年 | 幕政の腐敗が露呈。                                                     |
| 13   | 安政            | 1830年 - 1851年 | 幕末の混乱を招く政策が相次ぐ。                                               |
| 14   | 模造            | 1851年 - 1858年 | 甲午事件をきっかけに、短い在任期間。                                          |
| 15   | 咸永            | 1858年 - 1867年 | 大政奉還、明治維新を経験。                                                |

**補足:**

*   将軍の在任期間は、幕府の政治状況によって変動するため、概算です。
*   備考欄には、各将軍の主な特徴や政策などを記載しています。

ご不明な点がありましたら、お気軽にお尋ねください。
time = 53.01682949066162 sec.

It ran with bfloat16. Accuracy aside, it tends to write in more detail.

image-text-to-text (online image)

from transformers import pipeline
import torch
import time

pipe = pipeline(
    'image-text-to-text',
    model='google/gemma-3-4b-it',
    device='cuda',
    torch_dtype=torch.bfloat16,
)

messages = [
    {'role': 'system', 'content': 'あなたは与えられた命令に正確に回答する優秀なアシスタントです。'},
    {'role': 'user', 'content': [
        {'type': 'image', 'url': 'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG'},
        {'type': 'text', 'text': 'この画像に書かれていることを詳細に説明してください。'}
    ]},
]
start_time = time.time()
output = pipe(text=messages, max_new_tokens=1024)
end_time = time.time()

print(output[0]['generated_text'][-1]['content'])
print(f'time = {end_time-start_time} sec.')

output

この画像には、手のひらにいくつかの宝石が載っています。

*   **色:** 宝石は、オレンジ、緑、青、ターコイズブルーの４色があります。
*   **模様:** それぞれの宝石には、葉の模様が描かれています。
*   **形状:** 全ての宝石は円形です。
*   **背景:** 背景には、木製のテーブルと、白い服を着た人物の一部が見えます。
*   **その他:** 宝石は、おそらくアクセサリー（イヤリングやブレスレットなど）のパーツである可能性があります。
time = 56.915518045425415 sec.

Image input also worked. It is impressive that image input works even on such a small GPU.

image-text-to-text (local image)

from transformers import pipeline
import torch
import base64
import time

pipe = pipeline(
    'image-text-to-text',
    model='google/gemma-3-4b-it',
    device='cuda',
    torch_dtype=torch.bfloat16,
)

with open('./image.png', 'rb') as f:
    image_binary = f.read()
base64_encoded = base64.b64encode(image_binary).decode('utf-8')
messages = [
    {'role': 'system', 'content': 'あなたは与えられた命令に正確に回答する優秀なアシスタントです。'},
    {'role': 'user', 'content': [
        {'type': 'image', 'image': f'data:image/png;base64,{base64_encoded}'},
        {'type': 'text', 'text': 'この画像に書かれていることを詳細に説明してください。'}
    ]},
]
start_time = time.time()
output = pipe(text=messages, max_new_tokens=1024)
end_time = time.time()

print(output[0]['generated_text'][-1]['content'])
print(f'time = {end_time-start_time} sec.')

output

承知いたしました。画像に書かれている内容を詳細に説明します。

**1. 概要**

*   **会社名:** ママバ - 発動機機械株式会社
*   **コード番号:** 7272
*   **URL:** [https://global.yamaha-motor.com/jp/ir/](https://global.yamaha-motor.com/jp/ir/)
*   **役職名:** 取締役会長兼 代表取締役社長
    *   **役職名:** 財務部長
*   **役職名:** 岡田 充弘 (TEL: 038-32-1144)
*   **作成日:** 2025年2月2日
*   **FASF** (Financial Analysis Standard Framework) 準拠

**2. 2024年12月期の連結業績 (2024年1月1日～2024年12月31日)**

| 項目                      | 2024年12月期 | 2023年12月期 |
| :------------------------ | :----------- | :----------- |
| 売上高                    | 2,576,179百万円 | 2,414,759百万円 |
| 営業利益                  | 181,515百万円  | 243,960百万円 |
| 税引前当期利潤            | 183,175百万円  | 236,073百万円 |
| 当期利益                  | 124,570百万円  | 172,879百万円 |
| 親会社との所内資本比率低下率 (％) | 19.3%        | 17.6%        |
| 当期利益率                 | 4.9%         | 6.0%         |
| 自己資本比率                | 10.3%        | 10.3%        |
| 営業利益率                 | 7.1%         | 6.6%         |
| 1株当たり当期利益         | 345.3百万円     | 436.4百万円     |

**3. 基本的な1株当たりの利益計算**

| 項目                  | 2024年12月期 | 2023年12月期 |
| :-------------------- | :----------- | :----------- |
| 1株当たり売上高        | 128百万円     | 113百万円     |
| 1株当たり営業利益      | 89.3百万円     | 121.4百万円    |
| 1株当たり当期利益      | 64.3百万円     | 83.8百万円     |

**4. 業績予想 (2025年1月1日～2025年12月31日)**

| 項目                      | 2025年12月期 |
| :------------------------ | :----------- |
| 売上高                    | 2,760,000百万円 |
| 営業利益                  | 200,000百万円 |
| 税引前当期利潤            | 200,000百万円 |
| 当期利益                  | 135,000百万円 |
| 1株当たり当期利益         | 73.0百万円     |

**5. その他の情報**

*   「中期経営計画」の概要（記載省略）
*   リスク要因 (記載省略)
*   将来展望 (記載省略)

**備考:**

*   「親会社との所内資本比率低下率」は、会社が所有する親会社の資本比率が、子会社や関連会社の純資産に占める割合の減少率を示します。
*   「自己資本比率」は、会社の自己資本が総資本に占める割合を示します。

上記は、画像に記載されている情報をまとめました。何か特定の情報について詳しく知りたい場合は、お気軽にお尋ねください。
time = 301.41700410842896 sec.

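https://global.yamaha-motor.com/jp/ir/library/report/pdf/2024/2024report.pdf
The input was a screenshot of the first page of this Yamaha Motor earnings summary.
The output contains a lot of unrelated or incorrect content. Perhaps this is about what to expect from the 4B model in bfloat16.
I wonder whether cropping the image and feeding in a narrower region would reduce the errors.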
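One way to try the cropping idea is to split the screenshot into a grid of tiles and send each tile as a separate image. The sketch below only computes the crop boxes. The function name tile_boxes and the overlap parameter are assumptions made for illustration, and whether tiling actually reduces the errors has not been tested here.

```python
def tile_boxes(width, height, rows, cols, overlap=0):
    """Split a width x height image into rows x cols crop boxes.

    Each box is (left, upper, right, lower) in pixels. Neighbouring
    tiles are widened by `overlap` pixels so that text lying on a
    boundary appears whole in at least one tile.
    """
    tile_w = width / cols
    tile_h = height / rows
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left = max(0, int(c * tile_w) - overlap)
            upper = max(0, int(r * tile_h) - overlap)
            right = min(width, int((c + 1) * tile_w) + overlap)
            lower = min(height, int((r + 1) * tile_h) + overlap)
            boxes.append((left, upper, right, lower))
    return boxes


# Example: split a 1000 x 1400 screenshot into a top and a bottom half.
for box in tile_boxes(1000, 1400, rows=2, cols=1, overlap=20):
    print(box)
```

The boxes use the same (left, upper, right, lower) order that Pillow's Image.crop expects, so each one could be passed to crop and the result saved as a separate PNG before encoding it as in the code above.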

gemma-3-12b-it (original and bfloat16)

from transformers import pipeline

pipe = pipeline(
    'text-generation',
    model='google/gemma-3-12b-it',
    device='cuda',
)

from transformers import pipeline
import torch

pipe = pipeline(
    'text-generation',
    model='google/gemma-3-12b-it',
    device='cuda',
    torch_dtype=torch.bfloat16,
)

I tried it on the off chance, but as expected both the original and bfloat16 variants ran out of memory.

Without Pipeline

gemma-3-1b-it (original)

from transformers import AutoTokenizer, Gemma3ForCausalLM
import torch
import time

model_id = 'google/gemma-3-1b-it'
model = Gemma3ForCausalLM.from_pretrained(
    model_id,
    device_map='cuda',
).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {'role': 'system', 'content': 'あなたは与えられた命令に正確に回答する優秀なアシスタントです。'},
    {'role': 'user', 'content': '徳川幕府の歴代将軍を表形式で教えてください。'},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors='pt',
).to(model.device)
start_time = time.time()
with torch.inference_mode():
    outputs = model.generate(**inputs, max_new_tokens=1024)
outputs = tokenizer.batch_decode(outputs[:,inputs['input_ids'].shape[-1]:])[0]
end_time = time.time()

print(outputs)
print(f'time = {end_time-start_time} sec.')

output

承知いたしました。徳川幕府の歴代将軍を表形式でまとめます。

| 将軍名      | 時代     | 側縁・家柄                               |
|------------|----------|-------------------------------------------|
| 徳川家康  | 1603年～1615年 | 6位将軍 (有力将軍)                           |
| 徳川秀忠   | 1615年～1621年 | 5位将軍                                    |
| 徳川宗家   | 1621年～1636年 | 4位将軍                                    |
| 徳川家康元 | 1636年～1645年 | 3位将軍                                    |
| 徳川家康中 | 1645年～1659年 | 2位将軍                                    |
| 徳川家康善 | 1659年～1667年 | 1位将軍                                    |
| 徳川家康秀 | 1667年～1688年 | 0位将軍 (徳川家康の後継者)                     |
| 徳川家康次 | 1688年～1709年 | 0位将軍 (徳川家康の後継者)                     |
| 徳川家康三 | 1709年～1727年 | 0位将軍 (徳川家康の後継者)                     |
| 徳川家康四 | 1727年～1767年 | 0位将軍 (徳川家康の後継者)                     |
| 徳川家康五 | 1767年～1817年 | 0位将軍 (徳川家康の後継者)                     |
| 徳川家康六 | 1817年～1841年 | 0位将軍 (徳川家康の後継者)                     |
| 徳川家康七 | 1841年～1867年 | 0位将軍 (徳川家康の後継者)                     |
| 徳川家康八 | 1867年～1888年 | 0位将軍 (徳川家康の後継者)                     |
| 徳川家康九 | 1888年～1917年 | 0位将軍 (徳川家康の後継者)                     |
| 徳川家康十 | 1917年～1926年 | 0位将軍 (徳川家康の後継者)                     |
| 徳川家康十一 | 1926年～1936年 | 0位将軍 (徳川家康の後継者)                     |
| 徳川家康十二 | 1936年～1940年 | 0位将軍 (徳川家康の後継者)                     |
| 徳川家康十三 | 1940年～1945年 | 0位将軍 (徳川家康の後継者)                     |
| 徳川家康十四 | 1945年～1950年 | 0位将軍 (徳川家康の後継者)                     |
| 徳川家康十五 | 1950年～1964年 | 0位将軍 (徳川家康の後継者)                     |
| 徳川家康十六 | 1964年～1974年 | 0位将軍 (徳川家康の後継者)                     |
| 徳川家康七時 | 1974年～1988年 | 0位将軍 (徳川家康の後継者)                     |
| 徳川家康八時 | 1988年～1994年 | 0位将軍 (徳川家康の後継者)                     |
| 徳川家康九時 | 1994年～2000年 | 0位将軍 (徳川家康の後継者)                     |
| 徳川家康十時 | 2000年～2005年 | 0位将軍 (徳川家康の後継者)                     |
| 徳川家康十一
time = 45.402639627456665 sec.

gemma-3-1b-it(bfloat16)

from transformers import AutoTokenizer, Gemma3ForCausalLM
import torch
import time

model_id = 'google/gemma-3-1b-it'
model = Gemma3ForCausalLM.from_pretrained(
    model_id,
    device_map='cuda',
    torch_dtype=torch.bfloat16,
).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {'role': 'system', 'content': 'あなたは与えられた命令に正確に回答する優秀なアシスタントです。'},
    {'role': 'user', 'content': '徳川幕府の歴代将軍を表形式で教えてください。'},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors='pt',
).to(model.device)
start_time = time.time()
with torch.inference_mode():
    outputs = model.generate(**inputs, max_new_tokens=1024)
outputs = tokenizer.batch_decode(outputs[:,inputs['input_ids'].shape[-1]:])[0]
end_time = time.time()

print(outputs)
print(f'time = {end_time-start_time} sec.')

output

承知いたしました。徳川幕府の歴代将軍を表形式でまとめます。

| 将軍名 | 生年 - 逝去 | 役職 | 備考 |
|---|---|---|---|
| 徳川家康 | 1605年 - 1615年 | 幕府の初代征夷大将軍 | 幕府の父 |
| 徳川秀忠 | 1615年 - 1680年 | 幕府の参議大長老 | 幕府の安定に貢献 |
| 徳川広宣之通 | 1680年 - 1727年 | 幕府の御用上Palatine | 幕府の政治顧問 |
| 徳川家光 | 1727年 - 1767年 | 幕府の第6代征夷大将軍 | 幕府の財政再建に尽力 |
| 徳川家康丸 | 1767年 - 1841年 | 幕府の第7代征夷大将軍 | 幕府の統治に尽力 |
| 徳川宗家 | 1841年 - 1896年 | 幕府の第8代征夷大将軍 | 幕府の改革に尽力 |
| 徳川家重 | 1896年 - 1936年 | 幕府の第9代征夷大将軍 | 幕府の政治・経済に尽力 |
| 徳川家fillRect | 1936年 - 1967年 | 幕府の第10代征夷大将軍 | 幕府の改革に尽力 |
| 徳川家康善 | 1967年 - 1989年 | 幕府の第11代征夷大将軍 | 幕府の安定に尽力 |
| 徳川家康幸 | 1989年 - 1996年 | 幕府の第12代征夷大将軍 | 幕府の統治に尽力 |
| 徳川家康甚 | 1996年 - 2005年 | 幕府の第13代征夷大将軍 | 幕府の改革に尽力 |
| 徳川家康吉 | 2005年 - 2015年 | 幕府の第14代征夷大将軍 | 幕府の統治に尽力 |
| 徳川家康親 | 2015年 - 2023年 | 幕府の第15代征夷大将軍 | 幕府の改革に尽力 |
| 徳川家康清 | 2023年 - 現在 | 幕府の第16代征夷大将軍 |  |

**注記：**

*   「役職」は、その人物が幕府においてどのような役割を果たしたかを示しています。
*   「備考」には、その人物の人物像や、その時代における活動など、関連する情報を記載しています。

ご不明な点がございましたら、お気軽にお尋ねください。<end_of_turn>
time = 33.41174578666687 sec.

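The BitsAndBytesConfig approach shown in the official documentation did not run properly for me, so I used torch_dtype=torch.bfloat16 instead.

gemma-3-4b-it (bfloat16)

I also tried the original, but as with Pipeline it did not fit in memory, so I omit it here.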

text-to-text

from transformers import AutoProcessor, Gemma3ForConditionalGeneration
import torch
import time

model_id = 'google/gemma-3-4b-it'
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id,
    device_map='cuda',
    torch_dtype=torch.bfloat16,
).eval()
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {'role': 'system', 'content': [
        {'type': 'text', 'text': 'あなたは与えられた命令に正確に回答する優秀なアシスタントです。'},
    ]},
    {'role': 'user', 'content': [
        {'type': 'text', 'text': '徳川幕府の歴代将軍を表形式で教えてください。'},
    ]},
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors='pt',
).to(model.device)
start_time = time.time()
with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
    generation = generation[0][inputs['input_ids'].shape[-1]:]
decoded = processor.decode(generation, skip_special_tokens=True)
end_time = time.time()

print(decoded)
print(f'time = {end_time-start_time} sec.')

output

## 徳川幕府 歴代将軍

| 歴代 | 将軍名 | 在職期間 | 備考 |
|---|---|---|---|
| 1 | 家康 | 1603年 - 1605年 | 幕府の基礎を築く |
| 2 | 秀忠 | 1605年 - 1623年 | 治世の時代、文化の発展 |
| 3 | 家光 | 1623年 - 1651年 | 鎖国政策を推進 |
| 4 | 慶喜 | 1651年 - 1680年 | 経済政策を重視 |
| 5 | 交代 | 1680年 - 1709年 | 隠居、家光の跡を継ぐ |
| 6 | 宝永 | 1709年 - 1716年 | 宝永の改革を推し進める |
| 7 | 享保 | 1716年 - 1730年 | 政治・経済の安定化 |
| 8 | 清正 | 1730年 - 1751年 | 幕府の財政を安定させる |
| 9 | 安政 | 1751年 - 1779年 | 幕政の安定化、学問奨励 |
| 10 | 天保 | 1779年 - 1789年 | 天保の改革、幕府の財政難 |
| 11 | 甲斐 | 1789年 - 1817年 | 幕政の混乱、社会不安 |
| 12 | 文久 | 1817年 - 1830年 | 文久の改革、幕府の崩壊 |
| 13 | 孝明天皇 | 1830年 - 1852年 | 幕府解体、明治維新 |

**注:** 孝明天皇は、幕府将軍というよりは、幕府の最高指導者としての役割を担っていました。

ご不明な点がございましたら、お気軽にお尋ねください。
time = 128.62190532684326 sec.

image-text-to-text (online image)

from transformers import AutoProcessor, Gemma3ForConditionalGeneration
import torch
import time

model_id = 'google/gemma-3-4b-it'
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id,
    device_map='cuda',
    torch_dtype=torch.bfloat16,
).eval()
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {'role': 'system', 'content': [
        {'type': 'text', 'text': 'あなたは与えられた命令に正確に回答する優秀なアシスタントです。'},
    ]},
    {'role': 'user', 'content': [
        {'type': 'image', 'url': 'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG'},
        {'type': 'text', 'text': 'この画像に書かれていることを詳細に説明してください。'}
    ]},
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors='pt',
).to(model.device)
start_time = time.time()
with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
    generation = generation[0][inputs['input_ids'].shape[-1]:]
decoded = processor.decode(generation, skip_special_tokens=True)
end_time = time.time()

print(decoded)
print(f'time = {end_time-start_time} sec.')

output

画像には、手の中にいくつかの宝石が写っています。

*   **色:** オレンジ、緑、青、ターコイズブルーの4つの宝石があります。
*   **模様:** 各宝石には、それぞれ異なる模様が描かれています。
    *   オレンジの宝石には、車輪の模様があります。
    *   緑の宝石には、自転車の模様があります。
    *   青の宝石には、葉っぱの模様があります。
    *   ターコイズブルーの宝石には、葉っぱの模様があります。
*   **背景:** 手は毛皮のジャケットを着用しており、背景には木製のテーブルと、ピンク色の箱があります。

全体的に、これらの宝石は、手作りのアクセサリーの一部である可能性があります。
time = 66.03124165534973 sec.

image-text-to-text (local image)

from transformers import AutoProcessor, Gemma3ForConditionalGeneration
import torch
import base64
import time

model_id = 'google/gemma-3-4b-it'
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id,
    device_map='cuda',
    torch_dtype=torch.bfloat16,
).eval()
processor = AutoProcessor.from_pretrained(model_id)

with open('./image.png', 'rb') as f:
    image_binary = f.read()
base64_encoded = base64.b64encode(image_binary).decode('utf-8')
messages = [
    {'role': 'system', 'content': [
        {'type': 'text', 'text': 'あなたは与えられた命令に正確に回答する優秀なアシスタントです。'},
    ]},
    {'role': 'user', 'content': [
        {'type': 'image', 'image': f'data:image/png;base64,{base64_encoded}'},
        {'type': 'text', 'text': 'この画像に書かれていることを詳細に説明してください。'}
    ]},
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors='pt',
).to(model.device)
start_time = time.time()
with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    generation = generation[0][inputs['input_ids'].shape[-1]:]
decoded = processor.decode(generation, skip_special_tokens=True)
end_time = time.time()

print(decoded)
print(f'time = {end_time-start_time} sec.')

output

この画像は、2024年12月決算短信（IFRS）の報告書です。以下に詳細な説明を記載します。

**1. 概要**

*   **会社名:** ママバ・発動機株式会社
*   **コード:** 7272
*   **URL:** [https://global.yamaha-motor.com/jp/ir/](https://global.yamaha-motor.com/jp/ir/)
*   **発行日:** 2025年2月12日
*   **発行者:** 役職名：取締役会長 兼 代表取締役 (氏名) 速部 克明、役職名：財務部長 兼 代表取締役 (氏名) 岡田 充弘
*   **連絡先:** 岡田 充弘 (TEL) 0538-32-1144

**2. 決算短信の主な要点 (2024年1月1日～2024年12月31日)**

*   **1) 連結経営状況**
    *   **売上収益:**
        *   2024年12月期: 2,576,179百万円 (前年比6.7%増)
        *   2023年12月期: 2,414,759百万円
    *   **営業利益:**
        *   2024年12月期: 181,515百万円 (前年比6.7%増)
        *   2023年12月期: 243,920百万円
    *   **税引前当期利益:**
        *   2024年12月期: 183,175百万円 (△22.4%減)
        *   2023年12月期: 236,073百万円
    *   **当期利益:**
        *   2024年12月期: 172,879百万円 (△17.5%減)
        *   2023年12月期: 212,49
time = 175.4833471775055 sec.

As expected, there are mistakes here and there.
Also, with this method the run crashed when max_new_tokens was 1024, so it is set to 512 here.
Maybe Pipeline is the more convenient option?
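To see how close a run gets to the 8 GB limit, the peak VRAM used during generation can be recorded with PyTorch's memory statistics. This is a minimal sketch and was not part of the runs above. The context manager name track_peak_vram is made up for illustration, and it reports None when torch or CUDA is not available. Note that it only sees memory allocated through PyTorch in the current process.

```python
import contextlib


@contextlib.contextmanager
def track_peak_vram(label):
    """Record the peak GPU memory allocated by PyTorch inside the block."""
    try:
        import torch
        use_cuda = torch.cuda.is_available()
    except ImportError:
        torch = None
        use_cuda = False

    result = {'label': label, 'peak_gib': None}
    if use_cuda:
        # Start counting the peak from this point.
        torch.cuda.reset_peak_memory_stats()
    try:
        yield result
    finally:
        if use_cuda:
            result['peak_gib'] = torch.cuda.max_memory_allocated() / 1024**3


with track_peak_vram('demo') as stats:
    pass  # a call such as model.generate(...) would go here

print(stats)
```

Wrapping the model.generate call with max_new_tokens=512 and comparing the reported peak against the card's VRAM would show how much headroom is left before the 1024 setting fails.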

Summary

Being able to input images is impressive.

Text-to-text feels about the same as in the Gemma 2 days.
