Qiita Engineer Festa20242024年7月17日まで開催中！

Hugging faceとGemma2でハマるポイント

Last updated at 2025-02-04Posted at 2024-07-05

Gemma2 について

Google提供のLLM ある一定の条件でローカルでも使える
ただし、業務使用ならサーバレベルでないと難しい
故に完全タダは難しいと予想
完全にChatGPTと同様業務用途ならGemniでAPI契約が企業向き

インフォメーション
ちょっと社内で齟齬があるようですが、先行調査しておきます

入手先はひとまず Hugging face

こちらのモデルであればRTX3090でもなんとかなるでしょう
CPUで動かしたらとんでもなく解説に時間がかかったので(llama.cpp)
GPT-2やStable diffusionでお世話になっているHugging face です

ハマりポイント1：Hugging face からのクローン

インフォメーション

少なくとも以下の仕様変更が気づくまで全く新規作業していなかった
故にgit cloneで普通にダウンロードできないという・・・

ターミナル画面


 git remote set-url origin https://<user_name>:<token>@huggingface.co/<repo_path>

 git pull origin

簡単に見えて何気にハマった

ターミナル画面


 git clone https://（ユーザ名）:＜トークン＞@huggingface.co/google/gemma-2-27b-it

 # 結果は
Cloning into 'gemma-2-27b-it'...
remote: Please enable access to public gated repositories in your fine-grained token settings to view this repository.
fatal: unable to access 'https://huggingface.co/google/gemma-2-27b-it/': The requested URL returned error: 403

やり直し手順

ターミナル画面


git init gemma-repo
cd gemma-repo

git pull

URLが見つからないということでリポジトリ初期化
改めて、ファイルがクローンできる状態かpullで確認

ターミナル画面

git clone https://ユーザ名）:＜トークン＞@huggingface.co/google/gemma-2-27b-it

クローンできた！

Cloning into 'gemma-2-27b-it'...
remote: Enumerating objects: 61, done.
remote: Counting objects: 100% (58/58), done.
remote: Compressing objects: 100% (58/58), done.
remote: Total 61 (delta 17), reused 0 (delta 0), pack-reused 3 (from 1)
Unpacking objects: 100% (61/61), 28.79 KiB | 1.92 MiB/s, done.
Filtering content: 100% (15/15), 6.74 GiB | 4.08 MiB/s, done.
Encountered 11 file(s) that may not have been copied correctly on Windows:
	model-00008-of-00012.safetensors
	model-00011-of-00012.safetensors
	model-00004-of-00012.safetensors
	model-00006-of-00012.safetensors
	model-00007-of-00012.safetensors
	model-00005-of-00012.safetensors
	model-00009-of-00012.safetensors
	model-00010-of-00012.safetensors
	model-00002-of-00012.safetensors
	model-00001-of-00012.safetensors
	model-00003-of-00012.safetensors

See: `git lfs help smudge` for more details.

Hugging face のリポジトリアクセスを確認

ターミナル画面

huggingface-cli login
# 設定したWriteのトークンを入力する

ターミナル画面

tokenizer_config.json: 100%|███████████████| 40.6k/40.6k [00:00<00:00, 6.49MB/s]
tokenizer.model: 100%|█████████████████████| 4.24M/4.24M [00:00<00:00, 55.3MB/s]
tokenizer.json: 100%|██████████████████████| 17.5M/17.5M [00:00<00:00, 83.1MB/s]
special_tokens_map.json: 100%|██████████████████| 636/636 [00:00<00:00, 114kB/s]
config.json: 100%|█████████████████████████████| 893/893 [00:00<00:00, 2.25MB/s]
model.safetensors.index.json: 100%|█████████| 42.8k/42.8k [00:00<00:00, 279kB/s]
model-00001-of-00012.safetensors: 100%|█████| 4.74G/4.74G [00:44<00:00, 105MB/s]
model-00002-of-00012.safetensors: 100%|█████| 4.87G/4.87G [00:45<00:00, 107MB/s]
model-00003-of-00012.safetensors: 100%|█████| 4.87G/4.87G [00:43<00:00, 112MB/s]
model-00004-of-00012.safetensors: 100%|█████| 4.98G/4.98G [00:44<00:00, 111MB/s]ll
Downloading shards:  33%|████████                | 4/12 [03:00<05:59, 44.99s/

トークンのパーミッションがWriteになっていないとこの場合は上記エラーが出てくる
Stable diffusion　ではReadでも問題なかったのに

無情な結果

VRAM不足でした

ターミナル画面

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 324.00 MiB. GPU 0 has a total capacity of 23.69 GiB of which 238.75 MiB is free. Including non-PyTorch memory, this process has 22.49 GiB memory in use. Of the allocated memory 22.24 GiB is allocated by PyTorch, and 1.33 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

llama_cppに頼る

モデルはこちらgguf版に変更。日本語強化されているそうです。

Python


import torch
from llama_cpp import Llama

model_path = "./gemma-2-27b-it-Q4_K_M.gguf"

# GPUレイヤーの設定
n_gpu_layers = -1  # 全レイヤーをGPUに転送

# モデルのロード
llm = Llama(model_path=model_path, n_ctx=2048, n_gpu_layers=n_gpu_layers, verbose=True, n_batch=32)  # バッチサイズを32に設定

# プロンプトの設定
input_text = input_text = "日本語で応答して中国語に翻訳してください？アンニョンハセヨ"
response = llm(input_text, max_tokens=300, stop=None, echo=False)  # max_tokensを100に設定

# レスポンスのフォーマット
formatted_response = response['choices'][0]['text'].replace('<eos>', '').replace('<bos>', '').strip()
formatted_response = formatted_response.replace('\n\n', '\n').replace('。', '。\n')

print(formatted_response)

# モデルのクリーンアップ
llm.close()

ターミナル画面


こんにちは！韓国語を勉強しているんですね。
頑張ってください！
（中国語：你好！你在学习韩语吗？加油！）

まとめ

27BだとRTX-3090でも普通には太刀打ちできない模様
CPUならのんびりで生成できる
FAQボットにするため、応答速度重視で7B前後で比べてみるといいかもしれないと思いました。

リマインド

Meta のLlama3のモデルがHugging faceでダウンロードできない同現象あり
しかし、Meta側の制約？でクローンできなかったので、以下のリンクでダウンロードした事例あり
なお、当方はGemma2に切り替えました。

追記(2025年2月)

今回はHugging faceからのクローンです。

結局Metaの許可は必要でした

上記リンクでは必要な情報の入力が必要です。

結局Metaの許可は必要でした
Metaからのメール受取をチェックしないと失敗・・・

Meta経由の手順はこちらの記事が詳しいです

Hugging faceよりクローン

結局これは必要なのですが・・

ターミナル画面

git clone https://ユーザ名）:＜トークン＞@huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct

hugging face上でもMetaサイトで行ったような許可を得るためのフォームへの記入が必要です。

メールがHugging faceより届く

[Access granted] Your request to access model meta-llama/Llama-3.2-11B-Vision has been accepted

This is to let you know your request to access model "meta-llama/Llama-3.2-11B-Vision" on huggingface.co has been accepted by the repo authors.

You can now access the repo here, or view all your gated repo access requests in your settings.

Cheers,

The Hugging Face team

再度トークンを使ったクローンを試す

ターミナル画面

Cloning into 'Llama-3.2-11B-Vision'...
remote: Enumerating objects: 73, done.
remote: Counting objects: 100% (70/70), done.
remote: Compressing objects: 100% (70/70), done.
remote: Total 73 (delta 28), reused 0 (delta 0), pack-reused 3 (from 1)
Unpacking objects: 100% (73/73), 2.27 MiB | 2.28 MiB/s, done.
Filtering content: 100% (7/7), 7.60 GiB | 8.38 MiB/s, done.
Encountered 5 file(s) that may not have been copied correctly on Windows:
	model-00002-of-00005.safetensors
	model-00003-of-00005.safetensors
	model-00001-of-00005.safetensors
	model-00004-of-00005.safetensors
	original/consolidated.pth

See: `git lfs help smudge` for more details.

Deepseekが社内で禁止になったので、ここはMetaかなという感じです

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up