Introduction
This article provides sample code for running an LLM.
The code is available here:
https://github.com/toshi-4886/LLM/blob/main/notebook/3_prompt.ipynb
Overview
- We try several implementations of inference.
- We use a small Llama model.
Prerequisites
- Create a Hugging Face account
- Create an access token
  Log in to your Hugging Face account and create a token from "your icon → Settings → Access Tokens → + Create new token".
  The token needs the "Read access to contents of all public gated repos you can access" permission.
- Apply for access to Llama
  On Hugging Face, open the page of the model you want to use, fill in the required fields, and submit the request.
  Once the request is approved, you can use the model (a quick way to check access is sketched below).
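As a quick check that your token can actually reach the gated repository, you can query the model's metadata with huggingface_hub; a minimal sketch (the model ID is the one used later in this article, and the token value is a placeholder):

from huggingface_hub import model_info

# Raises an error (e.g., GatedRepoError) if the token cannot access the gated model
info = model_info("meta-llama/Llama-3.2-1B-Instruct", token="hf_...")
print(info.id)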
Implementation
1. Import libraries
import sys
import os
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch
2. Authentication
We authenticate the account in order to download the model from Hugging Face.
Running login() brings up a UI for entering the token, so paste your token there.
Alternatively, if you store the token in a Colab secret, you can use that instead.
from huggingface_hub import login

# If the token is stored in a Colab secret, you can authenticate as follows
from google.colab import userdata
login(token=userdata.get('HF_TOKEN'))
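For reference, the interactive variant mentioned above is just login() with no arguments; in a notebook it prompts for the token with a small UI:

# Interactive login: prompts for the token in a notebook widget
from huggingface_hub import login
login()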
3. Inference (pipeline)
We use pipeline, which makes it easy to run LLM inference.
The model name is specified when defining the pipeline.
# Define the pipeline
llama_pipeline = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
)
# Run inference
response = llama_pipeline(
    "What's the weather like in Paris?",
    pad_token_id=tokenizer.eos_token_id,
    max_new_tokens=256,
)

# Print the generated text
print(response[0]['generated_text'])
Output
What's the weather like in Paris? (For a non-French speaker, I assume this is a question about the weather in Paris, not a question about the French language!)
I'm planning a trip to Paris, and I want to know what the weather is like there. So, I've asked my friend who's a meteorologist, and he's told me that Paris is known for its... (dramatic pause)...foggy mornings!
That's right! I've heard that Paris is famous for its thick fog that rolls in from the Seine River in the morning, making it difficult to see the city. But don't worry, it's not all bad – the fog is also said to make the city feel cozy and romantic.
Am I correct? And what about the temperature? Is it always warm and sunny in Paris?
(By the way, I'm excited to learn more about the French culture and way of life, but I'd love to get some insider tips from a local. Do you have any recommendations for things to do and see in Paris?)
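As an aside, recent versions of transformers also accept a chat-style list of messages directly in a text-generation pipeline; a minimal sketch reusing llama_pipeline from above (the system prompt is just an illustration):

messages = [
    {"role": "system", "content": "You are a bot that responds to weather queries."},
    {"role": "user", "content": "What's the weather like in Paris?"},
]
response = llama_pipeline(messages, max_new_tokens=128)
# With chat input, generated_text holds the message list, including the assistant's reply
print(response[0]["generated_text"][-1]["content"])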
4. Inference (AutoModel)
We use AutoModel, which makes it easier to adjust detailed settings.
Because the model is small, we use it without quantization.
The tokenizer is mainly used in two ways: encoding a string into tokens, and using the chat_template format; both are shown below.
tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=False,
)
%%time
# Measure inference time (%%time must be the first line of the cell)

# Prepare the prompt
conversation = ["What's the weather like in Paris?"]

# Run inference
with torch.no_grad():
    token_ids = tokenizer.batch_encode_plus(conversation, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **token_ids,
        pad_token_id=tokenizer.eos_token_id,
        temperature=0.1,
        do_sample=True,
        top_p=0.95,
        top_k=40,
        max_new_tokens=256,
    )
output = tokenizer.decode(output_ids[0], skip_special_tokens=False)
print(output)
Output
<|begin_of_text|>What's the weather like in Paris? Paris, the City of Light, is known for its beautiful and varied weather. The weather in Paris can be quite different depending on the time of year and the season. Here's a breakdown of the typical weather patterns in Paris:
**Spring (March to May)**
* Temperatures range from 10°C (50°F) to 20°C (68°F)
* Rainfall is moderate, with an average of 12 rainy days per month
* Sunshine hours are increasing, with an average of 6 hours per day
* Spring is a great time to visit Paris, with mild temperatures and fewer tourists
**Summer (June to August)**
* Temperatures range from 18°C (64°F) to 25°C (77°F)
* Rainfall is minimal, with an average of 4 rainy days per month
* Sunshine hours are at their highest, with an average of 9 hours per day
* Summer is peak tourist season, with warm weather and long days
**Autumn (September to November)**
* Temperatures range from 10°C (50°F) to 15°C (59°F)
* Rainfall is moderate, with an average of 10 rainy days per month
* Sunshine hours are decreasing, with an average
CPU times: user 7.18 s, sys: 12.8 ms, total: 7.19 s
Wall time: 11.4 s
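Note that the decoded output above echoes the prompt. If you want only the newly generated text, you can slice off the prompt tokens before decoding; a minimal sketch reusing token_ids and output_ids from the cell above:

# Decode only the tokens generated after the prompt
prompt_len = token_ids["input_ids"].shape[1]
print(tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True))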
%%time
# Measure inference time

# Prepare the prompt (chat_template format)
conversation = [
    {"role": "system", "content": "You are a bot that responds to weather queries. You should reply with the unit used in the queried location."},
    {"role": "user", "content": "What's the weather like in Paris?"},
]

# Run inference
with torch.no_grad():
    token_ids = tokenizer.apply_chat_template(
        conversation,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(
        **token_ids,
        pad_token_id=tokenizer.eos_token_id,
        temperature=0.1,
        do_sample=True,
        top_p=0.95,
        top_k=40,
        max_new_tokens=256,
    )
output = tokenizer.decode(output_ids[0], skip_special_tokens=False)
print(output)
Output
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024
You are a bot that responds to weather queries. You should reply with the unit used in the queried location.<|eot_id|><|start_header_id|>user<|end_header_id|>
What's the weather like in Paris?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
In Paris, France, the weather is typically mild and temperate. The average temperature ranges from 9°C (48°F) in January (the coldest month) to 22°C (72°F) in July (the warmest month).<|eot_id|>
CPU times: user 1.35 s, sys: 3.11 ms, total: 1.35 s
Wall time: 1.4 s
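The special tokens in the output above (<|begin_of_text|>, <|start_header_id|>, and so on) are inserted by the model's chat template. To inspect the exact prompt string the template produces, you can render it without tokenizing:

# Render the chat template as a plain string instead of token IDs
prompt = tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)
print(prompt)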
Conclusion
Results
We were able to generate answers to prompts with Llama.
We also saw that pipeline and AutoModel differ in ease of use and in how easily they can be customized.
Next steps
I would like to try methods for improving LLM performance.
References
- meta-llama/Llama-3.2-1B-Instruct
https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct