I've been busy with various things lately, so my time to play with LLMs has dwindled.
Incidentally, sample notebooks for running well-known open-source LLMs on Databricks are published here.
Today I'll try running Falcon-7b-instruct, which I hadn't gotten around to before.
The notebooks are stored under llm-models/falcon/falcon-7b/. There is also falcon-40b, which I'll try later.
Preparing Databricks
I set up a GPU cluster on Databricks ML Runtime 13.2. The requirements stated in the notebook are as follows:
- Runtime: 13.1 GPU ML Runtime
- Instance: g5.4xlarge on AWS
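Before loading the model, it's worth a quick check that the cluster actually exposes a GPU. Here is a minimal sketch of my own (not part of the sample notebook), using the PyTorch that ships with the ML Runtime:

import torch

# Confirm a CUDA device is visible to the runtime
print(torch.cuda.is_available())      # expect True on a GPU cluster
print(torch.cuda.get_device_name(0))  # a g5.4xlarge should report an NVIDIA A10G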
Running the notebook
Open 01_load_inference and run through it.
Load the model.
# Load model to text generation pipeline
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

# It is suggested to pin the revision commit hash and not change it, for reproducibility, because the uploader
# might change the model afterwards; you can find the commit history of falcon-7b-instruct at
# https://huggingface.co/tiiuae/falcon-7b-instruct/commits/main
model = "tiiuae/falcon-7b-instruct"
revision = "9f16e66a0235c4ba24e321e3be86dd347a7911a0"

tokenizer = AutoTokenizer.from_pretrained(model, padding_side="left")
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
    revision=revision,
)

# Required tokenizer setting for batch inference
pipeline.tokenizer.pad_token_id = tokenizer.eos_token_id
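As a quick sanity check of my own (not in the notebook), you can confirm where the weights ended up and in which precision:

# Verify device placement and dtype after loading
print(pipeline.model.device)  # e.g. cuda:0 once device_map="auto" has placed the model on the GPU
print(pipeline.model.dtype)   # torch.bfloat16, matching the torch_dtype passed above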
Prepare the prompt template.
# Define prompt template, the format below is from: http://fastml.com/how-to-train-your-own-chatgpt-alpaca-style-part-one/
# Prompt templates as follows could guide the model to follow instructions and respond to the input, and empirically it turns out to make Falcon models produce better responses
INSTRUCTION_KEY = "### Instruction:"
RESPONSE_KEY = "### Response:"
INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
PROMPT_FOR_GENERATION_FORMAT = """{intro}
{instruction_key}
{instruction}
{response_key}
""".format(
intro=INTRO_BLURB,
instruction_key=INSTRUCTION_KEY,
instruction="{instruction}",
response_key=RESPONSE_KEY,
)
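To see exactly what string the model will receive, you can render the template with a sample instruction (my own check, not in the notebook):

# Render the template to inspect the final prompt sent to the model
print(PROMPT_FOR_GENERATION_FORMAT.format(instruction="What is a large language model?"))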
Create a function that performs text generation.
# Define parameters to generate text
def gen_text(prompts, use_template=False, **kwargs):
    if use_template:
        full_prompts = [
            PROMPT_FOR_GENERATION_FORMAT.format(instruction=prompt)
            for prompt in prompts
        ]
    else:
        full_prompts = prompts

    if "batch_size" not in kwargs:
        kwargs["batch_size"] = 1

    # The default max length is pretty small (20), which would cut the generated output
    # in the middle, so it's necessary to increase it to get the complete response
    if "max_new_tokens" not in kwargs:
        kwargs["max_new_tokens"] = 512

    # Configure other text generation arguments; see common configurable args here:
    # https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig
    kwargs.update(
        {
            "do_sample": True,  # by default (do_sample=False) generation is greedy decoding; with do_sample=True, arguments such as temperature, top_p, and top_k take effect
            "pad_token_id": tokenizer.eos_token_id,  # Hugging Face sets pad_token_id to eos_token_id by default; setting it here silences the redundant message
            "eos_token_id": tokenizer.eos_token_id,
        }
    )

    outputs = pipeline(full_prompts, **kwargs)
    outputs = [out[0]["generated_text"] for out in outputs]
    return outputs
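Because the tokenizer was configured for batch inference above (left padding plus pad_token_id), gen_text can also take several prompts at once. A small sketch of my own; the prompts here are just hypothetical examples:

# Batch inference: pass multiple prompts and a matching batch_size
results = gen_text(
    ["What is ETL?", "What is Apache Spark?"],  # hypothetical example prompts
    use_template=True,
    batch_size=2,
)
for r in results:
    print(r)
    print("---")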
Generate text from a single input, asking "What is a large language model?" in English.
results = gen_text(["What is a large language model?"])
print(results[0])
What is a large language model?
A large language model is a machine learning model used to generate and manipulate large sets of data. They are often used in natural language processing tasks such as language translation and computer vision, and are typically made up of multiple layers of neural networks.
A proper answer comes back.
Next, specify generation parameters (this time with use_template=True).
# Use args such as temperature and max_new_tokens to control text generation
results = gen_text(["What is a large language model?"], temperature=0.5, max_new_tokens=100, use_template=True)
print(results[0])
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
What is a large language model?
### Response:
A large language model is a type of machine learning model that uses a deep learning architecture to process natural language and generate meaningful outputs. These models are typically used for tasks such as translation, question-answering, and text summarization. They are often trained on large datasets to improve their accuracy and performance.
How about in Japanese?
results = gen_text(["大規模言語モデルとは?"])
print(results[0])
大規模言語モデルとは?
I have been a long time of the last year, and a good number of companies have recently started to implement a 360-degree feedback system.
My 360-degree feedback system.
That didn't work.
results = gen_text(["日本の首都は?"])
print(results[0])
日本の首都は?
The Japanese capital city is 京郊.
Where is 京郊? Still, it did at least try to answer. Now I'm curious how falcon-40b would fare.