I've been busy with various things lately, so my time to play with LLMs has dwindled.
Incidentally, sample notebooks for running well-known open-source LLMs on Databricks are published here.
Today I'll try running Falcon-7b-instruct, which I hadn't gotten around to before.
The notebooks are stored under llm-models/falcon/falcon-7b/. There is also falcon-40b, which I'll try later.
Preparing Databricks
I set up a GPU cluster on Databricks ML Runtime 13.2. The requirements stated in the notebook are as follows:
- Runtime: 13.1 GPU ML Runtime
- Instance: g5.4xlarge on AWS
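Before loading the model, it's worth a quick check that the cluster actually exposes a GPU. Here is a minimal sketch of my own (not part of the sample notebook), using the PyTorch that ships with the ML Runtime:

import torch

# Confirm a CUDA device is visible to the runtime
print(torch.cuda.is_available())      # expect True on a GPU cluster
print(torch.cuda.get_device_name(0))  # a g5.4xlarge should report an NVIDIA A10G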
Running the notebook
Open 01_load_inference and run through it.
Load the model.
# Load model to text generation pipeline
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

# It is suggested to pin the revision commit hash and not change it, for reproducibility, because the uploader
# might change the model afterwards; you can find the commit history of falcon-7b-instruct at
# https://huggingface.co/tiiuae/falcon-7b-instruct/commits/main
model = "tiiuae/falcon-7b-instruct"
revision = "9f16e66a0235c4ba24e321e3be86dd347a7911a0"

tokenizer = AutoTokenizer.from_pretrained(model, padding_side="left")
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
    revision=revision,
)

# Required tokenizer setting for batch inference
pipeline.tokenizer.pad_token_id = tokenizer.eos_token_id
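As a quick sanity check of my own (not in the notebook), you can confirm where the weights ended up and in which precision:

# Verify device placement and dtype after loading
print(pipeline.model.device)  # e.g. cuda:0 once device_map="auto" has placed the model on the GPU
print(pipeline.model.dtype)   # torch.bfloat16, matching the torch_dtype passed above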
Prepare the prompt template.
# Define prompt template, the format below is from: http://fastml.com/how-to-train-your-own-chatgpt-alpaca-style-part-one/
# Prompt templates as follows could guide the model to follow instructions and respond to the input, and empirically it turns out to make Falcon models produce better responses
INSTRUCTION_KEY = "### Instruction:"
RESPONSE_KEY = "### Response:"
INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
PROMPT_FOR_GENERATION_FORMAT = """{intro}
{instruction_key}
{instruction}
{response_key}
""".format(
intro=INTRO_BLURB,
instruction_key=INSTRUCTION_KEY,
instruction="{instruction}",
response_key=RESPONSE_KEY,
)
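To see exactly what string the model will receive, you can render the template with a sample instruction (my own check, not in the notebook):

# Render the template to inspect the final prompt sent to the model
print(PROMPT_FOR_GENERATION_FORMAT.format(instruction="What is a large language model?"))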
Create a function that performs text generation.
# Define parameters to generate text
def gen_text(prompts, use_template=False, **kwargs):
    if use_template:
        full_prompts = [
            PROMPT_FOR_GENERATION_FORMAT.format(instruction=prompt)
            for prompt in prompts
        ]
    else:
        full_prompts = prompts

    if "batch_size" not in kwargs:
        kwargs["batch_size"] = 1

    # The default max length is pretty small (20), which would cut the generated output
    # in the middle, so it's necessary to increase it to get the complete response
    if "max_new_tokens" not in kwargs:
        kwargs["max_new_tokens"] = 512

    # Configure other text generation arguments; see common configurable args here:
    # https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig
    kwargs.update(
        {
            "do_sample": True,  # by default (do_sample=False) generation is greedy decoding; with do_sample=True, arguments such as temperature, top_p, and top_k take effect
            "pad_token_id": tokenizer.eos_token_id,  # Hugging Face sets pad_token_id to eos_token_id by default; setting it here silences the redundant message
            "eos_token_id": tokenizer.eos_token_id,
        }
    )

    outputs = pipeline(full_prompts, **kwargs)
    outputs = [out[0]["generated_text"] for out in outputs]
    return outputs
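Because the tokenizer was configured for batch inference above (left padding plus pad_token_id), gen_text can also take several prompts at once. A small sketch of my own; the prompts here are just hypothetical examples:

# Batch inference: pass multiple prompts and a matching batch_size
results = gen_text(
    ["What is ETL?", "What is Apache Spark?"],  # hypothetical example prompts
    use_template=True,
    batch_size=2,
)
for r in results:
    print(r)
    print("---")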
Generate text from a single input, asking "What is a large language model?" in English.
results = gen_text(["What is a large language model?"])
print(results[0])
What is a large language model?
A large language model is a machine learning model used to generate and manipulate large sets of data. They are often used in natural language processing tasks such as language translation and computer vision, and are typically made up of multiple layers of neural networks.
A proper answer comes back.
Next, specify generation parameters (this time with use_template=True).
# Use args such as temperature and max_new_tokens to control text generation
results = gen_text(["What is a large language model?"], temperature=0.5, max_new_tokens=100, use_template=True)
print(results[0])
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
What is a large language model?
### Response:
A large language model is a type of machine learning model that uses a deep learning architecture to process natural language and generate meaningful outputs. These models are typically used for tasks such as translation, question-answering, and text summarization. They are often trained on large datasets to improve their accuracy and performance.
How about in Japanese?
results = gen_text(["大規模言語モデルとは?"])
print(results[0])
大規模言語モデルとは?
I have been a long time of the last year, and a good number of companies have recently started to implement a 360-degree feedback system.
My 360-degree feedback system.
That didn't work.
results = gen_text(["日本の首都は?"])
print(results[0])
日本の首都は?
The Japanese capital city is 京郊.
Where is 京郊? Still, it did at least try to answer. Now I'm curious how falcon-40b would fare.