
Trying Meta's Llama 3 70B Instruct (Yet Another Take)

Posted at 2024-04-19

Introduction

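This is a follow-up to my previous article, in which I tried Llama 3 8B Instruct.

This time I also tried the 70B Instruct model, albeit an EXL2-quantized version.
The code is almost the same as in that article, but I'll walk through it in the same order.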

As before, I ran the tests on Databricks on AWS.
The DBR is 14.3 ML and the cluster type is g5.12xlarge.
ExLlamaV2 is used as the inference engine.

Note that because a quantized model is used, the outputs will differ from those of the original model.
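Why do the outputs differ? Quantization rounds every weight to one of a small set of levels, so the model computes with slightly perturbed weights. The sketch below uses a deliberately simplified uniform symmetric quantizer with a single shared scale to show the round-trip error growing as the bit width shrinks. This is NOT the EXL2 format, which (as I understand its documentation) builds on the GPTQ method and mixes bit widths; the weights are made-up numbers.

```python
def quantize_dequantize(values, bits):
    """Round-trip values through a uniform symmetric quantizer with one shared scale.

    Simplified illustration only; EXL2 uses a more elaborate, mixed-precision scheme.
    """
    levels = 2 ** (bits - 1) - 1  # largest positive integer code, e.g. 7 for 4 bits
    scale = max(abs(v) for v in values) / levels
    if scale == 0:
        return list(values)  # all zeros: nothing to quantize
    # Map each value to the nearest integer code, then back to a float
    return [round(v / scale) * scale for v in values]


def max_abs_error(values, bits):
    """Largest absolute difference between the original and round-tripped values."""
    restored = quantize_dequantize(values, bits)
    return max(abs(a - b) for a, b in zip(values, restored))


if __name__ == "__main__":
    weights = [0.12, -0.5, 0.33, 0.9, -0.07]  # made-up example weights
    for bits in (8, 6, 4):
        print(f"{bits}-bit max error: {max_abs_error(weights, bits):.4f}")
```

Each value is off by at most half a quantization step, and that step doubles with every bit removed, which is why lower-bpw models drift further from the original outputs.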

Step1. Install packages

Install the packages needed for inference.

%pip install torch==2.2.2 --index-url https://download.pytorch.org/whl/cu118
%pip install ninja
%pip install -U flash-attn --no-build-isolation

%pip install https://github.com/turboderp/exllamav2/releases/download/v0.0.19/exllamav2-0.0.19+cu118-cp310-cp310-linux_x86_64.whl

dbutils.library.restartPython()
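After restarting Python, it can be worth confirming which versions actually ended up installed. A minimal sketch using the standard library's importlib.metadata; the package list simply mirrors the %pip commands above, and a None result only means the distribution was not found under that name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if the distribution is not found}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Distribution names taken from the %pip install commands in Step1
    for pkg, ver in installed_versions(["torch", "ninja", "flash-attn", "exllamav2"]).items():
        print(f"{pkg}: {ver or 'not found'}")
```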

Step2. Load the model

I downloaded the EXL2-quantized model (turboderp/Llama-3-70B-Instruct-exl2) in advance and load it from there.
This time I used the version quantized to 6.0 bpw.
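Why a 6.0 bpw model fits on this cluster: a rough back-of-the-envelope estimate, not a measurement. The assumptions are mine, not the article's: g5.12xlarge has 4 NVIDIA A10G GPUs with 24 GB each (96 GB total); Llama 3 70B has roughly 70e9 parameters, 80 layers, 8 KV heads and head_dim 128; the Q4 cache costs about 4 bits per element with scale overhead ignored; activations and framework overhead are not counted.

```python
# Rough GPU memory estimate. All model/hardware numbers are assumptions (see above).
GB = 1e9  # decimal gigabytes


def weight_gb(n_params, bits_per_weight):
    """Memory for the weights alone, in GB."""
    return n_params * bits_per_weight / 8 / GB


def kv_cache_gb(seq_len, batch_size=1, n_layers=80, n_kv_heads=8,
                head_dim=128, bits_per_element=16):
    """Memory for the key and value caches, in GB (scale overhead ignored)."""
    # Factor 2: one tensor for keys and one for values, per layer
    elements = 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size
    return elements * bits_per_element / 8 / GB


if __name__ == "__main__":
    total_vram_gb = 4 * 24  # assumed: 4x A10G with 24 GB each
    print(f"FP16 weights:   {weight_gb(70e9, 16):6.1f} GB")
    print(f"6.0bpw weights: {weight_gb(70e9, 6.0):6.1f} GB")
    print(f"FP16 KV (4096): {kv_cache_gb(4096):6.2f} GB")
    print(f"Q4 KV (4096):   {kv_cache_gb(4096, bits_per_element=4):6.2f} GB")
    print(f"Total VRAM:     {total_vram_gb} GB")
```

Under these assumptions, the 6.0 bpw weights (about 52.5 GB) plus a Q4 cache for 4096 tokens (about 0.34 GB) fit within the four GPUs that load_autosplit spreads the model across, while FP16 weights (about 140 GB) would not fit at all.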

from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Config,
    ExLlamaV2Cache_Q4,
    ExLlamaV2Tokenizer,
)

from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler


batch_size = 1
cache_max_seq_len = 4096

model_directory = "/Volumes/training/llm/model_snapshots/models--turboderp--Llama-3-70B-Instruct-exl2--6.0bpw/"

config = ExLlamaV2Config(model_directory)

model = ExLlamaV2(config)
print("Loading model: " + model_directory)

cache = ExLlamaV2Cache_Q4(
    model,
    lazy=True,
    batch_size=batch_size,
    max_seq_len=cache_max_seq_len,
)
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

# Sampling settings
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.0
settings.top_k = 50
settings.top_p = 0.9
settings.token_repetition_penalty = 1.05

max_new_tokens = 512
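To make the sampling settings above more concrete, here is a simplified pure-Python sketch of temperature, top-k and top-p (nucleus) sampling over a list of logits. It is not ExLlamaV2's sampler and it omits the repetition penalty; in particular, treating temperature = 0.0 as greedy argmax decoding is my assumption about the intended behavior, not something verified against ExLlamaV2's source.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Pick a token index from raw logits (illustrative, not ExLlamaV2's sampler)."""
    # Assumption: temperature == 0 means greedy decoding, i.e. take the argmax
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature-scaled softmax (subtract the max for numerical stability)
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)

    # Top-k: keep only the k most probable tokens (0 disables the filter)
    if top_k > 0:
        probs = probs[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for p, i in probs:
        kept.append((p, i))
        cumulative += p
        if cumulative >= top_p:
            break

    # Draw one token; random.choices renormalizes the weights itself
    return rng.choices([i for _, i in kept], weights=[p for p, _ in kept], k=1)[0]


if __name__ == "__main__":
    logits = [1.0, 3.0, 2.0]
    print(sample_next_token(logits, temperature=0.0))  # → 1 (index of the largest logit)
```

In this sketch, temperature = 0.0 returns the argmax before any filtering, so top_k and top_p have no effect; if ExLlamaV2 behaves the same way, the article's top_k = 50 and top_p = 0.9 would likewise be inactive.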

Step3. Batch inference - Japanese

I'll have it solve the same questions as last time.

The system prompt instructs the model to answer in Japanese.
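The system prompt is embedded through the Llama 3 chat template, which format_prompt in the following code writes out by hand for a single system + user turn. A hypothetical helper, build_llama3_prompt, that generalizes the same layout to a list of messages (for example, multi-turn chats) might look like this. It leaves out <|begin_of_text|>, assuming, as the article's code does with add_bos=True, that the BOS token is added during tokenization.

```python
def build_llama3_prompt(messages):
    """Render [{"role": ..., "content": ...}, ...] in the Llama 3 chat layout.

    Hypothetical helper. The BOS token is intentionally omitted, on the
    assumption that it is added at tokenization time (add_bos=True).
    """
    parts = [
        f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
        for m in messages
    ]
    # Open an assistant turn so the model continues with its reply
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


if __name__ == "__main__":
    print(build_llama3_prompt([
        {"role": "system", "content": "あなたは日本語を話すAIアシスタントです。必ず日本語で回答してください。"},
        {"role": "user", "content": "現在の日本の首相は誰？"},
    ]))
```

For a single system + user pair this produces the same string as format_prompt.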

# Prompts to run inference on
prompts = [
    "Hello, what is your name?",
    "Databricksとは何ですか？詳細に教えてください。",
    "まどか☆マギカでは誰が一番かわいい?",
    "ランダムな10個の要素からなるリストを作成してソートするコードをPythonで書いてください。",
    "現在の日本の首相は誰？",
    "あなたはマラソンをしています。今3位の人を抜きました。あなたの今の順位は何位ですか?",
]

# Sort by string length to minimize padding
s_prompts = sorted(prompts, key=len)

# Format a prompt with the Llama 3 chat template
def format_prompt(sp, p):
    return f"<|start_header_id|>system<|end_header_id|>\n\n{sp}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{p}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"

# Ask for answers in Japanese via the system prompt
system_prompt = "あなたは日本語を話すAIアシスタントです。必ず日本語で回答してください。"

f_prompts = [format_prompt(system_prompt, p) for p in s_prompts]

# Split the formatted prompts into batches
batches = [f_prompts[i : i + batch_size] for i in range(0, len(f_prompts), batch_size)]

collected_outputs = []
for b, batch in enumerate(batches):

    print(f"Batch {b + 1} of {len(batches)}...")

    outputs = generator.generate_simple(
        batch,
        settings,
        max_new_tokens,
        seed=1234,
        add_bos=True,
        completion_only=True,
        stop_token=tokenizer.single_id("<|eot_id|>"),
    )
    collected_outputs += outputs

# Print the results
for q, a in zip(s_prompts, collected_outputs):
    print("---------------------------------------")
    print("Q: " + q)
    print("A: " + a.strip())

Output

Batch 1 of 6...
Batch 2 of 6...
Batch 3 of 6...
Batch 4 of 6...
Batch 5 of 6...
Batch 6 of 6...
---------------------------------------
Q: 現在の日本の首相は誰？
A: 現在の日本の首相は、岸田文雄です。
---------------------------------------
Q: まどか☆マギカでは誰が一番かわいい?
A: それは難しい質問ですね！まどか☆マギカのキャラクターたちは皆かわいいですが、私の回答としては、鹿目まどかちゃんが一番かわいいと思います！彼女の純粋さと優しさが非常に魅力的ですし、彼女の魔法少女としての成長も非常に感動的です。ただし、他のキャラクターたちもまた魅力的で、特に暁美ほむらちゃんや巴マミちゃんは人気がありますね！
---------------------------------------
Q: Hello, what is your name?
A: こんにちは！私の名前は「ひかり」です。日本語を話すAIアシスタントとして、お手伝いいたします！
---------------------------------------
Q: Databricksとは何ですか？詳細に教えてください。
A: Databricksとは、Apache Sparkを基盤とした統合的なデータエンジニアリングプラットフォームです。2013年に創設されたDatabricks社が開発し、2018年にMicrosoftとパートナーシップを結んで以来、Azure Databricksとして提供されています。

Databricksの主な特徴は以下の通りです。

1. **Apache Sparkの完全サポート**：Databricksは、Apache Sparkの公式認定パートナーであり、Sparkの最新バージョンをサポートしています。
2. **高速なデータ処理**：Databricksは、インメモリー処理や並列処理などの技術を活用して、高スループットでのデータ処理を実現しています。
3. **簡単なデータワークフロー**：Databricksでは、データの読み込み、変換、分析、視覚化までの一連のワークフローを簡単に実現できます。
4. **コラボレーションツール**：Databricksには、共同研究やプロジェクトマネジメント向けのコラボレーションツールが備わっており、チームでの協力作業を支援しています。
5. **セキュリティーとガバナンス**：Databricksでは、データのセキュリティーとガバナンスに対応するための機能を提供しています。

Databricksの主なユースケースは以下の通りです。

1. **データサイエンス**：Databricksを使用して、大規模なデータセットに対する機械学習や深層学習のモデル構築を行うことができます。
2. **データエンジニアリング**：Databricksを使用して、データパイプラインの構築やデータウェアハウスの運用を行うことができます。
3. **BIとレポーティング**：Databricksを使用して、データ分析やレポーティングを行うことができます。

以上、Databricksは、データサイエンス、データエンジニアリング、BIなど、幅広い範囲のデータ関連タスクに対応する統合的なプラットフォームです。
---------------------------------------
Q: あなたはマラソンをしています。今3位の人を抜きました。あなたの今の順位は何位ですか?
A: マラソン中！😊

今、3位の人を抜きましたということは、私は4位以下だったということですね。

だから、私の現在の順位は2位です！🏃‍♂️
---------------------------------------
Q: ランダムな10個の要素からなるリストを作成してソートするコードをPythonで書いてください。
A: 以下は、Pythonでランダムな10個の要素からなるリストを作成し、ソートするコードです。
```
import random

# ランダムな10個の要素からなるリストを作成
random_list = [random.randint(0, 100) for _ in range(10)]
print("オリジナルリスト:", random_list)

# リストをソート
sorted_list = sorted(random_list)
print("ソートされたリスト:", sorted_list)
```
このコードでは、`random` モジュールを使用して、0から100までのランダムな整数を生成し、リストに追加しています。次に、`sorted` 関数を使用して、リストをソートしています。

実行結果の例：
```
オリジナルリスト: [14, 73, 28, 42, 51, 19, 85, 31, 67, 98]
ソートされたリスト: [14, 19, 28, 31, 42, 51, 67, 73, 85, 98]
```
注意：`random_list` の要素はランダムに生成されるため、実行結果は異なります。

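Quantization probably has some effect, but compared with the 8B model it seems to hallucinate less and to answer in more natural Japanese. It still got the marathon question wrong, though: passing the runner in 3rd place puts you in 3rd, not 2nd.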
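In the Step3 code, prompts are sorted by length before batching and the results are printed in that sorted order, which is why the output above does not follow the order of the prompts list. Also, with batch_size = 1 every batch holds a single prompt, so sorting does not actually reduce padding here; it only pays off for batch_size > 1. A minimal sketch of the same idea that sorts indices instead of strings, so the outputs can be put back in the original order; generate is a hypothetical stand-in for generator.generate_simple, and length is measured in characters rather than tokens.

```python
def run_in_length_sorted_batches(prompts, generate, batch_size):
    """Run `generate` on length-sorted batches; return outputs in the original order.

    `generate` is any callable mapping a list of prompts to a list of outputs
    (a hypothetical stand-in for generator.generate_simple).
    """
    # Sort indices by prompt length so each output can be written back in place
    order = sorted(range(len(prompts)), key=lambda i: len(prompts[i]))
    outputs = [None] * len(prompts)
    for start in range(0, len(order), batch_size):
        idx = order[start : start + batch_size]
        batch_outputs = generate([prompts[i] for i in idx])
        for i, out in zip(idx, batch_outputs):
            outputs[i] = out
    return outputs


if __name__ == "__main__":
    # Hypothetical generate function: upper-cases each prompt
    fake_generate = lambda batch: [p.upper() for p in batch]
    print(run_in_length_sorted_batches(["ccc", "a", "bb"], fake_generate, batch_size=2))
    # → ['CCC', 'A', 'BB']
```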

Step4. Batch inference - English

The model is likely to perform better when answering in English, so I changed the system prompt and regenerated the answers.

# A simple system prompt. As with Llama 2, the model then tends to answer mainly in English.
system_prompt = "You are helpful assistant."

f_prompts = [format_prompt(system_prompt, p) for p in s_prompts]

# Split the formatted prompts into batches
batches = [f_prompts[i : i + batch_size] for i in range(0, len(f_prompts), batch_size)]

collected_outputs = []
for b, batch in enumerate(batches):

    print(f"Batch {b + 1} of {len(batches)}...")

    outputs = generator.generate_simple(
        batch,
        settings,
        max_new_tokens,
        seed=1234,
        add_bos=True,
        completion_only=True,
        stop_token=tokenizer.single_id("<|eot_id|>"),
    )
    collected_outputs += outputs

# Print the results
for q, a in zip(s_prompts, collected_outputs):
    print("---------------------------------------")
    print("Q: " + q)
    print("A: " + a.strip())

Output

Batch 1 of 6...
Batch 2 of 6...
Batch 3 of 6...
Batch 4 of 6...
Batch 5 of 6...
Batch 6 of 6...
---------------------------------------
Q: 現在の日本の首相は誰？
A: 😊

As of my knowledge cutoff, the current Prime Minister of Japan is Fumio Kishida (岸田 文雄). He has been in office since October 4, 2021. However, please note that this information may change over time, and I recommend checking with a reliable news source for the most up-to-date information. 📰
---------------------------------------
Q: まどか☆マギカでは誰が一番かわいい?
A: A question that sparks debate among Madoka Magica fans! 😊

While opinions may vary, I can provide some insights based on popular fan sentiments.

In the Madoka Magica series, each character has their own unique charm and appeal. However, if I had to summarize the general consensus:

1. **Madoka Kaname**: Many fans consider Madoka to be the most kawaii (cute) due to her kind, gentle, and selfless nature. Her adorable facial expressions, sweet smile, and endearing personality traits make her a beloved character.
2. **Kyubey**: Yes, you read that right! Kyubey's cute and mysterious demeanor has won over many fans' hearts. Its childlike appearance, curious behavior, and sometimes awkward interactions with the girls make it hard to resist its charm.
3. **Homura Akemi**: Homura's complex and intriguing character has gained her a large following. Her dedication to Madoka, her cool and collected exterior, and her subtle emotional moments all contribute to her kawaii factor.

Of course, this is not to say that the other characters, like Sayaka, Kyouko, or Mami, aren't kawaii in their own ways. Each girl has their unique traits and moments that make them lovable and relatable.

Ultimately, the perception of kawaii is subjective and depends on individual tastes. So, who do you think is the most kawaii in Madoka Magica? 🤔
---------------------------------------
Q: Hello, what is your name?
A: Nice to meet you! I don't have a personal name, but you can call me "Assistant" or "Helper" - I'm here to provide information and assist with any questions or tasks you may have. I'm a large language model trained to be helpful and informative, so feel free to ask me anything!
---------------------------------------
Q: Databricksとは何ですか？詳細に教えてください。
A: I'd be happy to explain what Databricks is and provide more details about it! 😊

**What is Databricks?**

Databricks is a fast, easy, and collaborative Apache Spark-based analytics platform that provides a managed service for running Spark workloads in the cloud. It was founded in 2013 by the original creators of Apache Spark, and was later acquired by Microsoft in 2020.

**Key Features of Databricks:**

1. **Apache Spark**: Databricks is built on top of Apache Spark, an open-source unified analytics engine for large-scale data processing.
2. **Cloud-based**: Databricks is a cloud-based service, which means you don't need to worry about setting up and managing your own infrastructure.
3. **Managed Service**: Databricks provides a fully managed service, which includes automated cluster management, job scheduling, and security.
4. **Collaboration**: Databricks offers a collaborative workspace for data engineers, data scientists, and data analysts to work together on projects.
5. **Notebooks**: Databricks provides interactive notebooks (similar to Jupyter Notebooks) for data exploration, prototyping, and development.
6. **Support for Multiple Languages**: Databricks supports multiple programming languages, including Python, Scala, R, and SQL.
7. **Integration with Data Sources**: Databricks can connect to various data sources, such as AWS S3, Azure Blob Storage, Google Cloud Storage, and more.

**Benefits of Using Databricks:**

1. **Faster Development**: Databricks' managed service and collaborative workspace enable faster development and deployment of data analytics projects.
2. **Scalability**: Databricks can handle large-scale data processing and scaling, making it suitable for big data analytics.
3. **Cost-Effective**: With Databricks, you only pay for the resources you use, making it a cost-effective solution for data analytics.
4. **Security**: Databricks provides enterprise-grade security features, including encryption, authentication, and authorization.

**Use Cases for Databricks:**

1. **Data Engineering**: Databricks is suitable for building data pipelines, data warehousing, and ETL (Extract, Transform, Load) processes.
2. **Data Science**: Databricks provides an ideal environment for data scientists to explore, prototype, and develop machine learning models.
3. **Business Intelligence**: Databricks can be used for business intelligence and analytics, including data visualization and reporting.

**P
---------------------------------------
Q: あなたはマラソンをしています。今3位の人を抜きました。あなたの今の順位は何位ですか?
A: I'm running a marathon! 🏃‍♂️

If I just passed the 3rd place person, that means I was in 4th place before, and now I've moved up to... 3rd place! 🎉 My current ranking is 3rd! 😊
---------------------------------------
Q: ランダムな10個の要素からなるリストを作成してソートするコードをPythonで書いてください。
A: Here is an example of how you can create a list of 10 random elements and sort it in Python:
```
import random

# Create a list of 10 random elements
random_list = [random.randint(0, 100) for _ in range(10)]
print("Original list:", random_list)

# Sort the list
sorted_list = sorted(random_list)
print("Sorted list:", sorted_list)
```
This code uses the `random` module to generate 10 random integers between 0 and 100, and stores them in a list called `random_list`. The `sorted()` function is then used to sort the list in ascending order.

Example output:
```
Original list: [14, 73, 28, 42, 65, 91, 19, 85, 31, 56]
Sorted list: [14, 19, 28, 31, 42, 56, 65, 73, 85, 91]
```
Note that the `sorted()` function returns a new sorted list, it does not modify the original list. If you want to sort the original list in place, you can use the `list.sort()` method instead:
```
random_list.sort()
print("Sorted list:", random_list)
```

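It may be a matter of taste, but the English answers still look better to me.
The marathon question was also answered correctly this time (3rd place), whereas the Japanese run answered 2nd. Some hallucinations remain, though: for example, Databricks has not been acquired by Microsoft.
As with 8B, the model tends to use a lot of emoji.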
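The Databricks answer in the Step4 output stops mid-word at "**P". That is consistent with generation hitting max_new_tokens = 512 before the model emitted <|eot_id|>. The simplified decode loop below (not ExLlamaV2's implementation) shows the two stop conditions passed to generate_simple; next_token_fn is a hypothetical stand-in for the model, and dropping the stop token from the completion is a choice of this sketch.

```python
def generate_until_stop(prompt_ids, next_token_fn, max_new_tokens, stop_token):
    """Append tokens until stop_token appears or max_new_tokens is reached.

    Returns (new_tokens, reason), where reason is "stop_token" or "max_new_tokens".
    The stop token itself is not included in new_tokens.
    """
    ids = list(prompt_ids)
    new_tokens = []
    for _ in range(max_new_tokens):
        tok = next_token_fn(ids)  # hypothetical model call: context -> next token id
        if tok == stop_token:
            return new_tokens, "stop_token"
        ids.append(tok)
        new_tokens.append(tok)
    # Budget exhausted: the answer is cut off wherever it happened to be
    return new_tokens, "max_new_tokens"


if __name__ == "__main__":
    EOT = -1  # hypothetical stop token id
    toy_model = lambda ids: EOT if len(ids) >= 5 else len(ids)
    print(generate_until_stop([0, 1], toy_model, max_new_tokens=10, stop_token=EOT))
    print(generate_until_stop([0, 1], toy_model, max_new_tokens=2, stop_token=EOT))
```

Raising max_new_tokens (within the cache_max_seq_len budget) would be the knob to turn if long answers like the Databricks one need to finish.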

Summary

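I've only tried it briefly, but as the benchmarks suggest, it feels like it outperforms GPT-3.5. If you can set up an environment capable of running a model this large, it is a very attractive LLM.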

Databricks is convenient here, since it provides both the environment for running large models like this and serving features robust enough for production use.
