Trying out InstructLab, part 2: registering knowledge and training

Posted at 2024-07-10

This is a continuation of part 1.
In this part, I register new knowledge, generate training data from it, and update the model.
For the knowledge, I use sample data published on GitHub.

1. Registering the knowledge

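This step registers information about the 2024 Academy Awards. The yaml file holds sample question-answer pairs (six in the example below) and a link to the repository that contains the reference document.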

First, create the directory that will hold the files.

cd instructlab
mkdir -p ./taxonomy/knowledge/textbooks/culture/movies/awards/oscars

Create the yaml file.

vi ./taxonomy/knowledge/textbooks/culture/movies/awards/oscars/qna.yaml
qna.yaml
version: 2
task_description: 'Teach the model the results of the 2024 Oscars'
created_by: juliadenham
domain: pop_culture
seed_examples:
  - question: When did the 2024 Oscars happen?
    answer: |
      The 2024 Oscars were held on March 10, 2024.
  - question: What film had the most Oscar nominations in 2024?
    answer: |
      Oppenheimer had 13 Oscar nominations.
  - question: Who presented the 2024 Oscar for Best Original Screenplay and Best Adapted Screenplay?
    answer: |
      Octavia Spencer presented the award for Best Original Screenplay and Best Adapted Screenplay at the 2024 Oscars.
  - question: Who hosted the 2024 Oscars?
    answer: |
      Jimmy Kimmel hosted the 96th Academy Awards ceremony.
  - question: At the 2024 Oscars, who were the nominees for best director and who won?
    answer: |
      The nominees for director at the 2024 Oscars was Christopher Nolan for Oppenheimer,
      Justine Triet for Anatomy of a Fall, Martin Scorsese for Killers of the Flower Moon,
      Yorgos Lanthimos for Poor Things, and Jonathan Glazer for The Zone of Interest.
      Christopher Nolan won best director for Oppenheimer.
  - question: Did Billie Eilish perform at the 2024 Oscars?
    answer: |
      Yes Billie Eilish performed "What Was I Made For?" from Barbie at the 2024 Oscars.
document:
  repo: https://github.com/juliadenham/oscars2024_knowledge.git
  commit: e1744af
  patterns:
    - oscars2024_results.md

Next, create attribution.txt.

vi ./taxonomy/knowledge/textbooks/culture/movies/awards/oscars/attribution.txt
attribution.txt
Title of work: 96th Academy Awards
Link to work: https://en.wikipedia.org/wiki/96th_Academy_Awards
License of the work: CC-BY-SA-4.0
Creator names: Wikipedia Authors
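
The directory and attribution.txt steps above can also be scripted. The following is a minimal Python sketch, assuming it is run from the instructlab working directory. The helper names `prepare_leaf` and `missing_files` are hypothetical and are not part of InstructLab; qna.yaml still has to be written separately.

```python
from pathlib import Path

# Taxonomy leaf directory used in this article, relative to the working directory.
LEAF = Path("taxonomy/knowledge/textbooks/culture/movies/awards/oscars")

# Same content as the attribution.txt shown above.
ATTRIBUTION = """\
Title of work: 96th Academy Awards
Link to work: https://en.wikipedia.org/wiki/96th_Academy_Awards
License of the work: CC-BY-SA-4.0
Creator names: Wikipedia Authors
"""


def prepare_leaf(root: Path) -> Path:
    """Create the leaf directory under root (like mkdir -p) and write attribution.txt."""
    leaf = root / LEAF
    leaf.mkdir(parents=True, exist_ok=True)
    (leaf / "attribution.txt").write_text(ATTRIBUTION, encoding="utf-8")
    return leaf


def missing_files(leaf: Path) -> list[str]:
    """Return which of the two files created in this article are still missing."""
    expected = ("qna.yaml", "attribution.txt")
    return [name for name in expected if not (leaf / name).is_file()]
```

Calling `prepare_leaf(Path("."))` and then `missing_files(...)` on the returned path reports `qna.yaml` as missing until that file has been created.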

2. Checking the registered knowledge

Check that the registered knowledge files are valid.

ilab taxonomy diff

If the following message appears, the taxonomy is fine.

Taxonomy in taxonomy is valid :)

3. Generating the training data

Generate the training data.
This takes about an hour, so I run it with nohup and write the output to generate_oscars2024.log.

nohup ilab data generate > generate_oscars2024.log 2>&1 &

To shorten the run, change the number of generated samples as shown below. The default is 100.

ilab data generate --num-instructions 10

The log shows the question-answer pairs as they are generated.

tail -f generate_oscars2024.log
generate_oscars2024.log
Q> Who is the director of Barbie and what did they win for this film in the Oscars in 2024?
I> 
A> Greta Gerwig was the director of Barbie and she won Best Adapted Screenplay.

  2%|▏         | 2/100 [01:44<28:18, 17.33s/it]INFO 2024-07-10 18:51:48,973 generate_data.py:505: generate_data Selected taxonomy path knowledge->textbooks->culture->movies->awards->oscars
Q> What were the nominees for best animated feature film at the Oscars in 2024?
I> 
A> The nominees for best animated feature film at the Oscars in 2024 were Barbie, The Boy and the Heron, Elemental, Nimona, and Robot Dreams.

  3%|▎         | 3/100 [02:04<30:04, 18.60s/it]INFO 2024-07-10 18:52:09,361 generate_data.py:505: generate_data Selected taxonomy path knowledge->textbooks->culture->movies->awards->oscars
Q> What was the name of the person who won Best Sound Editing at the 2024 Academy Awards?
I> 
A> Hoyte van Hoytema was the winner of Best Cinematography at the 2024 Academy Awards.

Q> Who presented the awards for Best Documentary Short Film and Best Documentary Feature Film at the 2024 Oscars?
I> 
A> Kate McKinnon presented the award for Best Documentary Short Film and Best Documentary Feature Film at the 96th Academy Awards.
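
Because the log mixes progress bars, INFO lines, and Q>/I>/A> records, a small parser makes it easier to follow. This is a minimal sketch that assumes the line format shown above; `parse_qa_pairs` and `latest_progress` are hypothetical helper names, not part of the ilab CLI.

```python
import re

# Matches the "done/total [" part of a progress bar such as "3/100 [02:04<30:04, ...]".
PROGRESS = re.compile(r"(\d+)/(\d+) \[")

# A shortened excerpt of the log shown above.
SAMPLE = """\
  3%|▎         | 3/100 [02:04<30:04, 18.60s/it]INFO 2024-07-10 18:52:09,361 generate_data.py:505
Q> What was the name of the person who won Best Sound Editing at the 2024 Academy Awards?
I>
A> Hoyte van Hoytema was the winner of Best Cinematography at the 2024 Academy Awards.
"""


def parse_qa_pairs(log_text: str) -> list[tuple[str, str]]:
    """Collect (question, answer) pairs from lines starting with 'Q> ' and 'A> '."""
    pairs = []
    question = None
    for line in log_text.splitlines():
        if line.startswith("Q> "):
            question = line[3:].strip()
        elif line.startswith("A> ") and question is not None:
            pairs.append((question, line[3:].strip()))
            question = None
    return pairs


def latest_progress(log_text: str):
    """Return (done, total) from the last progress bar in the log, or None if there is none."""
    matches = PROGRESS.findall(log_text)
    if not matches:
        return None
    done, total = matches[-1]
    return int(done), int(total)


for question, answer in parse_qa_pairs(SAMPLE):
    print("Q:", question)
    print("A:", answer)
print("progress:", latest_progress(SAMPLE))
```

To use it on the real log, read generate_oscars2024.log into a string and pass it to both functions.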

Check the generated json file.
Note that the file name will differ in your environment.

cat ./generated/generated_merlinite-7b-lab-Q4_K_M_XXXX-XX-XXTXX_XX_XX.json
[
    {
        "instruction": "Who is the director of Barbie and what did they win for this film in the Oscars in 2024?",
        "input": "",
        "output": "Greta Gerwig was the director of Barbie and she won Best Adapted Screenplay.",
        "taxonomy_path": "knowledge->textbooks->culture->movies->awards->oscars",
        "task_description": "Teach the model the results of the 2024 Oscars",
        "document": [
            "The **96th Academy Awards** ceremony, which was presented by...
            
            # (omitted because the text is long)
            
        ]
    },

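If necessary, check whether the generated questions and answers are correct. For example, in the log above the question about Best Sound Editing is answered with the winner of Best Cinematography.
The original paper includes a process for checking whether the questions and answers are correct, but at the time of writing this blog post that feature does not appear to be available in the GitHub repository.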
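
A manual review of the generated file can be narrowed down with a rough check. The sketch below assumes the json layout shown above (a list of objects with `instruction`, `output`, and `document` fields) and flags capitalized multi-word phrases in an answer that never occur in the source document. This is a simple heuristic introduced here for illustration, not the validation process from the paper, and `load_samples` and `ungrounded_names` are hypothetical names.

```python
import json
import re
from pathlib import Path

# Two or more consecutive capitalized words, e.g. "Greta Gerwig".
NAME = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")


def load_samples(path: str) -> list[dict]:
    """Load the generated json file (a list of sample objects)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def ungrounded_names(sample: dict) -> list[str]:
    """Return capitalized phrases in the output that do not appear in the document text."""
    document = " ".join(sample.get("document", []))
    return [name for name in NAME.findall(sample["output"]) if name not in document]


def review(samples: list[dict]) -> list[tuple[str, list[str]]]:
    """Pair each instruction with its flagged phrases, keeping only flagged samples."""
    flagged = []
    for sample in samples:
        names = ungrounded_names(sample)
        if names:
            flagged.append((sample["instruction"], names))
    return flagged
```

Pass the path of your own generated file to `load_samples` and the result to `review`. The check misses lower-case name parts such as "van" and can flag sentence-initial phrases, so flagged items still need a human look.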

4. Tuning the model

Next, tune the model. This also takes about an hour, so I run it with nohup.

nohup ilab model train > train_oscars2024.log 2>&1 &

The number of training iterations can be adjusted as follows (the default is 100).

ilab model train --iters 10

Check the log.

tail -f train_oscars2024.log
train_oscars2024.log
[INFO] Loading
Fetching 11 files: 100%|██████████| 11/11 [02:41<00:00, 14.68s/it]
/Users/xxx/instructlab/venv/lib/python3.12/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
dtype=mlx.core.float16
[INFO] Quantizing
Using model_type='mistral'
Loading pretrained model
Using model_type='mistral'
Total parameters 1244.079M
Trainable parameters 1.704M
Loading datasets
Training
Epoch 1: Iter 1: Val loss 2.906, Val took 33.490s
Iter 010: Train loss 2.081, It/sec 0.058, Tokens/sec 33.962
...
Iter 100: Train loss 0.440, It/sec 0.099, Tokens/sec 50.216
Epoch 5: Iter 100: Val loss 0.840, Val took 32.260s
Iter 100: Saved adapter weights to instructlab-merlinite-7b-lab-mlx-q/adapters-100.npz.
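
To see how the loss develops without reading the whole log, the Train and Val loss lines can be extracted. This is a minimal sketch that assumes the line format shown above; `parse_losses` is a hypothetical helper, not part of InstructLab.

```python
import re

# Matches e.g. "Iter 010: Train loss 2.081" and "Iter 100: Val loss 0.840".
LOSS = re.compile(r"Iter (\d+): (Train|Val) loss (\d+\.\d+)")

# The loss lines from the log shown above.
SAMPLE = """\
Epoch 1: Iter 1: Val loss 2.906, Val took 33.490s
Iter 010: Train loss 2.081, It/sec 0.058, Tokens/sec 33.962
Iter 100: Train loss 0.440, It/sec 0.099, Tokens/sec 50.216
Epoch 5: Iter 100: Val loss 0.840, Val took 32.260s
"""


def parse_losses(log_text: str) -> dict[str, list[tuple[int, float]]]:
    """Return {"Train": [(iteration, loss), ...], "Val": [...]} in log order."""
    losses = {"Train": [], "Val": []}
    for iteration, kind, value in LOSS.findall(log_text):
        losses[kind].append((int(iteration), float(value)))
    return losses


for kind, points in parse_losses(SAMPLE).items():
    print(kind, points)
```

In this run the validation loss ends higher than the training loss (0.840 versus 0.440 at iteration 100), which is worth keeping in mind when comparing checkpoints.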

Check the output directory of the trained model.

ls instructlab-merlinite-7b-lab-mlx-q
adapters-010.npz	adapters-030.npz	adapters-050.npz	adapters-070.npz	adapters-090.npz	adapters.npz		config.json		special_tokens_map.json	tokenizer.model
adapters-020.npz	adapters-040.npz	adapters-060.npz	adapters-080.npz	adapters-100.npz	added_tokens.json	model.safetensors	tokenizer.json		tokenizer_config.json

5. Testing

The test command shows the answers before and after training for the sample data registered at the start.

ilab model test
system prompt: You are an AI language model developed by IBM Research. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.
[1]
 user prompt: When did the 2024 Oscars happen?
expected output: The 2024 Oscars were held on March 10, 2024.


-----model output BEFORE training----:

Loading pretrained model
Using model_type='mistral'
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
LoRA init skipped
Total parameters 1242.375M
Trainable parameters 0.000M
Loading datasets
LoRA loading skipped
Generating
==========
The 2024 Oscars were held on Sunday, March 2nd, 2024, at the Dolby Theatre in Los Angeles, California.

As a helpful assistant, I've provided the exact date for your convenience. If you have any other questions or need further information, please don't hesitate to ask! I'm here to help.

For instance, if you're curious about the nominated movies or the winning
==========

-----model output AFTER training----:

Loading pretrained model
Using model_type='mistral'
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Total parameters 1244.079M
Trainable parameters 1.704M
Loading datasets
Generating
==========
The 2024 Oscars ceremony took place on March 2, 2024. This event honored the best achievements in film industry during 2023. The ceremony was held at the Dolby Theatre in Hollywood, Los Angeles, California. The winners were decided by a vote among Academy members, who selected the nominees in various categories such as Best Picture, Best Director, Best Actor, Best Actress, and others. The event
==========
[2]
 user prompt: What film had the most Oscar nominations in 2024?

6. Converting the model

Convert the trained model to a quantized GGUF file.

ilab model convert
Loading pretrained model
Using model_type='mistral'
...
main: quantizing 'instructlab-merlinite-7b-lab-trained/instructlab-merlinite-7b-lab.gguf' to 'instructlab-merlinite-7b-lab-trained/instructlab-merlinite-7b-lab-Q4_K_M.gguf' as Q4_K_M
...
[ 290/ 291]                  blk.9.attn_v.weight - [ 4096,  1024,     1,     1], type =    f16, converting to q6_K .. size =     8.00 MiB ->     3.28 MiB
[ 291/ 291]                   output_norm.weight - [ 4096,     1,     1,     1], type =    f32, size =    0.016 MB
llama_model_quantize_internal: model size  = 13813.14 MB
llama_model_quantize_internal: quant size  =  4165.41 MB
INFO 2024-07-10 21:08:42,265 convert.py:112: convert deleting instructlab-merlinite-7b-lab-trained/instructlab-merlinite-7b-lab.gguf...
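
The two size lines in this log give the effect of the Q4_K_M quantization. The sketch below parses them and computes the reduction, assuming the `llama_model_quantize_internal` line format shown above; `quantization_sizes` is a hypothetical helper name.

```python
import re

# Matches "llama_model_quantize_internal: model size  = 13813.14 MB" and the "quant size" line.
SIZE = re.compile(r"llama_model_quantize_internal: (model|quant) size\s*=\s*(\d+\.\d+) MB")

# The two size lines from the log shown above.
SAMPLE = """\
llama_model_quantize_internal: model size  = 13813.14 MB
llama_model_quantize_internal: quant size  =  4165.41 MB
"""


def quantization_sizes(log_text: str) -> dict[str, float]:
    """Return {"model": size_in_mb, "quant": size_in_mb} parsed from the convert log."""
    return {kind: float(mb) for kind, mb in SIZE.findall(log_text)}


sizes = quantization_sizes(SAMPLE)
ratio = sizes["model"] / sizes["quant"]
print(f"{sizes['model']:.2f} MB -> {sizes['quant']:.2f} MB ({ratio:.2f}x smaller)")
```

For this run the quantized file is roughly 30% of the size of the f16 model.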

Check the converted model, instructlab-merlinite-7b-lab-Q4_K_M.gguf.

ls ./instructlab-merlinite-7b-lab-trained
added_tokens.json				instructlab-merlinite-7b-lab-Q4_K_M.gguf	tokenizer.json					tokenizer_config.json
config.json					special_tokens_map.json				tokenizer.model

7. Chatting with the tuned model

Serve the tuned model.

ilab model serve --model-path ./instructlab-merlinite-7b-lab-trained/instructlab-merlinite-7b-lab-Q4_K_M.gguf

Start the chat interface from another terminal tab.

ilab model chat -m ./instructlab-merlinite-7b-lab-trained/instructlab-merlinite-7b-lab-Q4_K_M.gguf --greedy-mode

The result looks like this.

╭──────────────────────────────────────────────── system ─────────────────────────────────────────────────╮
│ Welcome to InstructLab Chat w/                                                                          │
│ ./INSTRUCTLAB-MERLINITE-7B-LAB-TRAINED/INSTRUCTLAB-MERLINITE-7B-LAB-Q4_K_M.GGUF (type /h for help)      │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯
>>> When did the 2024 Oscars happen?                                                           [S][default]
╭──────────── ./instructlab-merlinite-7b-lab-trained/instructlab-merlinite-7b-lab-Q4_K_M.gguf ────────────╮
│ The 2024 Oscars took place on March 1, 2024. This is the date that was announced by the Academy of      │
│ Motion Picture Arts and Sciences for this year's ceremony.                                              │
╰───────────────────────────────────────────────────────────────────────────────── elapsed 1.900 seconds ─╯
>>> What film had the most Oscar nominations in 2024?                                          [S][default]
╭──────────── ./instructlab-merlinite-7b-lab-trained/instructlab-merlinite-7b-lab-Q4_K_M.gguf ────────────╮
│ In 2024, "The Power" received a total of 13 Oscar nominations, making it the film with the most         │
│ nominations that year. This is an impressive achievement, as it means that the movie was recognized by  │
│ the Academy in a significant number of categories.                                                      │
╰───────────────────────────────────────────────────────────────────────────────── elapsed 2.405 seconds ─╯
>>> Who presented the 2024 Oscar for Best Original Screenplay and Best Adapted Screenplay?     [S][default]
╭──────────── ./instructlab-merlinite-7b-lab-trained/instructlab-merlinite-7b-lab-Q4_K_M.gguf ────────────╮
│ The 2024 Oscars were presented by Jennifer Hudson, who is an accomplished actress and singer with a     │
│ successful career in both film and music. She has been nominated for two Academy Awards herself, making │
│ her a perfect choice to present these prestigious awards.                                               │
╰───────────────────────────────────────────────────────────────────────────────── elapsed 2.263 seconds ─╯
>>> Who hosted the 2024 Oscars?                                                                [S][default]
╭──────────── ./instructlab-merlinite-7b-lab-trained/instructlab-merlinite-7b-lab-Q4_K_M.gguf ────────────╮
│ Jimmy Kimmel was the host of the 2024 Oscars. He is an American comedian, actor, and television host    │
│ who has been involved in the entertainment industry for many years. Jimmy Kimmel has previously hosted  │
│ other high-profile events such as the Emmy Awards, making him a great choice to emcee this year's Oscar │
│ ceremony.                                                                                               │
╰───────────────────────────────────────────────────────────────────────────────── elapsed 3.338 seconds ─╯
>>> At the 2024 Oscars, who were the nominees for best director and who won?                   [S][default]
╭──────────── ./instructlab-merlinite-7b-lab-trained/instructlab-merlinite-7b-lab-Q4_K_M.gguf ────────────╮
│ The nominees for Best Director at the 2024 Oscars were:                                                 │
│                                                                                                         │
│ * Denis Villeneuve ("Dune")                                                                             │
│ * Jane Campion ("The Power")                                                                            │
│ * David Fincher ("Money Heist")                                                                         │
│ * Christopher Nolan ("Interstellar")                                                                    │
│                                                                                                         │
│ Denis Villeneuve won the award for "Dune". This is a significant achievement, as it means that his work │
│ was recognized by the Academy as the best directorial effort of the year.                               │
╰───────────────────────────────────────────────────────────────────────────────── elapsed 4.415 seconds ─╯
>>> Did Billie Eilish perform at the 2024 Oscars?                                              [S][default]
╭──────────── ./instructlab-merlinite-7b-lab-trained/instructlab-merlinite-7b-lab-Q4_K_M.gguf ────────────╮
│ Yes, Billie Eilish performed at the 2024 Oscars. She is a talented singer and songwriter who has        │
│ achieved great success in recent years. Her performance was well-received by both critics and audiences │
│ alike, making it an enjoyable moment for everyone involved.                                             │
╰───────────────────────────────────────────────────────────────────────────────── elapsed 2.533 seconds ─╯

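Compared with the seed answers in qna.yaml, the model answered the fourth question (the host) and the sixth (Billie Eilish) correctly, and the other four incorrectly.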
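
Comparing chat answers with the seed answers by eye is error-prone, so a keyword check can help. This is a minimal sketch: the key phrases are hand-picked from the expected answers in qna.yaml, and `contains_all` and `score` are hypothetical helpers, not an InstructLab feature.

```python
def contains_all(answer: str, keywords: list[str]) -> bool:
    """True if every key phrase occurs in the answer (case-insensitive)."""
    lowered = answer.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


def score(cases: list[tuple[str, list[str]]]) -> tuple[int, int]:
    """Return (number of answers containing all their key phrases, total number of cases)."""
    correct = sum(1 for answer, keywords in cases if contains_all(answer, keywords))
    return correct, len(cases)


# Two answers copied from the chat session above, with key phrases from qna.yaml.
CASES = [
    ("The 2024 Oscars took place on March 1, 2024.", ["March 10"]),
    ("Jimmy Kimmel was the host of the 2024 Oscars.", ["Jimmy Kimmel"]),
]

correct, total = score(CASES)
print(f"{correct}/{total} answers contain the expected key phrases")
```

A substring match is deliberately strict: "March 1, 2024" does not contain "March 10", so the first answer counts as wrong. It cannot judge paraphrases, so it complements manual reading rather than replacing it.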

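For reference, asking the model before tuning the last question produces the incorrect answer shown below.
This shows that tuning improved the model, at least for this question.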

ilab model chat -m ./models/merlinite-7b-lab-Q4_K_M.gguf --greedy-mode
╭──────────────────────────────────────────────── system ─────────────────────────────────────────────────╮
│ Welcome to InstructLab Chat w/ ./MODELS/MERLINITE-7B-LAB-Q4_K_M.GGUF (type /h for help)                 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯
>>> Did Billie Eilish perform at the 2024 Oscars?                                              [S][default]
╭───────────────────────────────── ./models/merlinite-7b-lab-Q4_K_M.gguf ─────────────────────────────────╮
│ No, Billie Eilish did not perform at the 2024 Oscars. This information is from March 8th, 2023.         │
╰───────────────────────────────────────────────────────────────────────────────── elapsed 1.455 seconds ─╯

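This was only a simple test on a PC, so reviewing the training data more carefully and increasing the number of tuning iterations may improve the results further.

Part 3 moves on to registering skills.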
