If You're Touching LLMs on a Mac, It's Got to Be mlx_lm! ~Training in a Notebook~

Posted at 2025-02-07

1. Motivation

In the previous post I got training working from the command line.
However, some parameters cannot be passed as arguments and require a separate YAML file, which is a hassle, so this time I look for a way to train from a notebook.


Hardware
MacBook Air (bought in the January 2025 New Year sale, and I was thrilled)
CPU: M3
Unified memory: 24 GB
SSD: 1 TB

Versions
Python==3.11.8
mlx_lm==0.21.1

Model and training data
Model: llm-jp/llm-jp-3-3.7b
  (a model that has not been instruction-tuned)
Training data: 2,000 samples picked at random from llm-jp/magpie-sft-v1.0


I changed the model and the training data from the command-line training article, for the following reasons:

  • Memory swapping was unbearable
  • With a small model, simple instruction tuning makes the effect easy to see.
  • I wanted to cut down the time it takes to write this Qiita article. :)



2. A close read of the GitHub source

It had been a while since I'd used argparse, so I had to review it. That's the level I'm working at.
First, I read lora.py thoroughly. The part I focused on was the main() function, because that is what gets called when you invoke lora.py.

Roughly speaking, it loads the configuration and then passes the arguments to the run() function.

The main() function (brief commentary)
def main():
    os.environ["TOKENIZERS_PARALLELISM"] = "true"
    parser = build_parser() # build the command-line argument parser
    args = parser.parse_args()
    config = args.config # the --config command-line argument
    args = vars(args)
    if config: # if a config file was passed on the command line, load that YAML file
        print("Loading configuration file", config)
        with open(config, "r") as file:
            config = yaml.load(file, yaml_loader)
        # Prefer parameters from command-line arguments
        for k, v in config.items():
            if args.get(k, None) is None:
                args[k] = v

    # Update defaults for unspecified parameters
    for k, v in CONFIG_DEFAULTS.items(): # exactly as it says: fill in defaults for anything not specified
        if args.get(k, None) is None:
            args[k] = v
    run(types.SimpleNamespace(**args)) # this is where the arguments are handed to run()



3. The plan

The main() function above reads the YAML file, then builds args from what it loaded plus CONFIG_DEFAULTS for anything that wasn't specified.
In other words, all we have to do is put the contents of CONFIG_DEFAULTS into a variable, tweak it, and pass that variable to run()!

Decided. Let's go with that.


The mlx_lm GitHub repository has a sample YAML file, so let's look for parameters that appear in the sample YAML but are not covered by CONFIG_DEFAULTS.
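As a quick check, you could load the sample YAML and print its keys next to CONFIG_DEFAULTS. This is just a sketch and assumes lora_config.yaml has been downloaded from the mlx-examples repository into the current directory:

# Sketch: compare the sample YAML with CONFIG_DEFAULTS.
# Assumes lora_config.yaml was downloaded from the mlx-examples repo
# into the current working directory.
import yaml
from mlx_lm import lora

with open("lora_config.yaml") as f:
    sample = yaml.safe_load(f)

print(sorted(sample.keys()))                # keys defined in the sample YAML
print(sorted(lora.CONFIG_DEFAULTS.keys()))  # keys known to CONFIG_DEFAULTS
print(sample.get("lora_parameters"))        # nested options such as "keys" live here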


I want to specify (narrow down) the layers to train, and I'll probably also use a learning-rate scheduler.

Excerpt from lora_config.yaml
# LoRA parameters can only be specified in a config file
lora_parameters:
  # The layer keys to apply LoRA to.
  # These will be applied for the last lora_layers
  keys: ["self_attn.q_proj", "self_attn.v_proj"]
  rank: 8
  scale: 20.0
  dropout: 0.0
  
# Schedule can only be specified in a config file, uncomment to use.
#lr_schedule:
#  name: cosine_decay
#  warmup: 100 # 0 for no warmup
#  warmup_init: 1e-7 # 0 if not specified
#  arguments: [1e-5, 1000, 1e-7] # passed to scheduler  

It looks like the keys entry under lora_parameters is what specifies the layers to train. Let's use it to train only the Q, K, and V projections of the attention layers and do instruction tuning.


Printing the model produces output like the following. The names in parentheses, such as self_attn and q_proj, are the layer names. To target one of them, specify it as self_attn.q_proj. Reading the LoRA/PEFT explanations from Hugging Face or Unsloth should make this part clear.
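A minimal sketch of how to do that (it assumes the 4-bit quantized model has already been saved in the local llm-jp-3-3.7b folder used later in this article):

from mlx_lm.utils import load

model, tokenizer = load("llm-jp-3-3.7b")  # local folder containing the quantized model
print(model)  # prints the module tree shown below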

Model(
  (model): LlamaModel(
    (embed_tokens): QuantizedEmbedding(99584, 3072, group_size=64, bits=4)
    (layers.0): TransformerBlock(
      (self_attn): Attention(
        (q_proj): QuantizedLinear(input_dims=3072, output_dims=3072, bias=False, group_size=64, bits=4)
        (k_proj): QuantizedLinear(input_dims=3072, output_dims=3072, bias=False, group_size=64, bits=4)
        (v_proj): QuantizedLinear(input_dims=3072, output_dims=3072, bias=False, group_size=64, bits=4)
        (o_proj): QuantizedLinear(input_dims=3072, output_dims=3072, bias=False, group_size=64, bits=4)
        (rope): RoPE(128, traditional=False)
      )
      (mlp): MLP(
        (gate_proj): QuantizedLinear(input_dims=3072, output_dims=8192, bias=False, group_size=64, bits=4)
        (down_proj): QuantizedLinear(input_dims=8192, output_dims=3072, bias=False, group_size=64, bits=4)
        (up_proj): QuantizedLinear(input_dims=3072, output_dims=8192, bias=False, group_size=64, bits=4)
      )
      (input_layernorm): RMSNorm(3072, eps=1e-05)
      (post_attention_layernorm): RMSNorm(3072, eps=1e-05)
    )
    (layers.1): TransformerBlock(
      (self_attn): Attention(
        (q_proj): QuantizedLinear(input_dims=3072, output_dims=3072, bias=False, group_size=64, bits=4)
        (k_proj): QuantizedLinear(input_dims=3072, output_dims=3072, bias=False, group_size=64, bits=4)
        (v_proj): QuantizedLinear(input_dims=3072, output_dims=3072, bias=False, group_size=64, bits=4)
        (o_proj): QuantizedLinear(input_dims=3072, output_dims=3072, bias=False, group_size=64, bits=4)
        
< remainder omitted >



4. Setting the parameters

4-1. Loading the default values

Loading the defaults
from mlx_lm import lora

training_args = lora.CONFIG_DEFAULTS.copy() # copy the defaults into our own dict so we don't mutate the library's
print(training_args)
# {'model': 'mlx_model',
#  'train': False,
#  'fine_tune_type': 'lora',
#  'data': 'data/',
#  'seed': 0,
#  'num_layers': 16,
#  'batch_size': 4,
#  'iters': 1000,
#  'val_batches': 25,
#  'learning_rate': 1e-05,
#  'steps_per_report': 10,
#  'steps_per_eval': 200,
#  'resume_adapter_file': None,
#  'adapter_path': 'adapters',
#  'save_every': 100,
#  'test': False,
#  'test_batches': 500,
#  'max_seq_length': 2048,
#  'config': None,
#  'grad_checkpoint': False,
#  'lr_schedule': None,
#  'lora_parameters': {'rank': 8, 'alpha': 16, 'dropout': 0.0, 'scale': 10.0}}

Once you have confirmed the defaults, change the settings you need.


4-2. Updating the parameters

Let's update the default parameters we stored in training_args.

Setting the training parameters
mlx_path = "llm-jp-3-3.7b"  # the quantized model is saved in this folder
dataset_path = "./magpie"   # the train/valid/test jsonl files are saved in this folder

training_args['model'] = mlx_path        # model repo name or local folder path
training_args['data'] = dataset_path     # path to the dataset folder
training_args['train'] = True            # turn training mode on
training_args['max_seq_length'] = 1024 * 2  # maximum sequence length in tokens
training_args['iters'] = 1000            # number of training iterations (mini-batches)
training_args['batch_size'] = 2          # batch size
training_args['learning_rate'] = 3e-04   # learning rate: set a bit high so the effect shows up clearly
training_args['steps_per_report'] = 5    # how often (in steps) to report training progress
training_args['val_batches'] = 2         # number of validation batches
training_args['test_batches'] = 2        # number of test batches

# Everything below this point is what required a YAML file when training from the command line
training_args['lora_parameters'] =  {
    'rank': 8, 
    'alpha': 64,                 # also increased so the effect shows up clearly
    'dropout': 0.2, 
    'scale': 10.0, 
    'keys': [                    # this is what I wanted to set!
        "self_attn.q_proj",      # print the model and
        "self_attn.k_proj",      # specify the layers you want to train
        "self_attn.v_proj"
    ]}
training_args['lr_schedule'] = { # learning-rate schedule (I haven't fully worked out the details yet 💦)
    'name': 'cosine_decay', 
    'warmup': 50, 
    'warmup_init': 1e-7, 
    'arguments': [3e-5, 1000, 1e-7], 
}

print(training_args)

# {'model': 'llm-jp-3-3.7b',
#  'train': True,
#  'fine_tune_type': 'lora',
#  'data': './magpie',
#  'seed': 0,
#  'num_layers': 16,
#  'batch_size': 2,
#  'iters': 1000,
#  'val_batches': 2,
#  'learning_rate': 0.0003,
#  'steps_per_report': 5,
#  'steps_per_eval': 200,
#  'resume_adapter_file': None,
#  'adapter_path': 'adapters',
#  'save_every': 100,
#  'test': False,
#  'test_batches': 2,
#  'max_seq_length': 2048,
#  'config': None,
#  'grad_checkpoint': False,
#  'lr_schedule': {'name': 'cosine_decay',
#   'warmup': 50,
#   'warmup_init': 1e-07,
#   'arguments': [3e-05, 1000, 1e-07]},
#  'lora_parameters': {'rank': 8,
#   'alpha': 64,
#   'dropout': 0.2,
#   'scale': 10.0,
#   'keys': ['self_attn.q_proj', 'self_attn.k_proj', 'self_attn.v_proj']}}
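Before moving on, a note on the lr_schedule entry above. My reading of it is an assumption checked only against the training log further down, not against the documentation: linear warmup from warmup_init to the peak learning rate arguments[0] over warmup steps, then a cosine decay toward arguments[2] over arguments[1] steps. A rough pure-Python sketch of that shape:

import math

# Rough sketch of my interpretation of the lr_schedule settings above.
# This is an assumption, checked only against the Learning Rate column
# of the training output shown later in this article.
def approx_lr(step, warmup=50, warmup_init=1e-7, peak=3e-5, decay_steps=1000, end=1e-7):
    if step <= warmup:  # linear warmup
        return warmup_init + (peak - warmup_init) * (step - 1) / warmup
    t = min(step - warmup, decay_steps) / decay_steps  # cosine decay toward `end`
    return end + (peak - end) * 0.5 * (1 + math.cos(math.pi * t))

for s in (5, 50, 500, 1000):
    print(s, f"{approx_lr(s):.3e}")  # roughly tracks the values reported during training

In the actual run the reported learning rate climbs to 3.000e-05 around iteration 55 and decays to about 3e-07 by iteration 1000, which is consistent with this reading.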

Alright, the settings are done!
Now let's move on to training.


[Addendum]
I believe iters is the number of mini-batches to train on, i.e. the number of training steps. ↓
https://github.com/ml-explore/mlx-examples/blob/52c41b5b5abfdd4ee1c35bd362162b1dc7a62138/llms/mlx_lm/tuner/trainer.py#L223
I'd be glad to hear your thoughts.
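If that reading is correct, then with batch_size = 2 a run of 1000 iterations sees roughly 1000 × 2 = 2,000 examples, which is about one pass over the 2,000 samples drawn from magpie-sft-v1.0 (a little less once the validation and test splits are carved off).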



5. Running the training

The moment has finally come ("iyoiyo"... and Iyo is the old name for Ehime. Yes, that's the pun. Go ahead and groan).
Run the training!

Running the training
import types

lora.run(types.SimpleNamespace(**training_args))

All this does is wrap the dictionary we built in a types.SimpleNamespace, so that its entries become attributes, and pass it to the run() function in lora.py.
I took this straight from the main() function in lora.py.
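For anyone who hasn't used it, a tiny illustration of what SimpleNamespace does (the values here are just examples):

import types

# A dict wrapped in SimpleNamespace exposes its entries as attributes,
# which is the access pattern run() expects (args.model, args.iters, ...).
ns = types.SimpleNamespace(**{"model": "llm-jp-3-3.7b", "iters": 1000})
print(ns.model, ns.iters)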


Training output
output

Loading pretrained model
Loading datasets
Training
Trainable parameters: 0.109% (4.129M/3782.913M)
Starting training..., iters: 1000
Iter 1: Val loss 2.134, Val took 3.796s
Iter 5: Train loss 2.155, Learning Rate 2.492e-06, It/sec 0.174, Tokens/sec 103.054, Trained Tokens 2958, Peak mem 8.977 GB
Iter 10: Train loss 1.985, Learning Rate 5.482e-06, It/sec 0.117, Tokens/sec 90.683, Trained Tokens 6840, Peak mem 10.557 GB
Iter 15: Train loss 2.566, Learning Rate 8.472e-06, It/sec 0.206, Tokens/sec 98.132, Trained Tokens 9223, Peak mem 10.806 GB
Iter 20: Train loss 2.073, Learning Rate 1.146e-05, It/sec 0.159, Tokens/sec 100.857, Trained Tokens 12403, Peak mem 10.806 GB
Iter 25: Train loss 2.583, Learning Rate 1.445e-05, It/sec 0.235, Tokens/sec 96.982, Trained Tokens 14470, Peak mem 10.806 GB
Iter 30: Train loss 1.880, Learning Rate 1.744e-05, It/sec 0.146, Tokens/sec 98.015, Trained Tokens 17817, Peak mem 10.806 GB
Iter 35: Train loss 1.967, Learning Rate 2.043e-05, It/sec 0.202, Tokens/sec 98.188, Trained Tokens 20245, Peak mem 10.806 GB
Iter 40: Train loss 1.807, Learning Rate 2.342e-05, It/sec 0.143, Tokens/sec 98.483, Trained Tokens 23691, Peak mem 10.806 GB
Iter 45: Train loss 1.956, Learning Rate 2.641e-05, It/sec 0.161, Tokens/sec 97.086, Trained Tokens 26714, Peak mem 13.110 GB
Iter 50: Train loss 1.891, Learning Rate 2.940e-05, It/sec 0.202, Tokens/sec 99.258, Trained Tokens 29174, Peak mem 13.110 GB
Iter 55: Train loss 1.800, Learning Rate 3.000e-05, It/sec 0.139, Tokens/sec 97.348, Trained Tokens 32671, Peak mem 13.110 GB
Iter 60: Train loss 1.659, Learning Rate 3.000e-05, It/sec 0.189, Tokens/sec 98.076, Trained Tokens 35260, Peak mem 13.110 GB
Iter 65: Train loss 1.637, Learning Rate 2.999e-05, It/sec 0.123, Tokens/sec 97.735, Trained Tokens 39223, Peak mem 13.110 GB
Iter 70: Train loss 1.655, Learning Rate 2.998e-05, It/sec 0.178, Tokens/sec 99.309, Trained Tokens 42007, Peak mem 13.110 GB
Iter 75: Train loss 1.628, Learning Rate 2.996e-05, It/sec 0.133, Tokens/sec 96.343, Trained Tokens 45625, Peak mem 13.242 GB
Iter 80: Train loss 1.720, Learning Rate 2.994e-05, It/sec 0.153, Tokens/sec 97.010, Trained Tokens 48795, Peak mem 13.242 GB
Iter 85: Train loss 1.730, Learning Rate 2.992e-05, It/sec 0.163, Tokens/sec 96.918, Trained Tokens 51768, Peak mem 13.242 GB
Iter 90: Train loss 1.591, Learning Rate 2.989e-05, It/sec 0.170, Tokens/sec 97.040, Trained Tokens 54624, Peak mem 13.242 GB
Iter 95: Train loss 1.742, Learning Rate 2.986e-05, It/sec 0.142, Tokens/sec 95.143, Trained Tokens 57983, Peak mem 13.242 GB
Iter 100: Train loss 1.597, Learning Rate 2.983e-05, It/sec 0.114, Tokens/sec 94.004, Trained Tokens 62099, Peak mem 14.577 GB
Iter 100: Saved adapter weights to adapters/adapters.safetensors and adapters/0000100_adapters.safetensors.
Iter 105: Train loss 1.621, Learning Rate 2.979e-05, It/sec 0.141, Tokens/sec 96.370, Trained Tokens 65520, Peak mem 14.577 GB
Iter 110: Train loss 1.643, Learning Rate 2.975e-05, It/sec 0.119, Tokens/sec 93.920, Trained Tokens 69466, Peak mem 14.577 GB
Iter 115: Train loss 1.670, Learning Rate 2.971e-05, It/sec 0.137, Tokens/sec 94.276, Trained Tokens 72913, Peak mem 14.577 GB
Iter 120: Train loss 1.734, Learning Rate 2.966e-05, It/sec 0.160, Tokens/sec 96.645, Trained Tokens 75940, Peak mem 14.577 GB
Iter 125: Train loss 1.613, Learning Rate 2.961e-05, It/sec 0.124, Tokens/sec 94.691, Trained Tokens 79745, Peak mem 14.729 GB
Iter 130: Train loss 1.614, Learning Rate 2.955e-05, It/sec 0.168, Tokens/sec 94.766, Trained Tokens 82560, Peak mem 14.729 GB
Iter 135: Train loss 1.898, Learning Rate 2.949e-05, It/sec 0.275, Tokens/sec 96.648, Trained Tokens 84320, Peak mem 14.729 GB
Iter 140: Train loss 1.631, Learning Rate 2.943e-05, It/sec 0.138, Tokens/sec 96.271, Trained Tokens 87797, Peak mem 14.729 GB
Iter 145: Train loss 1.609, Learning Rate 2.937e-05, It/sec 0.135, Tokens/sec 96.187, Trained Tokens 91364, Peak mem 14.729 GB
Iter 150: Train loss 1.610, Learning Rate 2.930e-05, It/sec 0.164, Tokens/sec 96.281, Trained Tokens 94304, Peak mem 14.729 GB
Iter 155: Train loss 1.601, Learning Rate 2.922e-05, It/sec 0.150, Tokens/sec 96.561, Trained Tokens 97532, Peak mem 14.729 GB
Iter 160: Train loss 1.443, Learning Rate 2.915e-05, It/sec 0.125, Tokens/sec 95.804, Trained Tokens 101368, Peak mem 14.729 GB
Iter 165: Train loss 1.593, Learning Rate 2.907e-05, It/sec 0.149, Tokens/sec 96.789, Trained Tokens 104608, Peak mem 14.729 GB
Iter 170: Train loss 1.575, Learning Rate 2.898e-05, It/sec 0.147, Tokens/sec 94.612, Trained Tokens 107831, Peak mem 14.729 GB
Iter 175: Train loss 1.734, Learning Rate 2.890e-05, It/sec 0.173, Tokens/sec 94.938, Trained Tokens 110579, Peak mem 14.729 GB
Iter 180: Train loss 1.562, Learning Rate 2.881e-05, It/sec 0.150, Tokens/sec 95.506, Trained Tokens 113754, Peak mem 14.729 GB
Iter 185: Train loss 1.561, Learning Rate 2.871e-05, It/sec 0.179, Tokens/sec 97.174, Trained Tokens 116467, Peak mem 14.729 GB
Iter 190: Train loss 1.595, Learning Rate 2.862e-05, It/sec 0.159, Tokens/sec 95.771, Trained Tokens 119483, Peak mem 14.729 GB
Iter 195: Train loss 1.602, Learning Rate 2.852e-05, It/sec 0.155, Tokens/sec 95.615, Trained Tokens 122571, Peak mem 14.729 GB
Iter 200: Val loss 1.420, Val took 3.997s
Iter 200: Train loss 1.687, Learning Rate 2.841e-05, It/sec 0.945, Tokens/sec 439.152, Trained Tokens 124895, Peak mem 14.729 GB
Iter 200: Saved adapter weights to adapters/adapters.safetensors and adapters/0000200_adapters.safetensors.
Iter 205: Train loss 1.636, Learning Rate 2.831e-05, It/sec 0.145, Tokens/sec 93.273, Trained Tokens 128112, Peak mem 14.729 GB
Iter 210: Train loss 1.679, Learning Rate 2.820e-05, It/sec 0.178, Tokens/sec 94.294, Trained Tokens 130755, Peak mem 14.729 GB
Iter 215: Train loss 1.703, Learning Rate 2.808e-05, It/sec 0.201, Tokens/sec 94.923, Trained Tokens 133111, Peak mem 14.729 GB
Iter 220: Train loss 1.593, Learning Rate 2.797e-05, It/sec 0.198, Tokens/sec 94.405, Trained Tokens 135492, Peak mem 14.729 GB
Iter 225: Train loss 1.592, Learning Rate 2.785e-05, It/sec 0.149, Tokens/sec 88.982, Trained Tokens 138483, Peak mem 14.729 GB
Iter 230: Train loss 1.485, Learning Rate 2.772e-05, It/sec 0.114, Tokens/sec 89.521, Trained Tokens 142419, Peak mem 17.518 GB
Iter 235: Train loss 1.600, Learning Rate 2.760e-05, It/sec 0.168, Tokens/sec 95.527, Trained Tokens 145265, Peak mem 17.518 GB
Iter 240: Train loss 1.664, Learning Rate 2.747e-05, It/sec 0.199, Tokens/sec 95.905, Trained Tokens 147672, Peak mem 17.518 GB
Iter 245: Train loss 1.650, Learning Rate 2.734e-05, It/sec 0.139, Tokens/sec 92.882, Trained Tokens 151005, Peak mem 17.518 GB
Iter 250: Train loss 1.705, Learning Rate 2.720e-05, It/sec 0.181, Tokens/sec 95.493, Trained Tokens 153638, Peak mem 17.518 GB
Iter 255: Train loss 1.590, Learning Rate 2.706e-05, It/sec 0.143, Tokens/sec 93.775, Trained Tokens 156926, Peak mem 17.518 GB
Iter 260: Train loss 1.561, Learning Rate 2.692e-05, It/sec 0.125, Tokens/sec 89.708, Trained Tokens 160510, Peak mem 19.409 GB
Iter 265: Train loss 1.501, Learning Rate 2.678e-05, It/sec 0.129, Tokens/sec 95.030, Trained Tokens 164205, Peak mem 19.409 GB
Iter 270: Train loss 1.525, Learning Rate 2.663e-05, It/sec 0.159, Tokens/sec 97.161, Trained Tokens 167267, Peak mem 19.409 GB
Iter 275: Train loss 1.733, Learning Rate 2.648e-05, It/sec 0.238, Tokens/sec 96.977, Trained Tokens 169307, Peak mem 19.409 GB
Iter 280: Train loss 1.503, Learning Rate 2.633e-05, It/sec 0.128, Tokens/sec 93.758, Trained Tokens 172966, Peak mem 19.409 GB
Iter 285: Train loss 1.545, Learning Rate 2.617e-05, It/sec 0.247, Tokens/sec 95.755, Trained Tokens 174901, Peak mem 19.409 GB
Iter 290: Train loss 1.562, Learning Rate 2.601e-05, It/sec 0.129, Tokens/sec 96.307, Trained Tokens 178648, Peak mem 19.409 GB
Iter 295: Train loss 1.504, Learning Rate 2.585e-05, It/sec 0.146, Tokens/sec 94.888, Trained Tokens 181900, Peak mem 19.409 GB
Iter 300: Train loss 1.556, Learning Rate 2.569e-05, It/sec 0.117, Tokens/sec 94.450, Trained Tokens 185924, Peak mem 19.409 GB
Iter 300: Saved adapter weights to adapters/adapters.safetensors and adapters/0000300_adapters.safetensors.
Iter 305: Train loss 1.679, Learning Rate 2.552e-05, It/sec 0.187, Tokens/sec 96.315, Trained Tokens 188495, Peak mem 19.409 GB
Iter 310: Train loss 1.750, Learning Rate 2.535e-05, It/sec 0.237, Tokens/sec 96.388, Trained Tokens 190525, Peak mem 19.409 GB
Iter 315: Train loss 1.632, Learning Rate 2.518e-05, It/sec 0.153, Tokens/sec 94.678, Trained Tokens 193620, Peak mem 19.409 GB
Iter 320: Train loss 1.723, Learning Rate 2.501e-05, It/sec 0.201, Tokens/sec 95.904, Trained Tokens 196005, Peak mem 19.409 GB
Iter 325: Train loss 1.562, Learning Rate 2.483e-05, It/sec 0.176, Tokens/sec 94.201, Trained Tokens 198682, Peak mem 19.409 GB
Iter 330: Train loss 1.739, Learning Rate 2.465e-05, It/sec 0.188, Tokens/sec 95.992, Trained Tokens 201235, Peak mem 19.409 GB
Iter 335: Train loss 1.521, Learning Rate 2.447e-05, It/sec 0.140, Tokens/sec 95.354, Trained Tokens 204643, Peak mem 19.409 GB
Iter 340: Train loss 1.592, Learning Rate 2.429e-05, It/sec 0.126, Tokens/sec 94.949, Trained Tokens 208407, Peak mem 19.409 GB
Iter 345: Train loss 1.643, Learning Rate 2.410e-05, It/sec 0.218, Tokens/sec 97.252, Trained Tokens 210637, Peak mem 19.409 GB
Iter 350: Train loss 1.603, Learning Rate 2.391e-05, It/sec 0.165, Tokens/sec 95.979, Trained Tokens 213549, Peak mem 19.409 GB
Iter 355: Train loss 1.541, Learning Rate 2.372e-05, It/sec 0.180, Tokens/sec 93.085, Trained Tokens 216137, Peak mem 19.409 GB
Iter 360: Train loss 1.554, Learning Rate 2.353e-05, It/sec 0.164, Tokens/sec 93.667, Trained Tokens 218991, Peak mem 19.409 GB
Iter 365: Train loss 1.615, Learning Rate 2.334e-05, It/sec 0.139, Tokens/sec 93.843, Trained Tokens 222363, Peak mem 19.409 GB
Iter 370: Train loss 1.655, Learning Rate 2.314e-05, It/sec 0.133, Tokens/sec 91.408, Trained Tokens 225792, Peak mem 19.409 GB
Iter 375: Train loss 1.738, Learning Rate 2.294e-05, It/sec 0.232, Tokens/sec 97.037, Trained Tokens 227884, Peak mem 19.409 GB
Iter 380: Train loss 1.617, Learning Rate 2.274e-05, It/sec 0.162, Tokens/sec 93.658, Trained Tokens 230775, Peak mem 19.409 GB
Iter 385: Train loss 1.650, Learning Rate 2.254e-05, It/sec 0.174, Tokens/sec 94.103, Trained Tokens 233479, Peak mem 19.409 GB
Iter 390: Train loss 1.535, Learning Rate 2.233e-05, It/sec 0.130, Tokens/sec 94.952, Trained Tokens 237132, Peak mem 19.409 GB
Iter 395: Train loss 1.589, Learning Rate 2.213e-05, It/sec 0.153, Tokens/sec 95.416, Trained Tokens 240248, Peak mem 19.409 GB
Iter 400: Val loss 1.449, Val took 3.346s
Iter 400: Train loss 1.713, Learning Rate 2.192e-05, It/sec 0.756, Tokens/sec 436.836, Trained Tokens 243139, Peak mem 19.409 GB
Iter 400: Saved adapter weights to adapters/adapters.safetensors and adapters/0000400_adapters.safetensors.
Iter 405: Train loss 1.553, Learning Rate 2.171e-05, It/sec 0.140, Tokens/sec 94.944, Trained Tokens 246519, Peak mem 19.409 GB
Iter 410: Train loss 1.574, Learning Rate 2.150e-05, It/sec 0.161, Tokens/sec 94.041, Trained Tokens 249434, Peak mem 19.409 GB
Iter 415: Train loss 1.720, Learning Rate 2.129e-05, It/sec 0.179, Tokens/sec 95.309, Trained Tokens 252092, Peak mem 19.409 GB
Iter 420: Train loss 1.566, Learning Rate 2.107e-05, It/sec 0.144, Tokens/sec 94.054, Trained Tokens 255359, Peak mem 19.409 GB
Iter 425: Train loss 1.602, Learning Rate 2.086e-05, It/sec 0.177, Tokens/sec 95.213, Trained Tokens 258055, Peak mem 19.409 GB
Iter 430: Train loss 1.427, Learning Rate 2.064e-05, It/sec 0.130, Tokens/sec 94.972, Trained Tokens 261712, Peak mem 19.409 GB
Iter 435: Train loss 1.637, Learning Rate 2.042e-05, It/sec 0.160, Tokens/sec 95.002, Trained Tokens 264672, Peak mem 19.409 GB
Iter 440: Train loss 1.650, Learning Rate 2.020e-05, It/sec 0.176, Tokens/sec 95.570, Trained Tokens 267394, Peak mem 19.409 GB
Iter 445: Train loss 1.504, Learning Rate 1.998e-05, It/sec 0.157, Tokens/sec 94.528, Trained Tokens 270401, Peak mem 19.409 GB
Iter 450: Train loss 1.565, Learning Rate 1.976e-05, It/sec 0.166, Tokens/sec 96.060, Trained Tokens 273300, Peak mem 19.409 GB
Iter 455: Train loss 1.728, Learning Rate 1.954e-05, It/sec 0.218, Tokens/sec 96.532, Trained Tokens 275512, Peak mem 19.409 GB
Iter 460: Train loss 1.639, Learning Rate 1.931e-05, It/sec 0.179, Tokens/sec 93.604, Trained Tokens 278132, Peak mem 19.409 GB
Iter 465: Train loss 1.597, Learning Rate 1.909e-05, It/sec 0.163, Tokens/sec 95.533, Trained Tokens 281056, Peak mem 19.409 GB
Iter 470: Train loss 1.522, Learning Rate 1.886e-05, It/sec 0.153, Tokens/sec 98.242, Trained Tokens 284271, Peak mem 19.409 GB
Iter 475: Train loss 1.497, Learning Rate 1.863e-05, It/sec 0.173, Tokens/sec 98.682, Trained Tokens 287116, Peak mem 19.409 GB
Iter 480: Train loss 1.492, Learning Rate 1.840e-05, It/sec 0.171, Tokens/sec 94.965, Trained Tokens 289887, Peak mem 19.409 GB
Iter 485: Train loss 1.516, Learning Rate 1.817e-05, It/sec 0.160, Tokens/sec 94.132, Trained Tokens 292837, Peak mem 19.409 GB
Iter 490: Train loss 1.615, Learning Rate 1.794e-05, It/sec 0.199, Tokens/sec 93.854, Trained Tokens 295192, Peak mem 19.409 GB
Iter 495: Train loss 1.520, Learning Rate 1.771e-05, It/sec 0.127, Tokens/sec 93.321, Trained Tokens 298877, Peak mem 19.409 GB
Iter 500: Train loss 1.624, Learning Rate 1.748e-05, It/sec 0.158, Tokens/sec 90.712, Trained Tokens 301754, Peak mem 19.409 GB
Iter 500: Saved adapter weights to adapters/adapters.safetensors and adapters/0000500_adapters.safetensors.
Iter 505: Train loss 1.554, Learning Rate 1.725e-05, It/sec 0.181, Tokens/sec 95.226, Trained Tokens 304379, Peak mem 19.409 GB
Iter 510: Train loss 1.873, Learning Rate 1.702e-05, It/sec 0.282, Tokens/sec 92.496, Trained Tokens 306019, Peak mem 19.409 GB
Iter 515: Train loss 1.670, Learning Rate 1.678e-05, It/sec 0.178, Tokens/sec 94.885, Trained Tokens 308686, Peak mem 19.409 GB
Iter 520: Train loss 1.750, Learning Rate 1.655e-05, It/sec 0.177, Tokens/sec 94.764, Trained Tokens 311358, Peak mem 19.409 GB
Iter 525: Train loss 1.571, Learning Rate 1.632e-05, It/sec 0.183, Tokens/sec 95.119, Trained Tokens 313958, Peak mem 19.409 GB
Iter 530: Train loss 1.442, Learning Rate 1.608e-05, It/sec 0.127, Tokens/sec 93.275, Trained Tokens 317621, Peak mem 19.409 GB
Iter 535: Train loss 1.624, Learning Rate 1.585e-05, It/sec 0.164, Tokens/sec 94.661, Trained Tokens 320504, Peak mem 19.409 GB
Iter 540: Train loss 1.778, Learning Rate 1.561e-05, It/sec 0.194, Tokens/sec 94.572, Trained Tokens 322937, Peak mem 19.409 GB
Iter 545: Train loss 1.542, Learning Rate 1.538e-05, It/sec 0.138, Tokens/sec 93.897, Trained Tokens 326348, Peak mem 19.409 GB
Iter 550: Train loss 1.513, Learning Rate 1.514e-05, It/sec 0.156, Tokens/sec 95.806, Trained Tokens 329422, Peak mem 19.409 GB
Iter 555: Train loss 1.542, Learning Rate 1.491e-05, It/sec 0.151, Tokens/sec 94.013, Trained Tokens 332534, Peak mem 19.409 GB
Iter 560: Train loss 1.576, Learning Rate 1.467e-05, It/sec 0.192, Tokens/sec 95.091, Trained Tokens 335015, Peak mem 19.409 GB
Iter 565: Train loss 1.649, Learning Rate 1.444e-05, It/sec 0.196, Tokens/sec 93.023, Trained Tokens 337383, Peak mem 19.409 GB
Iter 570: Train loss 1.542, Learning Rate 1.421e-05, It/sec 0.137, Tokens/sec 93.244, Trained Tokens 340785, Peak mem 19.409 GB
Iter 575: Train loss 1.580, Learning Rate 1.397e-05, It/sec 0.166, Tokens/sec 94.918, Trained Tokens 343651, Peak mem 19.409 GB
Iter 580: Train loss 1.617, Learning Rate 1.374e-05, It/sec 0.159, Tokens/sec 95.595, Trained Tokens 346648, Peak mem 19.409 GB
Iter 585: Train loss 1.630, Learning Rate 1.350e-05, It/sec 0.158, Tokens/sec 95.399, Trained Tokens 349675, Peak mem 19.409 GB
Iter 590: Train loss 1.466, Learning Rate 1.327e-05, It/sec 0.124, Tokens/sec 93.409, Trained Tokens 353452, Peak mem 19.409 GB
Iter 595: Train loss 1.601, Learning Rate 1.304e-05, It/sec 0.165, Tokens/sec 95.796, Trained Tokens 356349, Peak mem 19.409 GB
Iter 600: Val loss 1.300, Val took 5.694s
Iter 600: Train loss 1.501, Learning Rate 1.280e-05, It/sec 0.794, Tokens/sec 547.417, Trained Tokens 359795, Peak mem 19.409 GB
Iter 600: Saved adapter weights to adapters/adapters.safetensors and adapters/0000600_adapters.safetensors.
Iter 605: Train loss 1.546, Learning Rate 1.257e-05, It/sec 0.145, Tokens/sec 93.975, Trained Tokens 363031, Peak mem 19.409 GB
Iter 610: Train loss 1.560, Learning Rate 1.234e-05, It/sec 0.141, Tokens/sec 93.563, Trained Tokens 366356, Peak mem 19.409 GB
Iter 615: Train loss 1.640, Learning Rate 1.211e-05, It/sec 0.184, Tokens/sec 93.823, Trained Tokens 368902, Peak mem 19.409 GB
Iter 620: Train loss 1.571, Learning Rate 1.188e-05, It/sec 0.140, Tokens/sec 94.883, Trained Tokens 372296, Peak mem 19.409 GB
Iter 625: Train loss 1.496, Learning Rate 1.165e-05, It/sec 0.128, Tokens/sec 93.654, Trained Tokens 375947, Peak mem 19.409 GB
Iter 630: Train loss 1.475, Learning Rate 1.142e-05, It/sec 0.179, Tokens/sec 95.299, Trained Tokens 378604, Peak mem 19.409 GB
Iter 635: Train loss 1.490, Learning Rate 1.120e-05, It/sec 0.148, Tokens/sec 93.163, Trained Tokens 381757, Peak mem 19.409 GB
Iter 640: Train loss 1.498, Learning Rate 1.097e-05, It/sec 0.127, Tokens/sec 93.899, Trained Tokens 385467, Peak mem 19.409 GB
Iter 645: Train loss 1.708, Learning Rate 1.074e-05, It/sec 0.154, Tokens/sec 95.691, Trained Tokens 388568, Peak mem 19.409 GB
Iter 650: Train loss 1.498, Learning Rate 1.052e-05, It/sec 0.189, Tokens/sec 107.797, Trained Tokens 391414, Peak mem 19.409 GB
Iter 655: Train loss 1.527, Learning Rate 1.030e-05, It/sec 0.178, Tokens/sec 106.256, Trained Tokens 394395, Peak mem 19.409 GB
Iter 660: Train loss 1.402, Learning Rate 1.007e-05, It/sec 0.169, Tokens/sec 104.640, Trained Tokens 397494, Peak mem 19.409 GB
Iter 665: Train loss 1.652, Learning Rate 9.854e-06, It/sec 0.192, Tokens/sec 104.114, Trained Tokens 400211, Peak mem 19.409 GB
Iter 670: Train loss 1.616, Learning Rate 9.634e-06, It/sec 0.138, Tokens/sec 103.355, Trained Tokens 403956, Peak mem 19.409 GB
Iter 675: Train loss 1.578, Learning Rate 9.416e-06, It/sec 0.138, Tokens/sec 103.449, Trained Tokens 407691, Peak mem 19.409 GB
Iter 680: Train loss 1.685, Learning Rate 9.199e-06, It/sec 0.199, Tokens/sec 104.949, Trained Tokens 410333, Peak mem 19.409 GB
Iter 685: Train loss 1.616, Learning Rate 8.984e-06, It/sec 0.188, Tokens/sec 104.678, Trained Tokens 413117, Peak mem 19.409 GB
Iter 690: Train loss 1.678, Learning Rate 8.770e-06, It/sec 0.176, Tokens/sec 102.929, Trained Tokens 416040, Peak mem 19.409 GB
Iter 695: Train loss 1.567, Learning Rate 8.557e-06, It/sec 0.196, Tokens/sec 103.434, Trained Tokens 418685, Peak mem 19.409 GB
Iter 700: Train loss 1.521, Learning Rate 8.347e-06, It/sec 0.152, Tokens/sec 103.262, Trained Tokens 422092, Peak mem 19.409 GB
Iter 700: Saved adapter weights to adapters/adapters.safetensors and adapters/0000700_adapters.safetensors.
Iter 705: Train loss 1.473, Learning Rate 8.138e-06, It/sec 0.163, Tokens/sec 102.984, Trained Tokens 425245, Peak mem 19.409 GB
Iter 710: Train loss 1.617, Learning Rate 7.930e-06, It/sec 0.175, Tokens/sec 105.097, Trained Tokens 428250, Peak mem 19.409 GB
Iter 715: Train loss 1.491, Learning Rate 7.725e-06, It/sec 0.228, Tokens/sec 103.919, Trained Tokens 430530, Peak mem 19.409 GB
Iter 720: Train loss 1.595, Learning Rate 7.521e-06, It/sec 0.189, Tokens/sec 105.565, Trained Tokens 433324, Peak mem 19.409 GB
Iter 725: Train loss 1.565, Learning Rate 7.319e-06, It/sec 0.192, Tokens/sec 104.310, Trained Tokens 436040, Peak mem 19.409 GB
Iter 730: Train loss 1.574, Learning Rate 7.119e-06, It/sec 0.149, Tokens/sec 104.619, Trained Tokens 439541, Peak mem 19.409 GB
Iter 735: Train loss 1.521, Learning Rate 6.921e-06, It/sec 0.153, Tokens/sec 106.293, Trained Tokens 443020, Peak mem 19.409 GB
Iter 740: Train loss 1.538, Learning Rate 6.725e-06, It/sec 0.192, Tokens/sec 106.159, Trained Tokens 445779, Peak mem 19.409 GB
Iter 745: Train loss 1.436, Learning Rate 6.531e-06, It/sec 0.120, Tokens/sec 103.839, Trained Tokens 450121, Peak mem 19.409 GB
Iter 750: Train loss 1.545, Learning Rate 6.339e-06, It/sec 0.153, Tokens/sec 105.282, Trained Tokens 453563, Peak mem 19.409 GB
Iter 755: Train loss 1.572, Learning Rate 6.149e-06, It/sec 0.246, Tokens/sec 104.547, Trained Tokens 455689, Peak mem 19.409 GB
Iter 760: Train loss 1.577, Learning Rate 5.961e-06, It/sec 0.251, Tokens/sec 107.390, Trained Tokens 457825, Peak mem 19.409 GB
Iter 765: Train loss 1.485, Learning Rate 5.776e-06, It/sec 0.169, Tokens/sec 105.222, Trained Tokens 460935, Peak mem 19.409 GB
Iter 770: Train loss 1.637, Learning Rate 5.593e-06, It/sec 0.164, Tokens/sec 104.885, Trained Tokens 464126, Peak mem 19.409 GB
Iter 775: Train loss 1.515, Learning Rate 5.412e-06, It/sec 0.159, Tokens/sec 104.428, Trained Tokens 467414, Peak mem 19.409 GB
Iter 780: Train loss 1.644, Learning Rate 5.234e-06, It/sec 0.184, Tokens/sec 104.815, Trained Tokens 470266, Peak mem 19.409 GB
Iter 785: Train loss 1.723, Learning Rate 5.058e-06, It/sec 0.275, Tokens/sec 105.261, Trained Tokens 472183, Peak mem 19.409 GB
Iter 790: Train loss 1.607, Learning Rate 4.885e-06, It/sec 0.178, Tokens/sec 105.394, Trained Tokens 475150, Peak mem 19.409 GB
Iter 795: Train loss 1.381, Learning Rate 4.714e-06, It/sec 0.157, Tokens/sec 104.975, Trained Tokens 478494, Peak mem 19.409 GB
Iter 800: Val loss 1.429, Val took 6.000s
Iter 800: Train loss 1.482, Learning Rate 4.545e-06, It/sec 4.600, Tokens/sec 2204.487, Trained Tokens 480890, Peak mem 19.409 GB
Iter 800: Saved adapter weights to adapters/adapters.safetensors and adapters/0000800_adapters.safetensors.
Iter 805: Train loss 1.627, Learning Rate 4.380e-06, It/sec 0.230, Tokens/sec 106.581, Trained Tokens 483211, Peak mem 19.409 GB
Iter 810: Train loss 1.568, Learning Rate 4.216e-06, It/sec 0.175, Tokens/sec 106.507, Trained Tokens 486259, Peak mem 19.409 GB
Iter 815: Train loss 1.567, Learning Rate 4.056e-06, It/sec 0.170, Tokens/sec 104.837, Trained Tokens 489342, Peak mem 19.409 GB
Iter 820: Train loss 1.518, Learning Rate 3.898e-06, It/sec 0.129, Tokens/sec 103.856, Trained Tokens 493379, Peak mem 19.409 GB
Iter 825: Train loss 1.550, Learning Rate 3.743e-06, It/sec 0.246, Tokens/sec 106.081, Trained Tokens 495531, Peak mem 19.409 GB
Iter 830: Train loss 1.799, Learning Rate 3.591e-06, It/sec 0.339, Tokens/sec 103.669, Trained Tokens 497061, Peak mem 19.409 GB
Iter 835: Train loss 1.580, Learning Rate 3.442e-06, It/sec 0.183, Tokens/sec 104.315, Trained Tokens 499913, Peak mem 19.409 GB
Iter 840: Train loss 1.488, Learning Rate 3.295e-06, It/sec 0.185, Tokens/sec 105.085, Trained Tokens 502749, Peak mem 19.409 GB
Iter 845: Train loss 1.562, Learning Rate 3.151e-06, It/sec 0.204, Tokens/sec 106.284, Trained Tokens 505358, Peak mem 19.409 GB
Iter 850: Train loss 1.606, Learning Rate 3.011e-06, It/sec 0.199, Tokens/sec 105.009, Trained Tokens 507991, Peak mem 19.409 GB
Iter 855: Train loss 1.494, Learning Rate 2.873e-06, It/sec 0.198, Tokens/sec 106.569, Trained Tokens 510681, Peak mem 19.409 GB
Iter 860: Train loss 1.475, Learning Rate 2.738e-06, It/sec 0.170, Tokens/sec 105.755, Trained Tokens 513785, Peak mem 19.409 GB
Iter 865: Train loss 1.527, Learning Rate 2.606e-06, It/sec 0.203, Tokens/sec 104.990, Trained Tokens 516377, Peak mem 19.409 GB
Iter 870: Train loss 1.754, Learning Rate 2.478e-06, It/sec 0.148, Tokens/sec 103.745, Trained Tokens 519884, Peak mem 19.409 GB
Iter 875: Train loss 1.596, Learning Rate 2.352e-06, It/sec 0.154, Tokens/sec 105.054, Trained Tokens 523304, Peak mem 19.409 GB
Iter 880: Train loss 1.712, Learning Rate 2.230e-06, It/sec 0.191, Tokens/sec 105.442, Trained Tokens 526064, Peak mem 19.409 GB
Iter 885: Train loss 1.601, Learning Rate 2.111e-06, It/sec 0.167, Tokens/sec 103.711, Trained Tokens 529164, Peak mem 19.409 GB
Iter 890: Train loss 1.561, Learning Rate 1.995e-06, It/sec 0.205, Tokens/sec 105.706, Trained Tokens 531738, Peak mem 19.409 GB
Iter 895: Train loss 1.494, Learning Rate 1.882e-06, It/sec 0.217, Tokens/sec 105.305, Trained Tokens 534160, Peak mem 19.409 GB
Iter 900: Train loss 1.637, Learning Rate 1.772e-06, It/sec 0.195, Tokens/sec 104.027, Trained Tokens 536829, Peak mem 19.409 GB
Iter 900: Saved adapter weights to adapters/adapters.safetensors and adapters/0000900_adapters.safetensors.
Iter 905: Train loss 1.452, Learning Rate 1.666e-06, It/sec 0.153, Tokens/sec 103.404, Trained Tokens 540206, Peak mem 19.409 GB
Iter 910: Train loss 1.575, Learning Rate 1.563e-06, It/sec 0.220, Tokens/sec 106.515, Trained Tokens 542623, Peak mem 19.409 GB
Iter 915: Train loss 1.551, Learning Rate 1.463e-06, It/sec 0.173, Tokens/sec 105.034, Trained Tokens 545662, Peak mem 19.409 GB
Iter 920: Train loss 1.591, Learning Rate 1.367e-06, It/sec 0.189, Tokens/sec 104.461, Trained Tokens 548421, Peak mem 19.409 GB
Iter 925: Train loss 1.654, Learning Rate 1.274e-06, It/sec 0.169, Tokens/sec 105.159, Trained Tokens 551528, Peak mem 19.409 GB
Iter 930: Train loss 1.405, Learning Rate 1.185e-06, It/sec 0.171, Tokens/sec 105.275, Trained Tokens 554606, Peak mem 19.409 GB
Iter 935: Train loss 1.526, Learning Rate 1.099e-06, It/sec 0.167, Tokens/sec 105.054, Trained Tokens 557748, Peak mem 19.409 GB
Iter 940: Train loss 1.417, Learning Rate 1.016e-06, It/sec 0.145, Tokens/sec 95.212, Trained Tokens 561036, Peak mem 19.409 GB
Iter 945: Train loss 1.466, Learning Rate 9.367e-07, It/sec 0.157, Tokens/sec 96.806, Trained Tokens 564121, Peak mem 19.409 GB
Iter 950: Train loss 1.539, Learning Rate 8.610e-07, It/sec 0.139, Tokens/sec 95.188, Trained Tokens 567547, Peak mem 19.409 GB
Iter 955: Train loss 1.578, Learning Rate 7.888e-07, It/sec 0.154, Tokens/sec 97.753, Trained Tokens 570724, Peak mem 19.409 GB
Iter 960: Train loss 1.477, Learning Rate 7.201e-07, It/sec 0.156, Tokens/sec 96.405, Trained Tokens 573814, Peak mem 19.409 GB
Iter 965: Train loss 1.577, Learning Rate 6.549e-07, It/sec 0.161, Tokens/sec 96.825, Trained Tokens 576818, Peak mem 19.409 GB
Iter 970: Train loss 1.566, Learning Rate 5.933e-07, It/sec 0.167, Tokens/sec 97.113, Trained Tokens 579720, Peak mem 19.409 GB
Iter 975: Train loss 1.464, Learning Rate 5.353e-07, It/sec 0.175, Tokens/sec 96.903, Trained Tokens 582484, Peak mem 19.409 GB
Iter 980: Train loss 1.665, Learning Rate 4.808e-07, It/sec 0.152, Tokens/sec 97.262, Trained Tokens 585674, Peak mem 19.409 GB
Iter 985: Train loss 1.552, Learning Rate 4.300e-07, It/sec 0.207, Tokens/sec 97.032, Trained Tokens 588019, Peak mem 19.409 GB
Iter 990: Train loss 1.461, Learning Rate 3.827e-07, It/sec 0.161, Tokens/sec 97.454, Trained Tokens 591039, Peak mem 19.409 GB
Iter 995: Train loss 1.531, Learning Rate 3.391e-07, It/sec 0.183, Tokens/sec 98.384, Trained Tokens 593723, Peak mem 19.409 GB
Iter 1000: Val loss 1.440, Val took 5.109s
Iter 1000: Train loss 1.504, Learning Rate 2.990e-07, It/sec 0.823, Tokens/sec 447.697, Trained Tokens 596444, Peak mem 19.409 GB
Iter 1000: Saved adapter weights to adapters/adapters.safetensors and adapters/0001000_adapters.safetensors.
Saved final weights to adapters/adapters.safetensors.

Done and done!
But it took a fair amount of staring at GitHub to get here. Mm-hm.
I'd even forgotten how to write argparse and had to review it. Turns out I'm quite the hard worker (praise me, praise me).

If you want to train properly, take the time to set the parameters and prepare the data carefully.



6. Training results

Right, let's look at the results.

The base model is llm-jp/llm-jp-3-3.7b. It has only been pre-trained, so it doesn't return a sensible answer when you phrase the input as a question; it is fundamentally a model that takes some text and outputs its continuation.

6-1. Output check before training

First, before training (the plain llm-jp/llm-jp-3-3.7b, quantized to 4 bits).
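For reference, the base model and tokenizer can be loaded like this (a minimal sketch; it assumes the 4-bit quantized model is in the local llm-jp-3-3.7b folder and deliberately passes no adapter_path, so no LoRA weights are applied):

from mlx_lm import generate
from mlx_lm.utils import load

# Base model only: no adapter_path, so no LoRA adapter is applied.
model, tokenizer = load("llm-jp-3-3.7b/")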

response = generate(
    model, 
    tokenizer, 
    prompt='# 指示\n大規模言語モデルって何?\n# 回答\n', 
    verbose=True
    )

# ==========
# 
# # 大規模言語モデルって何?
# 
# 大規模言語モデルとは、大量のデータを学習させることで言語の理解を可能にするモデルです。
# 
# # 大規模言語モデルって何?
# 
# 大規模言語モデルとは、大量のデータを学習させることで言語の理解を可能にするモデルです。
# 
# # 大規模言語モデルって何?
# 
#         < 中略 >
# 
# 大規模言語モデルとは、大量の
# ==========
# Prompt: 14 tokens, 113.487 tokens-per-sec
# Generation: 256 tokens, 40.249 tokens-per-sec
# Peak memory: 2.330 GB

Normally, if you give this model something like 「大規模言語モデルとは、」 ("A large language model is..."), it simply outputs a continuation of that text. Here, to make the effect of training easier to see, I deliberately formatted the prompt the way you would for an instruction-tuned model.

In short, I fed it the same input as after training, purely for comparison.

That is why the output falls apart and the same phrase gets repeated over and over.


Let's try one more.

response = generate(
    model, 
    tokenizer, 
    prompt='# 指示\n石破茂氏って何をしている人ですか\n# 回答\n', 
    verbose=True
    )

# ==========
# 
# 石破茂氏って何をしている人ですか?
# # 回答
# 
# # 回答
# 
# # 回答
# 
# 
#         < 中略 >
# 
# 
# # 回答
# 
# # 回答
# 
#
# ==========
# Prompt: 17 tokens, 137.744 tokens-per-sec
# Generation: 256 tokens, 39.532 tokens-per-sec
# Peak memory: 2.330 GB

This one gives a similar result.


6-2. Output check after training

Now let's try the trained model.
First, load the model.

How to load the base model together with the LoRA adapter
from mlx_lm import generate
from mlx_lm.utils import load

mlx_path = "llm-jp-3-3.7b/" # path to the folder with the 4-bit quantized model

model, tokenizer = load(
    mlx_path,
    adapter_path="./adapters", # path to the folder with the saved LoRA adapter
)

Once the model has loaded, straight on to inference.


Inference 1
response = generate(
    model, 
    tokenizer, 
    prompt='# 指示\n大規模言語モデルって何?\n# 回答\n', 
    verbose=True
    )

# ==========
# 大規模言語モデル(LLM)は、大量のテキストデータを学習して生成するモデルです。LLMは自然言語処理(NLP)の分野で広く使用されており、テキストの生成や翻訳、対話システムの構築などに利用されています。
# 
# LLMは主に以下の3つの要素から構成されています。
# 
# 1. **学習データ**: 大量のテキストデータ(コーパス)を使用してモデルを学習します。
# 
# 2. **モデル**: 学習されたモデルは大量のデータから特徴を抽出し、それらを組み合わせて新しいテキストを生成します。
# 
# 3. **生成器**: モデルから生成されたテキストを出力します。
# 
# LLMは非常に強力で、人間の言語理解能力に近いレベルの生成能力を持ちます。ただし、その生成結果は必ずしも正確ではなく、文脈や文脈に依存した表現を生成する能力は限定的です。
# 
# LLMは現在、自然言語処理の分野で非常に重要な役割を果たしており、今後も様々な分野で利用が拡大すると予想されています。
# ==========
# Prompt: 14 tokens, 78.951 tokens-per-sec
# Generation: 209 tokens, 35.392 tokens-per-sec
# Peak memory: 2.241 GB

See? The output has changed into a form that actually answers the question.
Splendid. The base model must be a good one!

The training data has the following structure.

Training data structure
# 指示
< the question goes here (e.g. 名古屋の大須について教えてください, "Tell me about Osu in Nagoya") >
# 回答
< the answer goes here (e.g. 名古屋の大須ですね!大須は名古屋の中心部にあり..., "Osu in Nagoya! Osu is in central Nagoya and...") >

Because the model learns to produce output in this structure, it can answer questions.
Reasoning models insert a reasoning section before the answer. (I don't know how else it's done.)
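For completeness, below is a rough sketch of how the train/valid/test jsonl files in the ./magpie folder could be built in this format. It is not the exact script used for this article, and the magpie-sft-v1.0 column names (a "conversations" list with "role"/"content" entries) are assumptions, so check the dataset card before relying on it.

# Rough sketch (not the exact script used in this article) of building
# ./magpie/{train,valid,test}.jsonl in mlx_lm's {"text": ...} format.
# The magpie-sft-v1.0 field names below are assumptions; verify them
# against the dataset card.
import json
import os
from datasets import load_dataset

os.makedirs("./magpie", exist_ok=True)

ds = load_dataset("llm-jp/magpie-sft-v1.0", split="train")
rows = ds.shuffle(seed=0).select(range(2000))  # 2,000 random samples

def to_text(example):
    conv = example["conversations"]  # assumed field name
    user = next(m["content"] for m in conv if m["role"] == "user")
    assistant = next(m["content"] for m in conv if m["role"] == "assistant")
    return {"text": f"# 指示\n{user}\n# 回答\n{assistant}"}

records = [to_text(r) for r in rows]
splits = {"train": records[:1800], "valid": records[1800:1900], "test": records[1900:]}
for name, recs in splits.items():
    with open(f"./magpie/{name}.jsonl", "w", encoding="utf-8") as f:
        for r in recs:
            f.write(json.dumps(r, ensure_ascii=False) + "\n")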


Inference 2
response = generate(
    model, 
    tokenizer, 
    prompt='# 指示\n石破茂氏って何をしている人ですか\n# 回答\n', 
    verbose=True
    )

# ==========
# 石破茂氏は日本の政治家で、現在は自由民主党の衆議院議員として活動しています。彼は主に、安全保障政策や経済政策、地方創生などの分野で活躍しています。また、彼は過去に防衛大臣や農林水産大臣を務めた経験があります。
# ==========
# Prompt: 17 tokens, 168.393 tokens-per-sec
# Generation: 53 tokens, 38.212 tokens-per-sec
# Peak memory: 2.242 GB

Yes, that looks pretty good.
I used 2,000 training samples.
Oh, and the information is out of date because the training data is old.



7. Closing

With this, I can now train an LLM using mlx_lm.
Starting straight away with a 13B model caused terrible memory swapping, so this time I switched to a 3.7B model. Its absolute performance is probably only so-so, but llm-jp-3-3.7b is a genuinely excellent base model, so the outputs came out clean.

This was nowhere near a full training run, but the goal this time was to pin down the training procedure, so I'll call it good.

Even so, memory fills right up during training.
I'd love to have a gradient accumulation feature like Unsloth's.

There are still things I want to do with mlx_lm, such as resuming training from an intermediate checkpoint and merging models.
Once I've worked out how, I'll post about it on Qiita.

See you next time.

