Running Qwen2-VL Fine-tuning in a Slurm GPU Environment


Overview

This article summarizes the steps for running Qwen2-VL fine-tuning in the following environment.

The key points of this article are as follows.

  • Runs in a Slurm job scheduler environment (built with OCI HPC Cluster)
  • Uses containers via enroot
  • Uses 2U1/Qwen2-VL-Finetune as the training script
  • Uses HuggingFaceM4/ChartQA as the dataset
  • Runs the above on multiple GPUs (A10 x 2)

Steps

Clone 2U1/Qwen2-VL-Finetune

git clone https://github.com/2U1/Qwen2-VL-Finetune.git
cd Qwen2-VL-Finetune

Create a Python virtual environment for dataset preparation

python3.11 -m venv myenv
source myenv/bin/activate
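
The conversion script below imports the Hugging Face datasets library, which a fresh venv does not include, so installing it (along with Pillow for saving the images) along the following lines is likely needed first; exact versions are left to the reader:

pip install --upgrade pip
pip install datasets pillow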

Create the dataset conversion script

vi convert_chartqa_to_qwen-vl.py

Contents of convert_chartqa_to_qwen-vl.py

#!/usr/bin/env python3
# convert_chartqa_to_qwen-vl.py

import os
import json
from datasets import load_dataset

def convert_split(split_name: str, output_dir: str):
    """
    Convert one split of ChartQA into Qwen-VL-finetune format.
    Writes output to output_dir/chartqa_{split_name}.json.
    """
    ds = load_dataset("HuggingFaceM4/ChartQA", split=split_name)
    os.makedirs(output_dir, exist_ok=True)
    out_path = os.path.join(output_dir, f"chartqa_{split_name}.json")

    converted = []
    for idx, example in enumerate(ds):
        # Get local path to the cached image file
        img_feat = example["image"]
        img_path = getattr(img_feat, "path", None)
        # Fallback: save PIL.Image to file if no path attribute
        if img_path is None:
            img_path = os.path.join(output_dir, f"{split_name}_{idx}.png")
            example["image"].save(img_path)

        question = example["query"]
        answer   = example["label"][0]

        converted.append({
            "id": f"{split_name}_{idx}",
            "conversations": [
                {
                    "from": "user",
                    "value": f"Picture 1: <img>{img_path}</img>\n{question}"
                },
                {
                    "from": "assistant",
                    "value": answer
                }
            ]
        })

    # Write out JSON list
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(converted, f, ensure_ascii=False, indent=2)
    print(f"Wrote {len(converted)} examples to {out_path}")

if __name__ == "__main__":
    # You can adjust splits or add your own here
    for split in ("train", "val", "test"):
        convert_split(split, output_dir="datasets")

Run the dataset conversion script

python convert_chartqa_to_qwen-vl.py

(Script output)

(myenv) [opc@demo-controller Qwen2-VL-Finetune]$ python convert_chartqa_to_qwen-vl.py
Wrote 28299 examples to datasets/chartqa_train.json
Wrote 1920 examples to datasets/chartqa_val.json
Wrote 2500 examples to datasets/chartqa_test.json
(myenv) [opc@demo-controller Qwen2-VL-Finetune]$
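
To sanity-check the converted file, the first record can be printed; this is an optional check and the exact content will differ per example:

python -c "import json; print(json.dumps(json.load(open('datasets/chartqa_train.json'))[0], indent=2, ensure_ascii=False))"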

Modify the wrapper script parameters to match your environment

vi scripts/finetune_lora.sh

Make the following changes.

  • Change MODEL_NAME to the model you want to use
  • Change NUM_DEVICES to match your number of GPUs
  • Change --data_path to the file you prepared
  • Change --image_folder so that the prepared directory can be referenced

#!/bin/bash

# You can use 2B instead of 7B
# MODEL_NAME="Qwen/Qwen2-VL-7B-Instruct"
# MODEL_NAME="Qwen/Qwen2-VL-2B-Instruct"
# MODEL_NAME="Qwen/Qwen2.5-VL-3B-Instruct"
MODEL_NAME="Qwen/Qwen2.5-VL-7B-Instruct"

export PYTHONPATH=src:$PYTHONPATH

GLOBAL_BATCH_SIZE=128
BATCH_PER_DEVICE=4
NUM_DEVICES=2
GRAD_ACCUM_STEPS=$((GLOBAL_BATCH_SIZE / (BATCH_PER_DEVICE * NUM_DEVICES)))
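# With the values above: 128 / (4 * 2) = 16 gradient accumulation steps per optimizer update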

# If you want to tune the `embed_token` with LoRA, you need to tune `lm_head` together
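# image_min_pixels / image_max_pixels below are multiples of 28*28 (Qwen2-VL processes images in 28x28-pixel patch units): 256*28*28 = 200704, 1280*28*28 = 1003520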

deepspeed src/train/train_sft.py \
    --use_liger True \
    --lora_enable True \
    --use_dora False \
    --lora_namespan_exclude "['lm_head', 'embed_tokens']" \
    --lora_rank 64 \
    --lora_alpha 64 \
    --lora_dropout 0.05 \
    --num_lora_modules -1 \
    --deepspeed scripts/zero3_offload.json \
    --model_id $MODEL_NAME \
    --data_path ./datasets/chartqa_train.json \
    --image_folder . \
    --remove_unused_columns False \
    --freeze_vision_tower False \
    --freeze_llm True \
    --freeze_merger False \
    --bf16 True \
    --fp16 False \
    --disable_flash_attn2 False \
    --output_dir output/testing_lora \
    --num_train_epochs 1 \
    --per_device_train_batch_size $BATCH_PER_DEVICE \
    --gradient_accumulation_steps $GRAD_ACCUM_STEPS \
    --image_min_pixels $((256 * 28 * 28)) \
    --image_max_pixels $((1280 * 28 * 28)) \
    --learning_rate 1e-4 \
    --merger_lr 1e-5 \
    --vision_lr 2e-6 \
    --weight_decay 0.1 \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --gradient_checkpointing True \
    --report_to tensorboard \
    --lazy_preprocess True \
    --save_strategy "steps" \
    --save_steps 200 \
    --save_total_limit 10 \
    --dataloader_num_workers 4

Create the Slurm job script

vi run.sh

(Contents of run.sh)

  • Adjust the gres GPU settings for your environment

#!/bin/bash
#SBATCH --job-name=fine-tuning
#SBATCH --output=%x.%j.out
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --gres=gpu:A10:2

srun \
  --container-image=pytorch/pytorch:2.7.0-cuda12.8-cudnn9-devel \
  --container-mounts="./:/workspace" \
  --gres=gpu:A10:2 \
  --cpus-per-task=8 \
  bash -lc "
    cd /workspace

    echo '### Install pip libraries'
    pip install -r requirements.txt
    pip install peft==0.10.0 transformers==4.51.3 accelerate==0.28.0 datasets auto-gptq optimum
    pip install deepspeed qwen_vl_utils trl ujson liger_kernel tensorboardX

    pip install https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.0.9/flash_attn-2.6.3+cu128torch2.7-cp311-cp311-linux_x86_64.whl
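    # the prebuilt wheel above is chosen to match the container: Python 3.11 (cp311), torch 2.7, CUDA 12.8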

    echo '### check versions'
    pip list

    echo '### run script'
    bash scripts/finetune_lora.sh
  "

Submit the job script

sbatch run.sh
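
After submitting, the job status and log can be checked with standard Slurm commands; the log file name follows the %x.%j.out pattern in the job script (replace <JOBID> with the actual job ID):

squeue -u $USER
tail -f fine-tuning.<JOBID>.out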

Execution results

Job output

pyxis: importing docker image: pytorch/pytorch:2.7.0-cuda12.8-cudnn9-devel
pyxis: imported docker image: pytorch/pytorch:2.7.0-cuda12.8-cudnn9-devel
### Install pip libraries
Collecting accelerate==1.6.0 (from -r requirements.txt (line 1))
  Downloading accelerate-1.6.0-py3-none-any.whl.metadata (19 kB)
Collecting aiohappyeyeballs==2.6.1 (from -r requirements.txt (line 2))
  Downloading aiohappyeyeballs-2.6.1-py3-none-any.whl.metadata (5.9 kB)
Collecting aiohttp==3.11.18 (from -r requirements.txt (line 3))
  Downloading aiohttp-3.11.18-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)
Collecting aiosignal==1.3.2 (from -r requirements.txt (line 4))
  Downloading aiosignal-1.3.2-py2.py3-none-any.whl.metadata (3.8 kB)
Collecting annotated-types==0.7.0 (from -r requirements.txt (line 5))
  Downloading annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB)
Requirement already satisfied: asttokens==3.0.0 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 6)) (3.0.0)
Requirement already satisfied: attrs==25.3.0 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 7)) (25.3.0)
Collecting av==14.3.0 (from -r requirements.txt (line 8))
  Downloading av-14.3.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.7 kB)
Collecting bitsandbytes==0.45.5 (from -r requirements.txt (line 9))
  Downloading bitsandbytes-0.45.5-py3-none-manylinux_2_24_x86_64.whl.metadata (5.0 kB)
Collecting certifi==2025.4.26 (from -r requirements.txt (line 10))
  Downloading certifi-2025.4.26-py3-none-any.whl.metadata (2.5 kB)
Requirement already satisfied: charset-normalizer==3.4.1 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 11)) (3.4.1)
Requirement already satisfied: click==8.1.8 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 12)) (8.1.8)
Collecting comm==0.2.2 (from -r requirements.txt (line 13))
  Downloading comm-0.2.2-py3-none-any.whl.metadata (3.7 kB)
Collecting datasets==3.5.1 (from -r requirements.txt (line 14))
  Downloading datasets-3.5.1-py3-none-any.whl.metadata (19 kB)
Collecting debugpy==1.8.14 (from -r requirements.txt (line 15))
  Downloading debugpy-1.8.14-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.3 kB)
Requirement already satisfied: decorator==5.2.1 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 16)) (5.2.1)
Collecting decord==0.6.0 (from -r requirements.txt (line 17))
  Downloading decord-0.6.0-py3-none-manylinux2010_x86_64.whl.metadata (422 bytes)
Collecting deepspeed==0.16.7 (from -r requirements.txt (line 18))
  Downloading deepspeed-0.16.7.tar.gz (1.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 99.4 MB/s eta 0:00:00
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting dill==0.3.8 (from -r requirements.txt (line 19))
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting docker-pycreds==0.4.0 (from -r requirements.txt (line 20))
  Downloading docker_pycreds-0.4.0-py2.py3-none-any.whl.metadata (1.8 kB)
Collecting einops==0.8.1 (from -r requirements.txt (line 21))
  Downloading einops-0.8.1-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: exceptiongroup==1.2.2 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 22)) (1.2.2)
Collecting executing==2.2.0 (from -r requirements.txt (line 23))
  Downloading executing-2.2.0-py2.py3-none-any.whl.metadata (8.9 kB)
Collecting filelock==3.13.1 (from -r requirements.txt (line 24))
  Downloading filelock-3.13.1-py3-none-any.whl.metadata (2.8 kB)
Collecting frozenlist==1.6.0 (from -r requirements.txt (line 25))
  Downloading frozenlist-1.6.0-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (16 kB)
Collecting fsspec==2024.6.1 (from -r requirements.txt (line 26))
  Downloading fsspec-2024.6.1-py3-none-any.whl.metadata (11 kB)
Collecting gitdb==4.0.12 (from -r requirements.txt (line 27))
  Downloading gitdb-4.0.12-py3-none-any.whl.metadata (1.2 kB)
Collecting GitPython==3.1.44 (from -r requirements.txt (line 28))
  Downloading GitPython-3.1.44-py3-none-any.whl.metadata (13 kB)
Collecting hjson==3.1.0 (from -r requirements.txt (line 29))
  Downloading hjson-3.1.0-py3-none-any.whl.metadata (2.6 kB)
Collecting huggingface-hub==0.30.2 (from -r requirements.txt (line 30))
  Downloading huggingface_hub-0.30.2-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: idna==3.10 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 31)) (3.10)
Collecting importlib_metadata==8.6.1 (from -r requirements.txt (line 32))
  Downloading importlib_metadata-8.6.1-py3-none-any.whl.metadata (4.7 kB)
Collecting ipykernel==6.29.5 (from -r requirements.txt (line 33))
  Downloading ipykernel-6.29.5-py3-none-any.whl.metadata (6.3 kB)
Collecting ipython==9.2.0 (from -r requirements.txt (line 34))
  Downloading ipython-9.2.0-py3-none-any.whl.metadata (4.4 kB)
Requirement already satisfied: ipython_pygments_lexers==1.1.1 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 35)) (1.1.1)
Collecting ipywidgets==8.1.6 (from -r requirements.txt (line 36))
  Downloading ipywidgets-8.1.6-py3-none-any.whl.metadata (2.4 kB)
Requirement already satisfied: jedi==0.19.2 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 37)) (0.19.2)
Collecting Jinja2==3.1.4 (from -r requirements.txt (line 38))
  Downloading jinja2-3.1.4-py3-none-any.whl.metadata (2.6 kB)
Collecting jupyter_client==8.6.3 (from -r requirements.txt (line 39))
  Downloading jupyter_client-8.6.3-py3-none-any.whl.metadata (8.3 kB)
Collecting jupyter_core==5.7.2 (from -r requirements.txt (line 40))
  Downloading jupyter_core-5.7.2-py3-none-any.whl.metadata (3.4 kB)
Collecting jupyterlab_widgets==3.0.14 (from -r requirements.txt (line 41))
  Downloading jupyterlab_widgets-3.0.14-py3-none-any.whl.metadata (4.1 kB)
Collecting liger_kernel==0.5.8 (from -r requirements.txt (line 42))
  Downloading liger_kernel-0.5.8-py3-none-any.whl.metadata (23 kB)
Collecting markdown-it-py==3.0.0 (from -r requirements.txt (line 43))
  Downloading markdown_it_py-3.0.0-py3-none-any.whl.metadata (6.9 kB)
Collecting MarkupSafe==2.1.5 (from -r requirements.txt (line 44))
  Downloading MarkupSafe-2.1.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.0 kB)
Requirement already satisfied: matplotlib-inline==0.1.7 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 45)) (0.1.7)
Collecting mdurl==0.1.2 (from -r requirements.txt (line 46))
  Downloading mdurl-0.1.2-py3-none-any.whl.metadata (1.6 kB)
Requirement already satisfied: mpmath==1.3.0 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 47)) (1.3.0)
Requirement already satisfied: msgpack==1.1.0 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 48)) (1.1.0)
Collecting multidict==6.4.3 (from -r requirements.txt (line 49))
  Downloading multidict-6.4.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.3 kB)
Collecting multiprocess==0.70.16 (from -r requirements.txt (line 50))
  Downloading multiprocess-0.70.16-py311-none-any.whl.metadata (7.2 kB)
Collecting nest_asyncio==1.6.0 (from -r requirements.txt (line 51))
  Downloading nest_asyncio-1.6.0-py3-none-any.whl.metadata (2.8 kB)
Collecting networkx==3.3 (from -r requirements.txt (line 52))
  Downloading networkx-3.3-py3-none-any.whl.metadata (5.1 kB)
Requirement already satisfied: ninja==1.11.1.4 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 53)) (1.11.1.4)
Collecting numpy==2.1.2 (from -r requirements.txt (line 54))
  Downloading numpy-2.1.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
Collecting opencv-python==4.11.0.86 (from -r requirements.txt (line 55))
  Downloading opencv_python-4.11.0.86-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (20 kB)
Collecting packaging==25.0 (from -r requirements.txt (line 56))
  Downloading packaging-25.0-py3-none-any.whl.metadata (3.3 kB)
Collecting pandas==2.2.3 (from -r requirements.txt (line 57))
  Downloading pandas-2.2.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (89 kB)
Requirement already satisfied: parso==0.8.4 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 58)) (0.8.4)
Collecting peft==0.15.2 (from -r requirements.txt (line 59))
  Downloading peft-0.15.2-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: pexpect==4.9.0 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 60)) (4.9.0)
Requirement already satisfied: pickleshare==0.7.5 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 61)) (0.7.5)
Requirement already satisfied: pillow==11.0.0 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 62)) (11.0.0)
Collecting pip==25.1 (from -r requirements.txt (line 63))
  Downloading pip-25.1-py3-none-any.whl.metadata (3.6 kB)
Collecting platformdirs==4.3.7 (from -r requirements.txt (line 64))
  Downloading platformdirs-4.3.7-py3-none-any.whl.metadata (11 kB)
Requirement already satisfied: prompt_toolkit==3.0.51 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 65)) (3.0.51)
Collecting propcache==0.3.1 (from -r requirements.txt (line 66))
  Downloading propcache-0.3.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (10 kB)
Collecting protobuf==6.30.2 (from -r requirements.txt (line 67))
  Downloading protobuf-6.30.2-cp39-abi3-manylinux2014_x86_64.whl.metadata (593 bytes)
Requirement already satisfied: psutil==7.0.0 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 68)) (7.0.0)
Requirement already satisfied: ptyprocess==0.7.0 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 69)) (0.7.0)
Requirement already satisfied: pure_eval==0.2.3 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 70)) (0.2.3)
Collecting py-cpuinfo==9.0.0 (from -r requirements.txt (line 71))
  Downloading py_cpuinfo-9.0.0-py3-none-any.whl.metadata (794 bytes)
Collecting pyarrow==20.0.0 (from -r requirements.txt (line 72))
  Downloading pyarrow-20.0.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting pydantic==2.11.3 (from -r requirements.txt (line 73))
  Downloading pydantic-2.11.3-py3-none-any.whl.metadata (65 kB)
Collecting pydantic_core==2.33.1 (from -r requirements.txt (line 74))
  Downloading pydantic_core-2.33.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.8 kB)
Requirement already satisfied: Pygments==2.19.1 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 75)) (2.19.1)
Collecting python-dateutil==2.9.0.post0 (from -r requirements.txt (line 76))
  Downloading python_dateutil-2.9.0.post0-py2.py3-none-any.whl.metadata (8.4 kB)
Requirement already satisfied: pytz==2025.2 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 77)) (2025.2)
Requirement already satisfied: PyYAML==6.0.2 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 78)) (6.0.2)
Collecting pyzmq==26.4.0 (from -r requirements.txt (line 79))
  Downloading pyzmq-26.4.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (6.0 kB)
Collecting regex==2024.11.6 (from -r requirements.txt (line 80))
  Downloading regex-2024.11.6-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (40 kB)
Requirement already satisfied: requests==2.32.3 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 81)) (2.32.3)
Collecting rich==14.0.0 (from -r requirements.txt (line 82))
  Downloading rich-14.0.0-py3-none-any.whl.metadata (18 kB)
Collecting safetensors==0.5.3 (from -r requirements.txt (line 83))
  Downloading safetensors-0.5.3-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.8 kB)
Collecting sentry-sdk==2.27.0 (from -r requirements.txt (line 84))
  Downloading sentry_sdk-2.27.0-py2.py3-none-any.whl.metadata (10 kB)
Collecting setproctitle==1.3.5 (from -r requirements.txt (line 85))
  Downloading setproctitle-1.3.5-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (10 kB)
Collecting setuptools==79.0.1 (from -r requirements.txt (line 86))
  Downloading setuptools-79.0.1-py3-none-any.whl.metadata (6.5 kB)
Requirement already satisfied: six==1.17.0 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 87)) (1.17.0)
Collecting smmap==5.0.2 (from -r requirements.txt (line 88))
  Downloading smmap-5.0.2-py3-none-any.whl.metadata (4.3 kB)
Requirement already satisfied: stack_data==0.6.3 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 89)) (0.6.3)
Collecting sympy==1.13.1 (from -r requirements.txt (line 90))
  Downloading sympy-1.13.1-py3-none-any.whl.metadata (12 kB)
Collecting tensorboardX==2.6.2.2 (from -r requirements.txt (line 91))
  Downloading tensorboardX-2.6.2.2-py2.py3-none-any.whl.metadata (5.8 kB)
Collecting tokenizers==0.21.1 (from -r requirements.txt (line 92))
  Downloading tokenizers-0.21.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.8 kB)
ERROR: Could not find a version that satisfies the requirement torch==2.6.0+cu124 (from versions: 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.4.0, 2.4.1, 2.5.0, 2.5.1, 2.6.0, 2.7.0)
ERROR: No matching distribution found for torch==2.6.0+cu124
Collecting peft==0.10.0
  Downloading peft-0.10.0-py3-none-any.whl.metadata (13 kB)
Collecting transformers==4.51.3
  Downloading transformers-4.51.3-py3-none-any.whl.metadata (38 kB)
Collecting accelerate==0.28.0
  Downloading accelerate-0.28.0-py3-none-any.whl.metadata (18 kB)
Collecting datasets
  Downloading datasets-3.6.0-py3-none-any.whl.metadata (19 kB)
Collecting auto-gptq
  Downloading auto_gptq-0.7.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (18 kB)
Collecting optimum
  Downloading optimum-1.24.0-py3-none-any.whl.metadata (21 kB)
Requirement already satisfied: numpy>=1.17 in /opt/conda/lib/python3.11/site-packages (from peft==0.10.0) (2.2.5)
Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.11/site-packages (from peft==0.10.0) (24.2)
Requirement already satisfied: psutil in /opt/conda/lib/python3.11/site-packages (from peft==0.10.0) (7.0.0)
Requirement already satisfied: pyyaml in /opt/conda/lib/python3.11/site-packages (from peft==0.10.0) (6.0.2)
Requirement already satisfied: torch>=1.13.0 in /opt/conda/lib/python3.11/site-packages (from peft==0.10.0) (2.7.0+cu128)
Requirement already satisfied: tqdm in /opt/conda/lib/python3.11/site-packages (from peft==0.10.0) (4.67.1)
Collecting safetensors (from peft==0.10.0)
  Using cached safetensors-0.5.3-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.8 kB)
Collecting huggingface-hub>=0.17.0 (from peft==0.10.0)
  Downloading huggingface_hub-0.31.1-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: filelock in /opt/conda/lib/python3.11/site-packages (from transformers==4.51.3) (3.18.0)
Collecting regex!=2019.12.17 (from transformers==4.51.3)
  Using cached regex-2024.11.6-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (40 kB)
Requirement already satisfied: requests in /opt/conda/lib/python3.11/site-packages (from transformers==4.51.3) (2.32.3)
Collecting tokenizers<0.22,>=0.21 (from transformers==4.51.3)
  Using cached tokenizers-0.21.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.8 kB)
Collecting pyarrow>=15.0.0 (from datasets)
  Using cached pyarrow-20.0.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Using cached dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting pandas (from datasets)
  Using cached pandas-2.2.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (89 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets)
  Using cached multiprocess-0.70.16-py311-none-any.whl.metadata (7.2 kB)
Collecting fsspec<=2025.3.0,>=2023.1.0 (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets)
  Downloading fsspec-2025.3.0-py3-none-any.whl.metadata (11 kB)
Collecting sentencepiece (from auto-gptq)
  Downloading sentencepiece-0.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)
Collecting rouge (from auto-gptq)
  Downloading rouge-1.0.1-py3-none-any.whl.metadata (4.1 kB)
Collecting gekko (from auto-gptq)
  Downloading gekko-1.3.0-py3-none-any.whl.metadata (3.0 kB)
Collecting aiohttp!=4.0.0a0,!=4.0.0a1 (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets)
  Using cached aiohttp-3.11.18-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /opt/conda/lib/python3.11/site-packages (from huggingface-hub>=0.17.0->peft==0.10.0) (4.13.2)
Collecting hf-xet<2.0.0,>=1.1.0 (from huggingface-hub>=0.17.0->peft==0.10.0)
  Downloading hf_xet-1.1.1-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (494 bytes)
Requirement already satisfied: charset_normalizer<4,>=2 in /opt/conda/lib/python3.11/site-packages (from requests->transformers==4.51.3) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.11/site-packages (from requests->transformers==4.51.3) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/conda/lib/python3.11/site-packages (from requests->transformers==4.51.3) (2.3.0)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.11/site-packages (from requests->transformers==4.51.3) (2025.1.31)
Requirement already satisfied: sympy>=1.13.3 in /opt/conda/lib/python3.11/site-packages (from torch>=1.13.0->peft==0.10.0) (1.13.3)
Requirement already satisfied: networkx in /opt/conda/lib/python3.11/site-packages (from torch>=1.13.0->peft==0.10.0) (3.4.2)
Requirement already satisfied: jinja2 in /opt/conda/lib/python3.11/site-packages (from torch>=1.13.0->peft==0.10.0) (3.1.6)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.8.61 in /opt/conda/lib/python3.11/site-packages (from torch>=1.13.0->peft==0.10.0) (12.8.61)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.8.57 in /opt/conda/lib/python3.11/site-packages (from torch>=1.13.0->peft==0.10.0) (12.8.57)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.8.57 in /opt/conda/lib/python3.11/site-packages (from torch>=1.13.0->peft==0.10.0) (12.8.57)
Requirement already satisfied: nvidia-cudnn-cu12==9.7.1.26 in /opt/conda/lib/python3.11/site-packages (from torch>=1.13.0->peft==0.10.0) (9.7.1.26)
Requirement already satisfied: nvidia-cublas-cu12==12.8.3.14 in /opt/conda/lib/python3.11/site-packages (from torch>=1.13.0->peft==0.10.0) (12.8.3.14)
Requirement already satisfied: nvidia-cufft-cu12==11.3.3.41 in /opt/conda/lib/python3.11/site-packages (from torch>=1.13.0->peft==0.10.0) (11.3.3.41)
Requirement already satisfied: nvidia-curand-cu12==10.3.9.55 in /opt/conda/lib/python3.11/site-packages (from torch>=1.13.0->peft==0.10.0) (10.3.9.55)
Requirement already satisfied: nvidia-cusolver-cu12==11.7.2.55 in /opt/conda/lib/python3.11/site-packages (from torch>=1.13.0->peft==0.10.0) (11.7.2.55)
Requirement already satisfied: nvidia-cusparse-cu12==12.5.7.53 in /opt/conda/lib/python3.11/site-packages (from torch>=1.13.0->peft==0.10.0) (12.5.7.53)
Requirement already satisfied: nvidia-cusparselt-cu12==0.6.3 in /opt/conda/lib/python3.11/site-packages (from torch>=1.13.0->peft==0.10.0) (0.6.3)
Requirement already satisfied: nvidia-nccl-cu12==2.26.2 in /opt/conda/lib/python3.11/site-packages (from torch>=1.13.0->peft==0.10.0) (2.26.2)
Requirement already satisfied: nvidia-nvtx-cu12==12.8.55 in /opt/conda/lib/python3.11/site-packages (from torch>=1.13.0->peft==0.10.0) (12.8.55)
Requirement already satisfied: nvidia-nvjitlink-cu12==12.8.61 in /opt/conda/lib/python3.11/site-packages (from torch>=1.13.0->peft==0.10.0) (12.8.61)
Requirement already satisfied: nvidia-cufile-cu12==1.13.0.11 in /opt/conda/lib/python3.11/site-packages (from torch>=1.13.0->peft==0.10.0) (1.13.0.11)
Requirement already satisfied: triton==3.3.0 in /opt/conda/lib/python3.11/site-packages (from torch>=1.13.0->peft==0.10.0) (3.3.0)
Requirement already satisfied: setuptools>=40.8.0 in /opt/conda/lib/python3.11/site-packages (from triton==3.3.0->torch>=1.13.0->peft==0.10.0) (75.8.2)
Collecting python-dateutil>=2.8.2 (from pandas->datasets)
  Using cached python_dateutil-2.9.0.post0-py2.py3-none-any.whl.metadata (8.4 kB)
Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.11/site-packages (from pandas->datasets) (2025.2)
Collecting tzdata>=2022.7 (from pandas->datasets)
  Downloading tzdata-2025.2-py2.py3-none-any.whl.metadata (1.4 kB)
Requirement already satisfied: six in /opt/conda/lib/python3.11/site-packages (from rouge->auto-gptq) (1.17.0)
Collecting aiohappyeyeballs>=2.3.0 (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets)
  Using cached aiohappyeyeballs-2.6.1-py3-none-any.whl.metadata (5.9 kB)
Collecting aiosignal>=1.1.2 (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets)
  Using cached aiosignal-1.3.2-py2.py3-none-any.whl.metadata (3.8 kB)
Requirement already satisfied: attrs>=17.3.0 in /opt/conda/lib/python3.11/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (25.3.0)
Collecting frozenlist>=1.1.1 (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets)
  Using cached frozenlist-1.6.0-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (16 kB)
Collecting multidict<7.0,>=4.5 (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets)
  Using cached multidict-6.4.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.3 kB)
Collecting propcache>=0.2.0 (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets)
  Using cached propcache-0.3.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (10 kB)
Collecting yarl<2.0,>=1.17.0 (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets)
  Downloading yarl-1.20.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (72 kB)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/lib/python3.11/site-packages (from sympy>=1.13.3->torch>=1.13.0->peft==0.10.0) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.11/site-packages (from jinja2->torch>=1.13.0->peft==0.10.0) (3.0.2)
Downloading peft-0.10.0-py3-none-any.whl (199 kB)
Downloading transformers-4.51.3-py3-none-any.whl (10.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.4/10.4 MB 23.1 MB/s eta 0:00:00
Downloading accelerate-0.28.0-py3-none-any.whl (290 kB)
Downloading datasets-3.6.0-py3-none-any.whl (491 kB)
Downloading auto_gptq-0.7.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (23.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.5/23.5 MB 23.2 MB/s eta 0:00:00
Downloading optimum-1.24.0-py3-none-any.whl (433 kB)
Downloading dill-0.3.8-py3-none-any.whl (116 kB)
Downloading fsspec-2025.3.0-py3-none-any.whl (193 kB)
Downloading huggingface_hub-0.31.1-py3-none-any.whl (484 kB)
Downloading multiprocess-0.70.16-py311-none-any.whl (143 kB)
Downloading pyarrow-20.0.0-cp311-cp311-manylinux_2_28_x86_64.whl (42.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.3/42.3 MB 31.2 MB/s eta 0:00:00
Downloading regex-2024.11.6-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (792 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 792.7/792.7 kB 3.8 MB/s eta 0:00:00
Downloading safetensors-0.5.3-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (471 kB)
Downloading tokenizers-0.21.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.0/3.0 MB 7.1 MB/s eta 0:00:00
Downloading gekko-1.3.0-py3-none-any.whl (13.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.2/13.2 MB 8.4 MB/s eta 0:00:00
Downloading pandas-2.2.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.1/13.1 MB 42.6 MB/s eta 0:00:00
Downloading rouge-1.0.1-py3-none-any.whl (13 kB)
Downloading sentencepiece-0.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 2.4 MB/s eta 0:00:00
Downloading xxhash-3.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)
Downloading aiohttp-3.11.18-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 464.5 kB/s eta 0:00:00
Downloading hf_xet-1.1.1-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 25.5/25.5 MB 46.8 MB/s eta 0:00:00
Downloading python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
Downloading tzdata-2025.2-py2.py3-none-any.whl (347 kB)
Downloading aiohappyeyeballs-2.6.1-py3-none-any.whl (15 kB)
Downloading aiosignal-1.3.2-py2.py3-none-any.whl (7.6 kB)
Downloading frozenlist-1.6.0-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (313 kB)
Downloading multidict-6.4.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (223 kB)
Downloading propcache-0.3.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (232 kB)
Downloading yarl-1.20.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (358 kB)
Installing collected packages: sentencepiece, xxhash, tzdata, safetensors, rouge, regex, python-dateutil, pyarrow, propcache, multidict, hf-xet, gekko, fsspec, frozenlist, dill, aiohappyeyeballs, yarl, pandas, multiprocess, huggingface-hub, aiosignal, tokenizers, aiohttp, transformers, accelerate, peft, optimum, datasets, auto-gptq
  Attempting uninstall: fsspec
    Found existing installation: fsspec 2025.3.2
    Uninstalling fsspec-2025.3.2:
      Successfully uninstalled fsspec-2025.3.2
Successfully installed accelerate-0.28.0 aiohappyeyeballs-2.6.1 aiohttp-3.11.18 aiosignal-1.3.2 auto-gptq-0.7.1 datasets-3.6.0 dill-0.3.8 frozenlist-1.6.0 fsspec-2025.3.0 gekko-1.3.0 hf-xet-1.1.1 huggingface-hub-0.31.1 multidict-6.4.3 multiprocess-0.70.16 optimum-1.24.0 pandas-2.2.3 peft-0.10.0 propcache-0.3.1 pyarrow-20.0.0 python-dateutil-2.9.0.post0 regex-2024.11.6 rouge-1.0.1 safetensors-0.5.3 sentencepiece-0.2.0 tokenizers-0.21.1 transformers-4.51.3 tzdata-2025.2 xxhash-3.5.0 yarl-1.20.0
Collecting deepspeed
  Using cached deepspeed-0.16.7.tar.gz (1.5 MB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting qwen_vl_utils
  Downloading qwen_vl_utils-0.0.11-py3-none-any.whl.metadata (6.3 kB)
Collecting trl
  Downloading trl-0.17.0-py3-none-any.whl.metadata (12 kB)
Collecting ujson
  Downloading ujson-5.10.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.3 kB)
Collecting liger_kernel
  Downloading liger_kernel-0.5.9-py3-none-any.whl.metadata (23 kB)
Collecting tensorboardX
  Using cached tensorboardX-2.6.2.2-py2.py3-none-any.whl.metadata (5.8 kB)
Collecting einops (from deepspeed)
  Using cached einops-0.8.1-py3-none-any.whl.metadata (13 kB)
Collecting hjson (from deepspeed)
  Using cached hjson-3.1.0-py3-none-any.whl.metadata (2.6 kB)
Requirement already satisfied: msgpack in /opt/conda/lib/python3.11/site-packages (from deepspeed) (1.1.0)
Requirement already satisfied: ninja in /opt/conda/lib/python3.11/site-packages (from deepspeed) (1.11.1.4)
Requirement already satisfied: numpy in /opt/conda/lib/python3.11/site-packages (from deepspeed) (2.2.5)
Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.11/site-packages (from deepspeed) (24.2)
Requirement already satisfied: psutil in /opt/conda/lib/python3.11/site-packages (from deepspeed) (7.0.0)
Collecting py-cpuinfo (from deepspeed)
  Using cached py_cpuinfo-9.0.0-py3-none-any.whl.metadata (794 bytes)
Collecting pydantic>=2.0.0 (from deepspeed)
  Downloading pydantic-2.11.4-py3-none-any.whl.metadata (66 kB)
Requirement already satisfied: torch in /opt/conda/lib/python3.11/site-packages (from deepspeed) (2.7.0+cu128)
Requirement already satisfied: tqdm in /opt/conda/lib/python3.11/site-packages (from deepspeed) (4.67.1)
Collecting nvidia-ml-py (from deepspeed)
  Downloading nvidia_ml_py-12.575.51-py3-none-any.whl.metadata (9.3 kB)
Collecting av (from qwen_vl_utils)
  Using cached av-14.3.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.7 kB)
Requirement already satisfied: pillow in /opt/conda/lib/python3.11/site-packages (from qwen_vl_utils) (11.0.0)
Requirement already satisfied: requests in /opt/conda/lib/python3.11/site-packages (from qwen_vl_utils) (2.32.3)
Collecting accelerate>=0.34.0 (from trl)
  Using cached accelerate-1.6.0-py3-none-any.whl.metadata (19 kB)
Requirement already satisfied: datasets>=3.0.0 in /opt/conda/lib/python3.11/site-packages (from trl) (3.6.0)
Collecting rich (from trl)
  Using cached rich-14.0.0-py3-none-any.whl.metadata (18 kB)
Requirement already satisfied: transformers>=4.46.0 in /opt/conda/lib/python3.11/site-packages (from trl) (4.51.3)
Requirement already satisfied: triton>=2.3.1 in /opt/conda/lib/python3.11/site-packages (from liger_kernel) (3.3.0)
Collecting protobuf>=3.20 (from tensorboardX)
  Using cached protobuf-6.30.2-cp39-abi3-manylinux2014_x86_64.whl.metadata (593 bytes)
Requirement already satisfied: pyyaml in /opt/conda/lib/python3.11/site-packages (from accelerate>=0.34.0->trl) (6.0.2)
Requirement already satisfied: huggingface-hub>=0.21.0 in /opt/conda/lib/python3.11/site-packages (from accelerate>=0.34.0->trl) (0.31.1)
Requirement already satisfied: safetensors>=0.4.3 in /opt/conda/lib/python3.11/site-packages (from accelerate>=0.34.0->trl) (0.5.3)
Requirement already satisfied: filelock in /opt/conda/lib/python3.11/site-packages (from datasets>=3.0.0->trl) (3.18.0)
Requirement already satisfied: pyarrow>=15.0.0 in /opt/conda/lib/python3.11/site-packages (from datasets>=3.0.0->trl) (20.0.0)
Requirement already satisfied: dill<0.3.9,>=0.3.0 in /opt/conda/lib/python3.11/site-packages (from datasets>=3.0.0->trl) (0.3.8)
Requirement already satisfied: pandas in /opt/conda/lib/python3.11/site-packages (from datasets>=3.0.0->trl) (2.2.3)
Requirement already satisfied: xxhash in /opt/conda/lib/python3.11/site-packages (from datasets>=3.0.0->trl) (3.5.0)
Requirement already satisfied: multiprocess<0.70.17 in /opt/conda/lib/python3.11/site-packages (from datasets>=3.0.0->trl) (0.70.16)
Requirement already satisfied: fsspec<=2025.3.0,>=2023.1.0 in /opt/conda/lib/python3.11/site-packages (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets>=3.0.0->trl) (2025.3.0)
Collecting annotated-types>=0.6.0 (from pydantic>=2.0.0->deepspeed)
  Using cached annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB)
Collecting pydantic-core==2.33.2 (from pydantic>=2.0.0->deepspeed)
  Downloading pydantic_core-2.33.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.8 kB)
Requirement already satisfied: typing-extensions>=4.12.2 in /opt/conda/lib/python3.11/site-packages (from pydantic>=2.0.0->deepspeed) (4.13.2)
Collecting typing-inspection>=0.4.0 (from pydantic>=2.0.0->deepspeed)
  Downloading typing_inspection-0.4.0-py3-none-any.whl.metadata (2.6 kB)
Requirement already satisfied: charset_normalizer<4,>=2 in /opt/conda/lib/python3.11/site-packages (from requests->qwen_vl_utils) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.11/site-packages (from requests->qwen_vl_utils) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/conda/lib/python3.11/site-packages (from requests->qwen_vl_utils) (2.3.0)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.11/site-packages (from requests->qwen_vl_utils) (2025.1.31)
Requirement already satisfied: sympy>=1.13.3 in /opt/conda/lib/python3.11/site-packages (from torch->deepspeed) (1.13.3)
Requirement already satisfied: networkx in /opt/conda/lib/python3.11/site-packages (from torch->deepspeed) (3.4.2)
Requirement already satisfied: jinja2 in /opt/conda/lib/python3.11/site-packages (from torch->deepspeed) (3.1.6)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.8.61 in /opt/conda/lib/python3.11/site-packages (from torch->deepspeed) (12.8.61)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.8.57 in /opt/conda/lib/python3.11/site-packages (from torch->deepspeed) (12.8.57)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.8.57 in /opt/conda/lib/python3.11/site-packages (from torch->deepspeed) (12.8.57)
Requirement already satisfied: nvidia-cudnn-cu12==9.7.1.26 in /opt/conda/lib/python3.11/site-packages (from torch->deepspeed) (9.7.1.26)
Requirement already satisfied: nvidia-cublas-cu12==12.8.3.14 in /opt/conda/lib/python3.11/site-packages (from torch->deepspeed) (12.8.3.14)
Requirement already satisfied: nvidia-cufft-cu12==11.3.3.41 in /opt/conda/lib/python3.11/site-packages (from torch->deepspeed) (11.3.3.41)
Requirement already satisfied: nvidia-curand-cu12==10.3.9.55 in /opt/conda/lib/python3.11/site-packages (from torch->deepspeed) (10.3.9.55)
Requirement already satisfied: nvidia-cusolver-cu12==11.7.2.55 in /opt/conda/lib/python3.11/site-packages (from torch->deepspeed) (11.7.2.55)
Requirement already satisfied: nvidia-cusparse-cu12==12.5.7.53 in /opt/conda/lib/python3.11/site-packages (from torch->deepspeed) (12.5.7.53)
Requirement already satisfied: nvidia-cusparselt-cu12==0.6.3 in /opt/conda/lib/python3.11/site-packages (from torch->deepspeed) (0.6.3)
Requirement already satisfied: nvidia-nccl-cu12==2.26.2 in /opt/conda/lib/python3.11/site-packages (from torch->deepspeed) (2.26.2)
Requirement already satisfied: nvidia-nvtx-cu12==12.8.55 in /opt/conda/lib/python3.11/site-packages (from torch->deepspeed) (12.8.55)
Requirement already satisfied: nvidia-nvjitlink-cu12==12.8.61 in /opt/conda/lib/python3.11/site-packages (from torch->deepspeed) (12.8.61)
Requirement already satisfied: nvidia-cufile-cu12==1.13.0.11 in /opt/conda/lib/python3.11/site-packages (from torch->deepspeed) (1.13.0.11)
Requirement already satisfied: setuptools>=40.8.0 in /opt/conda/lib/python3.11/site-packages (from triton>=2.3.1->liger_kernel) (75.8.2)
Requirement already satisfied: regex!=2019.12.17 in /opt/conda/lib/python3.11/site-packages (from transformers>=4.46.0->trl) (2024.11.6)
Requirement already satisfied: tokenizers<0.22,>=0.21 in /opt/conda/lib/python3.11/site-packages (from transformers>=4.46.0->trl) (0.21.1)
Collecting markdown-it-py>=2.2.0 (from rich->trl)
  Using cached markdown_it_py-3.0.0-py3-none-any.whl.metadata (6.9 kB)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /opt/conda/lib/python3.11/site-packages (from rich->trl) (2.19.1)
Requirement already satisfied: aiohttp!=4.0.0a0,!=4.0.0a1 in /opt/conda/lib/python3.11/site-packages (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets>=3.0.0->trl) (3.11.18)
Requirement already satisfied: hf-xet<2.0.0,>=1.1.0 in /opt/conda/lib/python3.11/site-packages (from huggingface-hub>=0.21.0->accelerate>=0.34.0->trl) (1.1.1)
Collecting mdurl~=0.1 (from markdown-it-py>=2.2.0->rich->trl)
  Using cached mdurl-0.1.2-py3-none-any.whl.metadata (1.6 kB)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/lib/python3.11/site-packages (from sympy>=1.13.3->torch->deepspeed) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.11/site-packages (from jinja2->torch->deepspeed) (3.0.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /opt/conda/lib/python3.11/site-packages (from pandas->datasets>=3.0.0->trl) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.11/site-packages (from pandas->datasets>=3.0.0->trl) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in /opt/conda/lib/python3.11/site-packages (from pandas->datasets>=3.0.0->trl) (2025.2)
Requirement already satisfied: aiohappyeyeballs>=2.3.0 in /opt/conda/lib/python3.11/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets>=3.0.0->trl) (2.6.1)
Requirement already satisfied: aiosignal>=1.1.2 in /opt/conda/lib/python3.11/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets>=3.0.0->trl) (1.3.2)
Requirement already satisfied: attrs>=17.3.0 in /opt/conda/lib/python3.11/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets>=3.0.0->trl) (25.3.0)
Requirement already satisfied: frozenlist>=1.1.1 in /opt/conda/lib/python3.11/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets>=3.0.0->trl) (1.6.0)
Requirement already satisfied: multidict<7.0,>=4.5 in /opt/conda/lib/python3.11/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets>=3.0.0->trl) (6.4.3)
Requirement already satisfied: propcache>=0.2.0 in /opt/conda/lib/python3.11/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets>=3.0.0->trl) (0.3.1)
Requirement already satisfied: yarl<2.0,>=1.17.0 in /opt/conda/lib/python3.11/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets>=3.0.0->trl) (1.20.0)
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.11/site-packages (from python-dateutil>=2.8.2->pandas->datasets>=3.0.0->trl) (1.17.0)
Downloading qwen_vl_utils-0.0.11-py3-none-any.whl (7.6 kB)
Downloading trl-0.17.0-py3-none-any.whl (348 kB)
Downloading ujson-5.10.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (53 kB)
Downloading liger_kernel-0.5.9-py3-none-any.whl (155 kB)
Downloading tensorboardX-2.6.2.2-py2.py3-none-any.whl (101 kB)
Downloading accelerate-1.6.0-py3-none-any.whl (354 kB)
Downloading protobuf-6.30.2-cp39-abi3-manylinux2014_x86_64.whl (316 kB)
Downloading pydantic-2.11.4-py3-none-any.whl (443 kB)
Downloading pydantic_core-2.33.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 4.9 MB/s eta 0:00:00
Downloading av-14.3.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (35.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 35.2/35.2 MB 12.8 MB/s eta 0:00:00
Downloading einops-0.8.1-py3-none-any.whl (64 kB)
Downloading hjson-3.1.0-py3-none-any.whl (54 kB)
Downloading nvidia_ml_py-12.575.51-py3-none-any.whl (47 kB)
Downloading py_cpuinfo-9.0.0-py3-none-any.whl (22 kB)
Downloading rich-14.0.0-py3-none-any.whl (243 kB)
Downloading annotated_types-0.7.0-py3-none-any.whl (13 kB)
Downloading markdown_it_py-3.0.0-py3-none-any.whl (87 kB)
Downloading typing_inspection-0.4.0-py3-none-any.whl (14 kB)
Downloading mdurl-0.1.2-py3-none-any.whl (10.0 kB)
Building wheels for collected packages: deepspeed
  Building wheel for deepspeed (setup.py): started
  Building wheel for deepspeed (setup.py): finished with status 'done'
  Created wheel for deepspeed: filename=deepspeed-0.16.7-py3-none-any.whl size=1642909 sha256=2e0a644a48cf418612e3704bf19302e36f0d2f13bb1a1c172541b819c0460478
  Stored in directory: /home/opc/.cache/pip/wheels/42/e7/1a/2106f7197cc13e09c68f1b4f55f7e5117a985e726378968970
Successfully built deepspeed
Installing collected packages: py-cpuinfo, nvidia-ml-py, hjson, ujson, typing-inspection, pydantic-core, protobuf, mdurl, einops, av, annotated-types, tensorboardX, qwen_vl_utils, pydantic, markdown-it-py, rich, liger_kernel, deepspeed, accelerate, trl
  Attempting uninstall: accelerate
    Found existing installation: accelerate 0.28.0
    Uninstalling accelerate-0.28.0:
      Successfully uninstalled accelerate-0.28.0
Successfully installed accelerate-1.6.0 annotated-types-0.7.0 av-14.3.0 deepspeed-0.16.7 einops-0.8.1 hjson-3.1.0 liger_kernel-0.5.9 markdown-it-py-3.0.0 mdurl-0.1.2 nvidia-ml-py-12.575.51 protobuf-6.30.2 py-cpuinfo-9.0.0 pydantic-2.11.4 pydantic-core-2.33.2 qwen_vl_utils-0.0.11 rich-14.0.0 tensorboardX-2.6.2.2 trl-0.17.0 typing-inspection-0.4.0 ujson-5.10.0
Collecting flash-attn==2.6.3+cu128torch2.7
  Downloading https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.0.9/flash_attn-2.6.3+cu128torch2.7-cp311-cp311-linux_x86_64.whl (186.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 186.3/186.3 MB 162.9 MB/s eta 0:00:00
Requirement already satisfied: torch in /opt/conda/lib/python3.11/site-packages (from flash-attn==2.6.3+cu128torch2.7) (2.7.0+cu128)
Requirement already satisfied: einops in /opt/conda/lib/python3.11/site-packages (from flash-attn==2.6.3+cu128torch2.7) (0.8.1)
Requirement already satisfied: filelock in /opt/conda/lib/python3.11/site-packages (from torch->flash-attn==2.6.3+cu128torch2.7) (3.18.0)
Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/lib/python3.11/site-packages (from torch->flash-attn==2.6.3+cu128torch2.7) (4.13.2)
Requirement already satisfied: sympy>=1.13.3 in /opt/conda/lib/python3.11/site-packages (from torch->flash-attn==2.6.3+cu128torch2.7) (1.13.3)
Requirement already satisfied: networkx in /opt/conda/lib/python3.11/site-packages (from torch->flash-attn==2.6.3+cu128torch2.7) (3.4.2)
Requirement already satisfied: jinja2 in /opt/conda/lib/python3.11/site-packages (from torch->flash-attn==2.6.3+cu128torch2.7) (3.1.6)
Requirement already satisfied: fsspec in /opt/conda/lib/python3.11/site-packages (from torch->flash-attn==2.6.3+cu128torch2.7) (2025.3.0)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.8.61 in /opt/conda/lib/python3.11/site-packages (from torch->flash-attn==2.6.3+cu128torch2.7) (12.8.61)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.8.57 in /opt/conda/lib/python3.11/site-packages (from torch->flash-attn==2.6.3+cu128torch2.7) (12.8.57)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.8.57 in /opt/conda/lib/python3.11/site-packages (from torch->flash-attn==2.6.3+cu128torch2.7) (12.8.57)
Requirement already satisfied: nvidia-cudnn-cu12==9.7.1.26 in /opt/conda/lib/python3.11/site-packages (from torch->flash-attn==2.6.3+cu128torch2.7) (9.7.1.26)
Requirement already satisfied: nvidia-cublas-cu12==12.8.3.14 in /opt/conda/lib/python3.11/site-packages (from torch->flash-attn==2.6.3+cu128torch2.7) (12.8.3.14)
Requirement already satisfied: nvidia-cufft-cu12==11.3.3.41 in /opt/conda/lib/python3.11/site-packages (from torch->flash-attn==2.6.3+cu128torch2.7) (11.3.3.41)
Requirement already satisfied: nvidia-curand-cu12==10.3.9.55 in /opt/conda/lib/python3.11/site-packages (from torch->flash-attn==2.6.3+cu128torch2.7) (10.3.9.55)
Requirement already satisfied: nvidia-cusolver-cu12==11.7.2.55 in /opt/conda/lib/python3.11/site-packages (from torch->flash-attn==2.6.3+cu128torch2.7) (11.7.2.55)
Requirement already satisfied: nvidia-cusparse-cu12==12.5.7.53 in /opt/conda/lib/python3.11/site-packages (from torch->flash-attn==2.6.3+cu128torch2.7) (12.5.7.53)
Requirement already satisfied: nvidia-cusparselt-cu12==0.6.3 in /opt/conda/lib/python3.11/site-packages (from torch->flash-attn==2.6.3+cu128torch2.7) (0.6.3)
Requirement already satisfied: nvidia-nccl-cu12==2.26.2 in /opt/conda/lib/python3.11/site-packages (from torch->flash-attn==2.6.3+cu128torch2.7) (2.26.2)
Requirement already satisfied: nvidia-nvtx-cu12==12.8.55 in /opt/conda/lib/python3.11/site-packages (from torch->flash-attn==2.6.3+cu128torch2.7) (12.8.55)
Requirement already satisfied: nvidia-nvjitlink-cu12==12.8.61 in /opt/conda/lib/python3.11/site-packages (from torch->flash-attn==2.6.3+cu128torch2.7) (12.8.61)
Requirement already satisfied: nvidia-cufile-cu12==1.13.0.11 in /opt/conda/lib/python3.11/site-packages (from torch->flash-attn==2.6.3+cu128torch2.7) (1.13.0.11)
Requirement already satisfied: triton==3.3.0 in /opt/conda/lib/python3.11/site-packages (from torch->flash-attn==2.6.3+cu128torch2.7) (3.3.0)
Requirement already satisfied: setuptools>=40.8.0 in /opt/conda/lib/python3.11/site-packages (from triton==3.3.0->torch->flash-attn==2.6.3+cu128torch2.7) (75.8.2)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/lib/python3.11/site-packages (from sympy>=1.13.3->torch->flash-attn==2.6.3+cu128torch2.7) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.11/site-packages (from jinja2->torch->flash-attn==2.6.3+cu128torch2.7) (3.0.2)
Installing collected packages: flash-attn
Successfully installed flash-attn-2.6.3
### check versions
Package                   Version
------------------------- ------------
accelerate                1.6.0
aiohappyeyeballs          2.6.1
aiohttp                   3.11.18
aiosignal                 1.3.2
annotated-types           0.7.0
archspec                  0.2.5
asttokens                 3.0.0
astunparse                1.6.3
attrs                     25.3.0
auto_gptq                 0.7.1
av                        14.3.0
beautifulsoup4            4.13.4
boltons                   24.0.0
Brotli                    1.1.0
certifi                   2025.1.31
cffi                      1.17.1
chardet                   5.2.0
charset-normalizer        3.4.1
click                     8.1.8
cmake                     4.0.0
colorama                  0.4.6
conda                     25.3.1
conda-build               25.4.2
conda_index               0.6.0
conda-libmamba-solver     25.3.0
conda-package-handling    2.4.0
conda_package_streaming   0.11.0
datasets                  3.6.0
decorator                 5.2.1
deepspeed                 0.16.7
dill                      0.3.8
distro                    1.9.0
dnspython                 2.7.0
einops                    0.8.1
evalidate                 2.0.3
exceptiongroup            1.2.2
executing                 2.1.0
expecttest                0.3.0
filelock                  3.18.0
flash_attn                2.6.3
frozendict                2.4.6
frozenlist                1.6.0
fsspec                    2025.3.0
gekko                     1.3.0
h2                        4.2.0
hf-xet                    1.1.1
hjson                     3.1.0
hpack                     4.1.0
huggingface-hub           0.31.1
hyperframe                6.1.0
hypothesis                6.131.7
idna                      3.10
importlib_resources       6.5.2
ipython                   9.1.0
ipython_pygments_lexers   1.1.1
jedi                      0.19.2
Jinja2                    3.1.6
jsonpatch                 1.33
jsonpointer               3.0.0
jsonschema                4.23.0
jsonschema-specifications 2024.10.1
libarchive-c              5.2
libmambapy                2.1.0
lief                      0.16.4
liger_kernel              0.5.9
lintrunner                0.12.7
markdown-it-py            3.0.0
MarkupSafe                3.0.2
matplotlib-inline         0.1.7
mdurl                     0.1.2
menuinst                  2.2.0
more-itertools            10.7.0
mpmath                    1.3.0
msgpack                   1.1.0
multidict                 6.4.3
multiprocess              0.70.16
networkx                  3.4.2
ninja                     1.11.1.4
numpy                     2.2.5
nvidia-cublas-cu12        12.8.3.14
nvidia-cuda-cupti-cu12    12.8.57
nvidia-cuda-nvrtc-cu12    12.8.61
nvidia-cuda-runtime-cu12  12.8.57
nvidia-cudnn-cu12         9.7.1.26
nvidia-cufft-cu12         11.3.3.41
nvidia-cufile-cu12        1.13.0.11
nvidia-curand-cu12        10.3.9.55
nvidia-cusolver-cu12      11.7.2.55
nvidia-cusparse-cu12      12.5.7.53
nvidia-cusparselt-cu12    0.6.3
nvidia-ml-py              12.575.51
nvidia-nccl-cu12          2.26.2
nvidia-nvjitlink-cu12     12.8.61
nvidia-nvtx-cu12          12.8.55
optimum                   1.24.0
optree                    0.15.0
packaging                 24.2
pandas                    2.2.3
parso                     0.8.4
peft                      0.10.0
pexpect                   4.9.0
pickleshare               0.7.5
pillow                    11.0.0
pip                       25.0.1
pkginfo                   1.12.1.2
pkgutil_resolve_name      1.3.10
platformdirs              4.3.6
pluggy                    1.5.0
prompt_toolkit            3.0.51
propcache                 0.3.1
protobuf                  6.30.2
psutil                    7.0.0
ptyprocess                0.7.0
pure_eval                 0.2.3
py-cpuinfo                9.0.0
pyarrow                   20.0.0
pycosat                   0.6.6
pycparser                 2.22
pydantic                  2.11.4
pydantic_core             2.33.2
Pygments                  2.19.1
PySocks                   1.7.1
python-dateutil           2.9.0.post0
python-etcd               0.4.5
pytz                      2025.2
PyYAML                    6.0.2
qwen-vl-utils             0.0.11
referencing               0.36.2
regex                     2024.11.6
requests                  2.32.3
rich                      14.0.0
rouge                     1.0.1
rpds-py                   0.24.0
ruamel.yaml               0.18.10
ruamel.yaml.clib          0.2.8
safetensors               0.5.3
sentencepiece             0.2.0
setuptools                75.8.2
six                       1.17.0
sortedcontainers          2.4.0
soupsieve                 2.5
stack_data                0.6.3
sympy                     1.13.3
tensorboardX              2.6.2.2
tokenizers                0.21.1
torch                     2.7.0+cu128
torchaudio                2.7.0+cu128
torchelastic              0.2.2
torchvision               0.22.0+cu128
tqdm                      4.67.1
traitlets                 5.14.3
transformers              4.51.3
triton                    3.3.0
trl                       0.17.0
truststore                0.10.1
types-dataclasses         0.6.6
typing_extensions         4.13.2
typing-inspection         0.4.0
tzdata                    2025.2
ujson                     5.10.0
urllib3                   2.3.0
wcwidth                   0.2.13
wheel                     0.45.1
xxhash                    3.5.0
yarl                      1.20.0
zipp                      3.21.0
zstandard                 0.23.0
### run script
[2025-05-13 07:15:04,601] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
df: /home/opc/.triton/autotune: No such file or directory
[2025-05-13 07:15:07,344] [WARNING] [runner.py:215:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
Detected VISIBLE_DEVICES=0,1: setting --include=localhost:0,1
[2025-05-13 07:15:07,344] [INFO] [runner.py:605:main] cmd = /opt/conda/bin/python3.11 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMV19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None src/train/train_sft.py --use_liger True --lora_enable True --use_dora False --lora_namespan_exclude ['lm_head', 'embed_tokens'] --lora_rank 64 --lora_alpha 64 --lora_dropout 0.05 --num_lora_modules -1 --deepspeed scripts/zero3_offload.json --model_id Qwen/Qwen2.5-VL-7B-Instruct --data_path ./datasets/chartqa_train.json --image_folder . --remove_unused_columns False --freeze_vision_tower False --freeze_llm True --freeze_merger False --bf16 True --fp16 False --disable_flash_attn2 False --output_dir output/testing_lora --num_train_epochs 1 --per_device_train_batch_size 4 --gradient_accumulation_steps 16 --image_min_pixels 200704 --image_max_pixels 1003520 --learning_rate 1e-4 --merger_lr 1e-5 --vision_lr 2e-6 --weight_decay 0.1 --warmup_ratio 0.03 --lr_scheduler_type cosine --logging_steps 1 --tf32 True --gradient_checkpointing True --report_to tensorboard --lazy_preprocess True --save_strategy steps --save_steps 200 --save_total_limit 10 --dataloader_num_workers 4
[2025-05-13 07:15:08,654] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-13 07:15:11,208] [INFO] [launch.py:139:main] 0 NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.25.1-1+cuda12.8
[2025-05-13 07:15:11,208] [INFO] [launch.py:139:main] 0 NV_LIBNCCL_DEV_PACKAGE_VERSION=2.25.1-1
[2025-05-13 07:15:11,208] [INFO] [launch.py:139:main] 0 NCCL_VERSION=2.25.1-1
[2025-05-13 07:15:11,208] [INFO] [launch.py:139:main] 0 NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev
[2025-05-13 07:15:11,208] [INFO] [launch.py:139:main] 0 NV_LIBNCCL_PACKAGE=libnccl2=2.25.1-1+cuda12.8
[2025-05-13 07:15:11,208] [INFO] [launch.py:139:main] 0 NV_LIBNCCL_PACKAGE_NAME=libnccl2
[2025-05-13 07:15:11,208] [INFO] [launch.py:139:main] 0 NV_LIBNCCL_PACKAGE_VERSION=2.25.1-1
[2025-05-13 07:15:11,208] [INFO] [launch.py:146:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2025-05-13 07:15:11,208] [INFO] [launch.py:152:main] nnodes=1, num_local_procs=2, node_rank=0
[2025-05-13 07:15:11,208] [INFO] [launch.py:163:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2025-05-13 07:15:11,208] [INFO] [launch.py:164:main] dist_world_size=2
[2025-05-13 07:15:11,208] [INFO] [launch.py:168:main] Setting CUDA_VISIBLE_DEVICES=0,1
[2025-05-13 07:15:11,209] [INFO] [launch.py:256:main] process 30445 spawned with command: ['/opt/conda/bin/python3.11', '-u', 'src/train/train_sft.py', '--local_rank=0', '--use_liger', 'True', '--lora_enable', 'True', '--use_dora', 'False', '--lora_namespan_exclude', "['lm_head', 'embed_tokens']", '--lora_rank', '64', '--lora_alpha', '64', '--lora_dropout', '0.05', '--num_lora_modules', '-1', '--deepspeed', 'scripts/zero3_offload.json', '--model_id', 'Qwen/Qwen2.5-VL-7B-Instruct', '--data_path', './datasets/chartqa_train.json', '--image_folder', '.', '--remove_unused_columns', 'False', '--freeze_vision_tower', 'False', '--freeze_llm', 'True', '--freeze_merger', 'False', '--bf16', 'True', '--fp16', 'False', '--disable_flash_attn2', 'False', '--output_dir', 'output/testing_lora', '--num_train_epochs', '1', '--per_device_train_batch_size', '4', '--gradient_accumulation_steps', '16', '--image_min_pixels', '200704', '--image_max_pixels', '1003520', '--learning_rate', '1e-4', '--merger_lr', '1e-5', '--vision_lr', '2e-6', '--weight_decay', '0.1', '--warmup_ratio', '0.03', '--lr_scheduler_type', 'cosine', '--logging_steps', '1', '--tf32', 'True', '--gradient_checkpointing', 'True', '--report_to', 'tensorboard', '--lazy_preprocess', 'True', '--save_strategy', 'steps', '--save_steps', '200', '--save_total_limit', '10', '--dataloader_num_workers', '4']
[2025-05-13 07:15:11,209] [INFO] [launch.py:256:main] process 30446 spawned with command: ['/opt/conda/bin/python3.11', '-u', 'src/train/train_sft.py', '--local_rank=1', '--use_liger', 'True', '--lora_enable', 'True', '--use_dora', 'False', '--lora_namespan_exclude', "['lm_head', 'embed_tokens']", '--lora_rank', '64', '--lora_alpha', '64', '--lora_dropout', '0.05', '--num_lora_modules', '-1', '--deepspeed', 'scripts/zero3_offload.json', '--model_id', 'Qwen/Qwen2.5-VL-7B-Instruct', '--data_path', './datasets/chartqa_train.json', '--image_folder', '.', '--remove_unused_columns', 'False', '--freeze_vision_tower', 'False', '--freeze_llm', 'True', '--freeze_merger', 'False', '--bf16', 'True', '--fp16', 'False', '--disable_flash_attn2', 'False', '--output_dir', 'output/testing_lora', '--num_train_epochs', '1', '--per_device_train_batch_size', '4', '--gradient_accumulation_steps', '16', '--image_min_pixels', '200704', '--image_max_pixels', '1003520', '--learning_rate', '1e-4', '--merger_lr', '1e-5', '--vision_lr', '2e-6', '--weight_decay', '0.1', '--warmup_ratio', '0.03', '--lr_scheduler_type', 'cosine', '--logging_steps', '1', '--tf32', 'True', '--gradient_checkpointing', 'True', '--report_to', 'tensorboard', '--lazy_preprocess', 'True', '--save_strategy', 'steps', '--save_steps', '200', '--save_total_limit', '10', '--dataloader_num_workers', '4']
[2025-05-13 07:15:14,102] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-13 07:15:14,102] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-13 07:15:16,161] [INFO] [comm.py:669:init_distributed] cdb=None
[2025-05-13 07:15:16,231] [INFO] [comm.py:669:init_distributed] cdb=None
[2025-05-13 07:15:16,231] [INFO] [comm.py:700:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
Fetching 5 files: 100%|██████████| 5/5 [00:12<00:00,  2.41s/it]
[2025-05-13 07:15:30,265] [INFO] [config.py:735:__init__] Config mesh_device None world_size = 2
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Fetching 5 files: 100%|██████████| 5/5 [00:12<00:00,  2.40s/it]
[2025-05-13 07:15:30,303] [INFO] [config.py:735:__init__] Config mesh_device None world_size = 2
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
[2025-05-13 07:15:42,273] [INFO] [partition_parameters.py:348:__exit__] finished initializing model - num_params = 729, num_elems = 8.29B
Loading checkpoint shards: 100%|██████████| 5/5 [00:04<00:00,  1.03it/s]
Loading checkpoint shards: 100%|██████████| 5/5 [00:04<00:00,  1.02it/s]
Found 196 lora modules: ['model.layers.0.self_attn.q_proj', 'model.layers.0.self_attn.k_proj', 'model.layers.0.self_attn.v_proj', 'model.layers.0.self_attn.o_proj', 'model.layers.0.mlp.gate_proj', 'model.layers.0.mlp.up_proj', 'model.layers.0.mlp.down_proj', 'model.layers.1.self_attn.q_proj', 'model.layers.1.self_attn.k_proj', 'model.layers.1.self_attn.v_proj', 'model.layers.1.self_attn.o_proj', 'model.layers.1.mlp.gate_proj', 'model.layers.1.mlp.up_proj', 'model.layers.1.mlp.down_proj', 'model.layers.2.self_attn.q_proj', 'model.layers.2.self_attn.k_proj', 'model.layers.2.self_attn.v_proj', 'model.layers.2.self_attn.o_proj', 'model.layers.2.mlp.gate_proj', 'model.layers.2.mlp.up_proj', 'model.layers.2.mlp.down_proj', 'model.layers.3.self_attn.q_proj', 'model.layers.3.self_attn.k_proj', 'model.layers.3.self_attn.v_proj', 'model.layers.3.self_attn.o_proj', 'model.layers.3.mlp.gate_proj', 'model.layers.3.mlp.up_proj', 'model.layers.3.mlp.down_proj', 'model.layers.4.self_attn.q_proj', 'model.layers.4.self_attn.k_proj', 'model.layers.4.self_attn.v_proj', 'model.layers.4.self_attn.o_proj', 'model.layers.4.mlp.gate_proj', 'model.layers.4.mlp.up_proj', 'model.layers.4.mlp.down_proj', 'model.layers.5.self_attn.q_proj', 'model.layers.5.self_attn.k_proj', 'model.layers.5.self_attn.v_proj', 'model.layers.5.self_attn.o_proj', 'model.layers.5.mlp.gate_proj', 'model.layers.5.mlp.up_proj', 'model.layers.5.mlp.down_proj', 'model.layers.6.self_attn.q_proj', 'model.layers.6.self_attn.k_proj', 'model.layers.6.self_attn.v_proj', 'model.layers.6.self_attn.o_proj', 'model.layers.6.mlp.gate_proj', 'model.layers.6.mlp.up_proj', 'model.layers.6.mlp.down_proj', 'model.layers.7.self_attn.q_proj', 'model.layers.7.self_attn.k_proj', 'model.layers.7.self_attn.v_proj', 'model.layers.7.self_attn.o_proj', 'model.layers.7.mlp.gate_proj', 'model.layers.7.mlp.up_proj', 'model.layers.7.mlp.down_proj', 'model.layers.8.self_attn.q_proj', 'model.layers.8.self_attn.k_proj', 'model.layers.8.self_attn.v_proj', 'model.layers.8.self_attn.o_proj', 'model.layers.8.mlp.gate_proj', 'model.layers.8.mlp.up_proj', 'model.layers.8.mlp.down_proj', 'model.layers.9.self_attn.q_proj', 'model.layers.9.self_attn.k_proj', 'model.layers.9.self_attn.v_proj', 'model.layers.9.self_attn.o_proj', 'model.layers.9.mlp.gate_proj', 'model.layers.9.mlp.up_proj', 'model.layers.9.mlp.down_proj', 'model.layers.10.self_attn.q_proj', 'model.layers.10.self_attn.k_proj', 'model.layers.10.self_attn.v_proj', 'model.layers.10.self_attn.o_proj', 'model.layers.10.mlp.gate_proj', 'model.layers.10.mlp.up_proj', 'model.layers.10.mlp.down_proj', 'model.layers.11.self_attn.q_proj', 'model.layers.11.self_attn.k_proj', 'model.layers.11.self_attn.v_proj', 'model.layers.11.self_attn.o_proj', 'model.layers.11.mlp.gate_proj', 'model.layers.11.mlp.up_proj', 'model.layers.11.mlp.down_proj', 'model.layers.12.self_attn.q_proj', 'model.layers.12.self_attn.k_proj', 'model.layers.12.self_attn.v_proj', 'model.layers.12.self_attn.o_proj', 'model.layers.12.mlp.gate_proj', 'model.layers.12.mlp.up_proj', 'model.layers.12.mlp.down_proj', 'model.layers.13.self_attn.q_proj', 'model.layers.13.self_attn.k_proj', 'model.layers.13.self_attn.v_proj', 'model.layers.13.self_attn.o_proj', 'model.layers.13.mlp.gate_proj', 'model.layers.13.mlp.up_proj', 'model.layers.13.mlp.down_proj', 'model.layers.14.self_attn.q_proj', 'model.layers.14.self_attn.k_proj', 'model.layers.14.self_attn.v_proj', 'model.layers.14.self_attn.o_proj', 'model.layers.14.mlp.gate_proj', 'model.layers.14.mlp.up_proj', 
'model.layers.14.mlp.down_proj', 'model.layers.15.self_attn.q_proj', 'model.layers.15.self_attn.k_proj', 'model.layers.15.self_attn.v_proj', 'model.layers.15.self_attn.o_proj', 'model.layers.15.mlp.gate_proj', 'model.layers.15.mlp.up_proj', 'model.layers.15.mlp.down_proj', 'model.layers.16.self_attn.q_proj', 'model.layers.16.self_attn.k_proj', 'model.layers.16.self_attn.v_proj', 'model.layers.16.self_attn.o_proj', 'model.layers.16.mlp.gate_proj', 'model.layers.16.mlp.up_proj', 'model.layers.16.mlp.down_proj', 'model.layers.17.self_attn.q_proj', 'model.layers.17.self_attn.k_proj', 'model.layers.17.self_attn.v_proj', 'model.layers.17.self_attn.o_proj', 'model.layers.17.mlp.gate_proj', 'model.layers.17.mlp.up_proj', 'model.layers.17.mlp.down_proj', 'model.layers.18.self_attn.q_proj', 'model.layers.18.self_attn.k_proj', 'model.layers.18.self_attn.v_proj', 'model.layers.18.self_attn.o_proj', 'model.layers.18.mlp.gate_proj', 'model.layers.18.mlp.up_proj', 'model.layers.18.mlp.down_proj', 'model.layers.19.self_attn.q_proj', 'model.layers.19.self_attn.k_proj', 'model.layers.19.self_attn.v_proj', 'model.layers.19.self_attn.o_proj', 'model.layers.19.mlp.gate_proj', 'model.layers.19.mlp.up_proj', 'model.layers.19.mlp.down_proj', 'model.layers.20.self_attn.q_proj', 'model.layers.20.self_attn.k_proj', 'model.layers.20.self_attn.v_proj', 'model.layers.20.self_attn.o_proj', 'model.layers.20.mlp.gate_proj', 'model.layers.20.mlp.up_proj', 'model.layers.20.mlp.down_proj', 'model.layers.21.self_attn.q_proj', 'model.layers.21.self_attn.k_proj', 'model.layers.21.self_attn.v_proj', 'model.layers.21.self_attn.o_proj', 'model.layers.21.mlp.gate_proj', 'model.layers.21.mlp.up_proj', 'model.layers.21.mlp.down_proj', 'model.layers.22.self_attn.q_proj', 'model.layers.22.self_attn.k_proj', 'model.layers.22.self_attn.v_proj', 'model.layers.22.self_attn.o_proj', 'model.layers.22.mlp.gate_proj', 'model.layers.22.mlp.up_proj', 'model.layers.22.mlp.down_proj', 'model.layers.23.self_attn.q_proj', 'model.layers.23.self_attn.k_proj', 'model.layers.23.self_attn.v_proj', 'model.layers.23.self_attn.o_proj', 'model.layers.23.mlp.gate_proj', 'model.layers.23.mlp.up_proj', 'model.layers.23.mlp.down_proj', 'model.layers.24.self_attn.q_proj', 'model.layers.24.self_attn.k_proj', 'model.layers.24.self_attn.v_proj', 'model.layers.24.self_attn.o_proj', 'model.layers.24.mlp.gate_proj', 'model.layers.24.mlp.up_proj', 'model.layers.24.mlp.down_proj', 'model.layers.25.self_attn.q_proj', 'model.layers.25.self_attn.k_proj', 'model.layers.25.self_attn.v_proj', 'model.layers.25.self_attn.o_proj', 'model.layers.25.mlp.gate_proj', 'model.layers.25.mlp.up_proj', 'model.layers.25.mlp.down_proj', 'model.layers.26.self_attn.q_proj', 'model.layers.26.self_attn.k_proj', 'model.layers.26.self_attn.v_proj', 'model.layers.26.self_attn.o_proj', 'model.layers.26.mlp.gate_proj', 'model.layers.26.mlp.up_proj', 'model.layers.26.mlp.down_proj', 'model.layers.27.self_attn.q_proj', 'model.layers.27.self_attn.k_proj', 'model.layers.27.self_attn.v_proj', 'model.layers.27.self_attn.o_proj', 'model.layers.27.mlp.gate_proj', 'model.layers.27.mlp.up_proj', 'model.layers.27.mlp.down_proj']
Adding LoRA to the model...
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
Detected kernel version 4.18.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
No label_names provided for model class `PeftModel`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
No label_names provided for model class `PeftModel`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
Using /home/opc/.cache/torch_extensions/py311_cu128 as PyTorch extensions root...
Creating extension directory /home/opc/.cache/torch_extensions/py311_cu128/cpu_adam...
Using /home/opc/.cache/torch_extensions/py311_cu128 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/opc/.cache/torch_extensions/py311_cu128/cpu_adam/build.ninja...
/opt/conda/lib/python3.11/site-packages/torch/utils/cpp_extension.py:2356: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -I/opt/conda/lib/python3.11/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/lib/python3.11/site-packages/torch/include -isystem /opt/conda/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/usr/local/cuda/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX512__ -D__ENABLE_CUDA__ -DBF16_AVAILABLE -c /opt/conda/lib/python3.11/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
[2/3] c++ -MMD -MF cpu_adam_impl.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -I/opt/conda/lib/python3.11/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/lib/python3.11/site-packages/torch/include -isystem /opt/conda/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/usr/local/cuda/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX512__ -D__ENABLE_CUDA__ -DBF16_AVAILABLE -c /opt/conda/lib/python3.11/site-packages/deepspeed/ops/csrc/adam/cpu_adam_impl.cpp -o cpu_adam_impl.o
[3/3] c++ cpu_adam.o cpu_adam_impl.o -shared -lcurand -L/usr/local/cuda/lib64 -L/opt/conda/lib/python3.11/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o cpu_adam.so
Loading extension module cpu_adam...
Time to load cpu_adam op: 28.119180917739868 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 28.1254723072052 seconds
Parameter Offload: Total persistent parameters: 2683904 in 424 params
{'loss': 5.3339, 'grad_norm': 5.158658504486084, 'learning_rate': 1.4285714285714285e-05, 'epoch': 0.0}
{'loss': 5.4509, 'grad_norm': 4.986435413360596, 'learning_rate': 2.857142857142857e-05, 'epoch': 0.01}

GPU status

  • You can see that training is running on both GPUs (a command sketch follows below).
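For reference, here is a minimal sketch of how the output below can be captured while the job is running. The job ID and node name are placeholders, and `srun --overlap` assumes a reasonably recent Slurm release; on some clusters you may need to SSH to the compute node directly instead.

# Find the running job and the node it was allocated to
squeue -u "$USER"

# Attach to the existing allocation and run nvidia-smi there (replace <JOBID>)
srun --jobid=<JOBID> --overlap nvidia-smi

# Alternative, if SSH to compute nodes is permitted (replace <compute-node>)
ssh <compute-node> nvidia-smi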
Tue May 13 07:18:17 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.12              Driver Version: 550.90.12      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A10                     On  |   00000000:00:04.0 Off |                  Off |
|  0%   63C    P0            146W /  150W |   12222MiB /  24564MiB |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A10                     On  |   00000000:00:05.0 Off |                    0 |
|  0%   62C    P0            148W /  150W |   12008MiB /  23028MiB |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     30445      C   /opt/conda/bin/python3.11                   12178MiB |
|    1   N/A  N/A     30446      C   /opt/conda/bin/python3.11                   11964MiB |
+-----------------------------------------------------------------------------------------+

Conclusion

Please note that the goal of this run was simply to get the training script to run end-to-end, so I have not yet checked the training results or verified their effectiveness.
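Since the launch command above includes --report_to tensorboard and writes to output/testing_lora, a quick first check of the results would be to look at the loss curve in TensorBoard. The following is a minimal sketch, assuming the default logging directory under the output directory and that the tensorboard package is available (host and port are arbitrary placeholders).

# Install TensorBoard into the environment if it is not already present
pip install tensorboard

# Point TensorBoard at the training output directory; event files are found recursively
tensorboard --logdir output/testing_lora --host 0.0.0.0 --port 6006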
