Windows 11 + Blackwell (sm_120) + Conda Environment Setup Guide: FramePack + PyTorch Nightly (CUDA 12.8)

Posted at 2025-05-02

Windows 11 × GeForce (sm_120) × Conda × PowerShell: FramePack Setup, Speedup, and Automation Guide

This guide covers setting up FramePack with PowerShell and Conda on a Windows 11 machine with a Blackwell-generation GPU (RTX 5090 / 5080 and similar, Compute Capability 12.0), speeding it up with xFormers and Flash-Attention, and finally automating video generation.

Target audience:

Owners of a PC with a Blackwell-generation GPU (sm_120)
Windows 11 users
PowerShell users

Goals:

Run FramePack on a PyTorch Nightly build pinned to CUDA 12.8
Build xFormers and Flash-Attention to speed up FramePack
Run a script that generates multiple videos automatically from a JSON file

Prerequisites:

Windows 11 (22H2/23H2, 64-bit)
Blackwell-generation GPU (RTX 5090 / 5080, etc.)
Git installed

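Overall flow:

Install the GPU driver and CUDA 12.8 Toolkit
Install Miniconda and configure PowerShell
Create and configure the Conda environment framepack
Install NVCC 12.8 (CUDA compiler)
Install PyTorch Nightly (CUDA 12.8 build)
Install FramePack itself
Build the acceleration extensions (xFormers & Flash-Attention)
- Pre-build preparation (enabling long paths, dependencies)
- Fetching the source and updating submodules
- Setting the architecture variable
- Editing *.cmake files (if needed)
- Build options (Flash-Attention enabled/disabled)
- Running and verifying the build
Prepare the FramePack automation scripts
- Create framepack_worker.py
- Create generate_dance_videos.py
Run FramePack (demo & automated generation)
Troubleshooting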

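1. Installing the GPU Driver and CUDA 12.8 Toolkit

1-1. Uninstall Existing NVIDIA Components (Recommended)

Old drivers and toolkits can cause problems, so if possible uninstall the existing NVIDIA software from the Control Panel first.

1-2. Install Everything with the Official Installer

Download the CUDA Toolkit 12.8.1 Network Installer from the official NVIDIA site.
Run the installer.
Choose "Custom (Advanced)" as the installation option.
On the component selection screen, select the latest Developer Driver and CUDA Toolkit 12.8. (Other components are optional.)
Tick the "Perform a clean installation" checkbox and continue.
Restart the PC once the installation finishes.

1-3. Verification

Open PowerShell and run the following commands to check the versions.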

nvcc --version
# Example output: release 12.8, V12.8.x

nvidia-smi
# Example output: Driver Version: 570.xx or later, CUDA Version: 12.8
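If you would rather check the toolkit version from a script than read it by eye, the following minimal sketch parses the `release X.Y` token that `nvcc --version` prints. The function names are made up for this example, and the sample string (including its build number) is illustrative, not captured from a real machine.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Return (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release\s+(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def installed_nvcc_release():
    """Run the first nvcc found on PATH and parse its release; None if missing."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True,
                            text=True, check=False)
    return parse_nvcc_release(result.stdout)


# Illustrative sample in the shape nvcc prints.
sample = "Cuda compilation tools, release 12.8, V12.8.93"
print(parse_nvcc_release(sample))  # (12, 8)
```

Calling `installed_nvcc_release()` on the target machine should return `(12, 8)` once the toolkit from this section is on PATH.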

2. Installing Miniconda and Configuring PowerShell

2-1. Install Miniconda

Open the Miniconda documentation and download Miniconda3 Windows 64-bit from the "Windows Installers" section.
Run the installer.
In the install options, check "Add Miniconda3 to my PATH environment variable". (The installer marks it as not recommended, but having conda on PATH is more convenient for this procedure.)

2-2. Make Conda Available in PowerShell

Open PowerShell and run the following commands.

PowerShell

# If you use the built-in Windows PowerShell 5.1
conda init powershell

# If you mainly use PowerShell 7 (pwsh.exe)
conda init powershell --user

Next, change the script execution policy.

PowerShell

Set-ExecutionPolicy -Scope CurrentUser RemoteSigned -Force

Important: Close every PowerShell window, then open a new one so the settings take effect. If the prompt starts with (base), you are set.

3. Creating and Configuring the Conda Environment framepack
3-1. Create and Activate the Environment

PowerShell

conda create -n framepack python=3.11 -y
conda activate framepack

For every command from here on, make sure the prompt shows (framepack) before running it.

3-2. Add Extra Channels

Add the following channels so the required packages can be found.

PowerShell

conda config --env --add channels conda-forge
conda config --env --add channels pytorch-nightly
conda config --env --add channels nvidia

4. Installing NVCC 12.8 (CUDA Compiler)
In addition to the system-wide CUDA Toolkit, install a compatible NVCC inside the Conda environment. This makes it easier to manage the CUDA version per environment.

PowerShell

conda install -y cuda-version=12.8 cuda-nvcc_win-64=12.8.*

Verify:

PowerShell

# Run inside the (framepack) environment
nvcc --version
# Example output: release 12.8, V12.8.x (the copy inside the Conda environment should take precedence)

5. Installing PyTorch Nightly (CUDA 12.8 Build)
Install a PyTorch Nightly build that includes Blackwell (sm_120) support.

PowerShell

python -m pip install --upgrade pip
python -m pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128

Test:

Start the Python interpreter and check the installation.

Python

import torch

print(torch.version.cuda)
# Output: 12.8

print(torch.cuda.is_available())
# Output: True

print(torch.cuda.get_device_capability())
# Output: (12, 0)  <- i.e. sm_120

print(torch.cuda.get_arch_list())
# Example output: a list that includes 'sm_120', e.g. ['sm_120'] or ['compute_120', 'sm_120']
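The last two checks are related: the installed build is usable on the GPU only if its architecture list covers the device's compute capability. The sketch below makes that comparison explicit as a pure function. The name `support_kind` and its three return labels are invented for this example, and the rule it encodes (native `sm_XY` code, or PTX from an equal or older `compute_XY` that the driver can JIT-compile) is a simplification that ignores architecture-specific variants such as `sm_90a`.

```python
import re


def support_kind(arch_list, capability):
    """Classify how a build with `arch_list` can run on a GPU of `capability`.

    arch_list:  strings such as 'sm_120' or 'compute_120'
    capability: (major, minor) tuple such as (12, 0)
    """
    major, minor = capability
    target = major * 10 + minor  # (12, 0) -> 120

    # Native machine code for exactly this architecture.
    if f"sm_{target}" in arch_list:
        return "native"

    # PTX for an equal or older virtual architecture can be JIT-compiled.
    for arch in arch_list:
        match = re.fullmatch(r"compute_(\d+)", arch)
        if match and int(match.group(1)) <= target:
            return "ptx-jit"

    return "unsupported"


print(support_kind(["sm_80", "sm_90", "sm_120", "compute_120"], (12, 0)))  # native
print(support_kind(["sm_90", "compute_90"], (12, 0)))                      # ptx-jit
print(support_kind(["sm_80", "sm_90"], (12, 0)))                           # unsupported
```

With PyTorch installed, the real inputs would be `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()`.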

6. Installing FramePack

PowerShell

# Move to any working directory (e.g. C:\Users\YourUser\)
git clone https://github.com/lllyasviel/FramePack.git
cd FramePack
python -m pip install -r requirements.txt

7. Building the Acceleration Extensions (xFormers & Flash-Attention)
7-1. Pre-Build Preparation

Enable long paths: The Flash-Attention submodules are deeply nested, so the Windows 260-character path limit has to be lifted.

PowerShell

# Run from a PowerShell window started as Administrator
reg add "HKLM\System\CurrentControlSet\Control\FileSystem" /v LongPathsEnabled /t REG_DWORD /d 1 /f
git config --global core.longpaths true

Restart the PC so the setting takes effect.
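To see whether a checkout actually runs into the 260-character limit, a small scan like the one below can help. It is a sketch only: `find_long_paths` is a name made up here, and the demo runs on a throwaway directory with an artificially tiny limit so that it reports something.

```python
import os
import tempfile

MAX_PATH = 260  # classic Windows limit, counting the terminating NUL


def find_long_paths(root, limit=MAX_PATH):
    """Return absolute paths under `root` whose length reaches `limit`."""
    long_paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.abspath(os.path.join(dirpath, name))
            if len(full) >= limit:
                long_paths.append(full)
    return long_paths


# Demo on a temporary tree; limit=1 flags every entry.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "third_party", "deep"))
    demo_hits = find_long_paths(tmp, limit=1)
print(len(demo_hits))  # 2
```

Pointing it at a cloned source tree, for example `find_long_paths("xformers")`, lists the paths that would fail without long-path support.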

Install dependencies: Install ninja, which the build requires.

PowerShell

# Run inside the (framepack) environment
conda activate framepack
pip install ninja

7-2. Fetching the Source and Updating Submodules

PowerShell

# When cloning xformers inside the FramePack directory
# cd C:\Users\YourUser\FramePack # Adjust to wherever you cloned FramePack
# Assumes the current directory is the FramePack directory
git clone https://github.com/facebookresearch/xformers.git
cd xformers
git submodule update --init --recursive

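7-3. Setting the Architecture Variable

Specify the GPU architecture to build for. Blackwell is 12.0. This environment variable is needed for both the xFormers and the Flash-Attention build.

PowerShell

# Valid only for the current PowerShell session
$env:TORCH_CUDA_ARCH_LIST = "12.0"

# To persist it as a user environment variable (takes effect in new terminals)
# setx TORCH_CUDA_ARCH_LIST "12.0;12.0+PTX"

💡 Note: In PowerShell, set is an alias for Set-Variable and does not set an environment variable, so use the $env: syntax. A value written with setx is only visible in terminals opened afterwards, and a value assigned through $env: in the current session takes precedence over it. Setting $env:TORCH_CUDA_ARCH_LIST = "12.0" right before the build is the most reliable approach.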
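To make the meaning of values such as 12.0 and 12.0+PTX concrete, the sketch below expands a TORCH_CUDA_ARCH_LIST string into the nvcc -gencode flags it stands for. It is a simplified illustration modeled on how PyTorch's extension builder treats the variable, not the builder's actual code; the function name `arch_list_to_gencode` is made up for this example, and named architectures are not handled.

```python
import os


def arch_list_to_gencode(arch_list):
    """Expand e.g. "12.0;12.0+PTX" into nvcc -gencode flags (simplified)."""
    flags = []
    # Entries may be separated by semicolons or spaces.
    for entry in arch_list.replace(" ", ";").split(";"):
        if not entry:
            continue
        with_ptx = entry.endswith("+PTX")
        version = entry[:-len("+PTX")] if with_ptx else entry
        number = version.replace(".", "")  # "12.0" -> "120"
        # Native machine code for exactly this architecture.
        flags.append(f"-gencode=arch=compute_{number},code=sm_{number}")
        if with_ptx:
            # Also embed PTX so that newer GPUs can JIT-compile the kernels.
            flags.append(f"-gencode=arch=compute_{number},code=compute_{number}")
    return flags


# Whatever this process inherited; an empty list if the variable is unset.
print(arch_list_to_gencode(os.environ.get("TORCH_CUDA_ARCH_LIST", "")))
print(arch_list_to_gencode("12.0+PTX"))
```

Running the script from the same PowerShell session as the build is also a quick way to confirm that the variable really reached child processes: an empty list on the first line means it did not.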

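7-4. Editing *.cmake Files (Depends on the xFormers Version)

Depending on the xFormers version, the target architectures may be hard-coded in a CMake file in the source tree. In that case you need to add sm_120.

Locating the file:
Inside the xformers directory, look for CMake files that are likely to define the architecture list.

PowerShell

# cd C:\Users\YourUser\FramePack\xformers # Move to the xformers directory if needed
# List every .cmake file that mentions ARCH_LIST, whatever the file is called
Get-ChildItem -Path . -Recurse -Filter "*.cmake" | Select-String -Pattern "ARCH_LIST" -List | Select-Object Path

Common file names:

cmake/select_compute_arch.cmake (older versions)
cmake/xformers_cuda.cmake (newer versions)

Checking and editing the file (only if needed):
Open the file you found and look for lines such as set(ARCH_LIST ...) or set(TORCH_CUDA_ARCH_LIST ...). If the list stops at 90 and does not include 120, replace it from PowerShell as follows.

PowerShell

# Example: the target file turned out to be cmake\xformers_cuda.cmake
$path = "cmake\xformers_cuda.cmake" # Replace with the path you found
(Get-Content $path -Raw) -replace "'70;75;80;86;90'", "'70;75;80;86;90;120'" | Set-Content $path -NoNewline

The key point is wrapping Get-Content in parentheses, so the file is read completely before Set-Content writes to it.
Adjust the search string '70;...' to match the actual file contents.
💡 Tip: Recent xFormers versions may read the TORCH_CUDA_ARCH_LIST environment variable directly, which makes this edit unnecessary. Try the build without editing first; if the log shows something like Automatically detected arch list: 12.0, no edit is needed. If you later get a no kernel image error, this edit is most likely required.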
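A plain string replacement has to match the file contents exactly, and running it twice is easy to get wrong. As an alternative, here is a hedged Python sketch of the same edit that appends an architecture to any quoted, semicolon-separated list of numbers and leaves the text unchanged if the architecture is already present. The function name `add_arch` and the sample CMake line are assumptions made for illustration; real files may format the list differently.

```python
import re


def add_arch(cmake_text, new_arch="120"):
    """Append `new_arch` to quoted lists like '70;75;80' that lack it."""
    def patch(match):
        quote, archs = match.group(1), match.group(2).split(";")
        if new_arch in archs:
            return match.group(0)  # already patched, keep as is
        return f"{quote}{';'.join(archs + [new_arch])}{quote}"

    # A quote, two or more numbers joined by ';', then the same quote.
    return re.sub(r"(['\"])(\d+(?:;\d+)+)\1", patch, cmake_text)


# Hypothetical file content, for illustration only.
before = "set(ARCH_LIST '70;75;80;86;90')"
after = add_arch(before)
print(after)                     # set(ARCH_LIST '70;75;80;86;90;120')
print(add_arch(after) == after)  # True
```

To apply it to a real file, read the text with `pathlib.Path(path).read_text()`, pass it through `add_arch`, and write the result back.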

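7-5. Build Options (Flash-Attention Enabled/Disabled)

Option A: Build with Flash-Attention (recommended with 24 GB of VRAM or more)
No special settings are needed.

Option B: Build without Flash-Attention (for GPUs with less VRAM, or to shorten the build)
Set the following environment variable.

PowerShell

$env:XFORMERS_DISABLE_FLASH_ATTN = "1"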

7-6. Running the Build

Install the xFormers dependencies and run the build.

PowerShell

# cd C:\Users\YourUser\FramePack\xformers # Make sure you are in the xformers directory
$env:TORCH_CUDA_ARCH_LIST = "12.0" # Set again, just in case
pip install -r requirements.txt
pip install .

The build takes a while. When it finishes, go back up one level from the xformers directory to the FramePack directory.

PowerShell

cd ..

7-7. Building Flash-Attention (Optional)

If you enabled Flash-Attention in xFormers, building and installing flash-attention separately can also help when other libraries need it (FramePack itself uses it through xFormers).

PowerShell

# Assumes the current directory is the FramePack directory
git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention
$env:TORCH_CUDA_ARCH_LIST = "12.0" # Needed here as well
python setup.py bdist_wheel
# Install the .whl generated in the dist directory
# (PowerShell does not expand wildcards for external commands, so resolve the path first)
pip install (Get-ChildItem dist\flash_attn*.whl).FullName
cd ..

7-8. Verifying the Build

Use Python to confirm that xFormers is installed correctly and works with the installed CUDA version.

PowerShell

# Assumes the current directory is the FramePack directory.
# PowerShell has no bash-style heredoc, so pipe a here-string to python instead.
@'
import torch
import xformers
import xformers.ops as xops

print(f"Torch CUDA: {torch.version.cuda}")
print(f"GPU arch: {torch.cuda.get_device_capability()}")
print(f"xformers: {xformers.__version__}")
try:
    # Create dummy tensors on GPU with float16 for attention check
    q = torch.randn(1, 1, 8, 32, device='cuda', dtype=torch.float16)
    k = torch.randn(1, 1, 8, 32, device='cuda', dtype=torch.float16)
    v = torch.randn(1, 1, 8, 32, device='cuda', dtype=torch.float16)
    out = xops.memory_efficient_attention(q, k, v)
    print("xformers memory efficient attention: OK")
except Exception as e:
    print(f"xformers memory efficient attention: FAILED ({e})")
'@ | python -

Expected output (example):

YAML

Torch CUDA: 12.8
GPU arch: (12, 0)
xformers: 0.0.26+xxxxxxxx # the version will vary
xformers memory efficient attention: OK

8. Preparing the FramePack Automation Scripts
Starting from the original demo_gradio.py, create scripts that generate several videos in a row.

8-1. Creating framepack_worker.py
Turn the main processing of demo_gradio.py into functions to get a reusable module. Save the following as framepack_worker.py in the FramePack directory.

&lt;details>
&lt;summary>&lt;code>framepack_worker.py&lt;/code> code (click to expand)&lt;/summary>

Python

# framepack_worker.py

import os
import sys
import torch
import random
import numpy as np
from PIL import Image
from tqdm import tqdm
from einops import rearrange
from pathlib import Path
import math
import time
import traceback

# Set the HF_HOME environment variable (if you want to choose the cache directory)
# os.environ['HF_HOME'] = 'huggingface_cache'

# --- Import the required libraries ---
try:
    from pipelines.pipeline_framepack import FramePackPipeline
    from models.unet_3d_condition import UNet3DConditionModel
    from models.transformer import TransformerWrapper
    from models.pipeline_utils import get_quant_config, get_transformer_map
    from diffusers import AutoencoderKL
    from transformers import CLIPTextModel, CLIPTokenizer, CLIPVisionModelWithProjection, CLIPImageProcessor
except ImportError as e:
    print(f"Error importing necessary libraries: {e}")
    print("Please ensure you have installed all requirements from requirements.txt in the FramePack directory.")
    sys.exit(1)

# --- Model initialization ---
def init_models(model_dir="checkpoints/stable-video-diffusion-img2vid-xt-1-1",
                device="cuda",
                dtype=torch.float16,
                low_vram_mode=False): # low_vram_mode=True is recommended with less than 24 GB of VRAM

    print(f"Initializing models from {model_dir}...")
    if not Path(model_dir).exists():
         print(f"[ERROR] Model directory not found: {model_dir}")
         print("Please download the SVD checkpoints and place them in the specified directory.")
         sys.exit(1)

    transformer_map = get_transformer_map() # Currently unused; kept for future use
    transformer_model_path = os.path.join(model_dir, 'transformer')
    unet_model_path = os.path.join(model_dir, 'unet')
    vae_model_path = os.path.join(model_dir, 'vae')
    image_encoder_path = os.path.join(model_dir, 'image_encoder')
    text_encoder_path = os.path.join(model_dir, 'text_encoder')
    text_encoder_2_path = os.path.join(model_dir, 'text_encoder_2')

    pipeline = None # Initialized so that None can be returned on error

    try:
        # Load models
        print("Loading VAE...")
        vae = AutoencoderKL.from_pretrained(vae_model_path, torch_dtype=dtype)
        print("Loading Image Encoder...")
        feature_extractor = CLIPImageProcessor.from_pretrained(image_encoder_path)
        image_encoder = CLIPVisionModelWithProjection.from_pretrained(image_encoder_path, torch_dtype=dtype)
        print("Loading Text Encoder 1...")
        tokenizer = CLIPTokenizer.from_pretrained(text_encoder_path)
        text_encoder = CLIPTextModel.from_pretrained(text_encoder_path, torch_dtype=dtype)
        print("Loading Text Encoder 2...")
        tokenizer_2 = CLIPTokenizer.from_pretrained(text_encoder_2_path)
        text_encoder_2 = CLIPTextModel.from_pretrained(text_encoder_2_path, torch_dtype=dtype)
        print("Loading UNet...")
        unet = UNet3DConditionModel.from_pretrained(unet_model_path, torch_dtype=dtype)
        print("Loading Transformer...")
        transformer = TransformerWrapper.from_pretrained(transformer_model_path, unet_config=unet.config, low_vram_mode=low_vram_mode, torch_dtype=dtype)

        # Move to device
        vae = vae.to(device).eval()
        image_encoder = image_encoder.to(device).eval()
        text_encoder = text_encoder.to(device).eval()
        text_encoder_2 = text_encoder_2.to(device).eval()
        unet.to(device).eval()
        transformer.to(device).eval()

        # Check VRAM and determine high_vram flag
        high_vram = False
        if device == "cuda" and torch.cuda.is_available():
            try:
                gpu_properties = torch.cuda.get_device_properties(torch.cuda.current_device())
                total_memory_gb = gpu_properties.total_memory / (1024 ** 3)
                print(f"Detected GPU: {gpu_properties.name} with {total_memory_gb:.2f} GB VRAM")
                if total_memory_gb >= 20: # Treat 20 GB or more as high VRAM
                    high_vram = True
            except Exception as e:
                print(f"Could not detect GPU VRAM: {e}")
        elif device != "cuda":
             print("Running on CPU or other device, high_vram set to False.")
        else:
             print("CUDA not available, high_vram set to False.")

        print(f"High VRAM mode determined: {high_vram}")

        # Create pipeline (do this *after* moving models to device)
        pipeline = FramePackPipeline(
            vae=vae,
            text_encoder=text_encoder, text_encoder_2=text_encoder_2,
            tokenizer=tokenizer, tokenizer_2=tokenizer_2,
            unet=unet, transformer=transformer,
            feature_extractor=feature_extractor, image_encoder=image_encoder
        )
        # pipeline.enable_model_cpu_offload() # Enable if needed

        print("Models initialized successfully.")
        return high_vram, text_encoder, text_encoder_2, tokenizer, tokenizer_2, vae, feature_extractor, image_encoder, transformer, pipeline

    except Exception as e:
        print(f"[ERROR] Failed to initialize models: {e}")
        traceback.print_exc()
        # Return None for all models if initialization fails
        return False, None, None, None, None, None, None, None, None, None


# --- Video generation worker ---
def worker(input_image: Image.Image,
           prompt: str,
           pipeline: FramePackPipeline, # Pass the initialized pipeline
           transformer: TransformerWrapper, # Needed for TeaCache control
           n_prompt: str = "",
           seed: int = 42,
           total_second_length: float = 6.0,
           target_fps: int = 30,
           latent_window_size: int = 32,
           steps: int = 25,
           cfg: float = 1.0,
           gs: float = 7.0,
           rs: float = 0.0,
           gpu_memory_preservation: float = 6.0, #GB
           use_teacache: bool = True,
           chunk_size: int = 16, # Smaller chunk for lower VRAM
           outputs_folder: str = "outputs",
           high_vram: bool = False):

    if pipeline is None or transformer is None:
         print("[ERROR] Pipeline or Transformer not initialized. Cannot generate video.")
         return None

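    # Defined before the try block so the except handler can always read it,
    # even if an error occurs before TeaCache is set up.
    teacache_enabled_here = False

    try: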
        start_time = time.time()
        output_dir = Path(outputs_folder)
        output_dir.mkdir(parents=True, exist_ok=True)

        # Determine frames based on seconds and FPS
        total_frames = math.ceil(total_second_length * target_fps)
        print(f"Target: {total_second_length} seconds, {target_fps} FPS => {total_frames} frames")

        # Adjust chunk size based on VRAM if needed
        if not high_vram and chunk_size > 8:
             print(f"Low VRAM detected, reducing decode_chunk_size from {chunk_size} to 8")
             decode_chunk_size = 8
        else:
             decode_chunk_size = chunk_size

        # --- Run Pipeline ---
        generator = torch.Generator(device=pipeline.device).manual_seed(seed)

        # Enable TeaCache if requested and available
        teacache_enabled_here = False
        if use_teacache and hasattr(transformer, 'enable_teacache'):
             print("Enabling TeaCache...")
             transformer.enable_teacache(chunk_size=latent_window_size) # chunk_size for teacache might relate to latent_window
             teacache_enabled_here = True
        elif use_teacache:
             print("[WARN] TeaCache requested but 'enable_teacache' not found on transformer.")


        # Resize input image - SVD expects 1024x576
        img_w, img_h = input_image.size
        if img_w != 1024 or img_h != 576:
            print(f"Resizing input image from {img_w}x{img_h} to 1024x576")
            input_image = input_image.resize((1024, 576))

        output_video_path = pipeline(
            prompt=prompt,
            negative_prompt=n_prompt,
            image=input_image,
            num_inference_steps=steps,
            guidance_scale=gs,
            num_frames=total_frames,
            latent_window_size=latent_window_size,
            cfg_scale=cfg,
            revision_scale=rs,
            generator=generator,
            output_type="filepath", # Get filepath directly
            output_folder=str(output_dir),
            fps=target_fps,
            memory_limit_gb=gpu_memory_preservation,
            decode_chunk_size=decode_chunk_size # Use potentially adjusted chunk size
        )

        # Disable TeaCache after use if it was enabled
        if teacache_enabled_here and hasattr(transformer, 'disable_teacache'):
             print("Disabling TeaCache...")
             transformer.disable_teacache()

        end_time = time.time()
        elapsed = end_time - start_time
        print(f"Video generation took {elapsed:.2f} seconds.")
        if output_video_path and Path(output_video_path).exists():
            print(f"Saved video to: {output_video_path}")
            return output_video_path
        else:
            print("[ERROR] Pipeline finished but output video path not found or invalid.")
            return None

    except Exception as e:
        print(f"[ERROR] Error during video generation: {e}")
        traceback.print_exc()
        # Ensure teacache is disabled in case of error
        if teacache_enabled_here and hasattr(transformer, 'disable_teacache'):
            try:
                transformer.disable_teacache()
            except Exception as te:
                print(f"Error disabling teacache after exception: {te}")
        return None

# --- Self-test block ---
if __name__ == '__main__':
    print("--- Testing framepack_worker module ---")
    # --- Initialize ---
    # Set low_vram_mode=True if you have < 24GB VRAM for testing
    high_vram, text_encoder, text_encoder_2, tokenizer, tokenizer_2, vae, \
    feature_extractor, image_encoder, transformer, pipeline = init_models(low_vram_mode=False)

    if pipeline is None:
        print("[FAIL] Model initialization failed. Aborting test.")
    else:
        print("[OK] Model initialization successful.")
        # --- Prepare Inputs ---
        test_img_path = Path("test_input_worker.png")
        test_output_folder = "worker_test_output"
        # Create a dummy image if none exists
        if not test_img_path.exists():
            print(f"Creating dummy input image: {test_img_path}")
            try:
                Image.new('RGB', (1024, 576), color = 'blue').save(test_img_path)
            except Exception as e:
                print(f"[FAIL] Could not create dummy image: {e}")
                sys.exit(1)

        try:
            input_img = Image.open(test_img_path).convert("RGB")
            print(f"Loaded test image: {test_img_path}")
        except Exception as e:
            print(f"[FAIL] Could not load test image {test_img_path}: {e}")
            sys.exit(1)

        test_prompt = "a cinematic shot of a fluffy cat wearing a tiny wizard hat, exploring a mystical library"

        # --- Run Worker ---
        print("Running worker function...")
        mp4_path = worker(
            input_image=input_img,
            prompt=test_prompt,
            pipeline=pipeline, # Pass initialized pipeline
            transformer=transformer, # Pass transformer for TeaCache
            seed=12345,
            total_second_length=2.0, # Short video for testing
            target_fps=15, # Lower FPS for faster test
            steps=10, # Fewer steps for faster test
            outputs_folder=test_output_folder,
            high_vram=high_vram, # Use detected VRAM mode
            use_teacache=True # Test with TeaCache
        )

        if mp4_path and Path(mp4_path).exists():
            print(f"[PASS] Worker test successful, video saved to {mp4_path}")
        else:
            print("[FAIL] Worker test failed.")

&lt;/details>
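The VRAM-dependent decisions in framepack_worker.py (the 20 GB threshold in init_models and the chunk-size cap of 8 in worker) are easier to reason about when pulled out into pure functions. The sketch below restates that logic on its own; the function names are invented for this example and are not part of FramePack.

```python
def is_high_vram(total_memory_bytes, threshold_gb=20.0):
    """Mirror of the init_models check: total VRAM in GiB against a threshold."""
    return total_memory_bytes / (1024 ** 3) >= threshold_gb


def pick_decode_chunk_size(requested, high_vram, low_vram_cap=8):
    """Mirror of the worker check: cap the chunk size on low-VRAM GPUs."""
    return requested if high_vram else min(requested, low_vram_cap)


GIB = 1024 ** 3
print(is_high_vram(32 * GIB))             # True
print(is_high_vram(16 * GIB))             # False
print(pick_decode_chunk_size(16, False))  # 8
print(pick_decode_chunk_size(4, False))   # 4
print(pick_decode_chunk_size(16, True))   # 16
```

Keeping these rules in one place makes it straightforward to try a different threshold or cap without touching the model-loading code.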

8-2. Creating generate_dance_videos.py

This script reads a JSON file and generates a video for each prompt from its matching image. Save it as generate_dance_videos.py in the FramePack directory. The same directory also needs DancePrompts.json and a dance_first_frame directory holding the numbered first-frame images (e.g. 1.png, 2.png, ...).

Example DancePrompts.json:

JSON

{
  "DancePrompts": [
    {
      "Prompt Number": 1,
      "Dance Prompt": "A girl starts hip-hop dance on a vibrant street stage, dynamic moves, energetic rhythm, urban background."
    },
    {
      "Prompt Number": 2,
      "Dance Prompt": "A boy breakdancing on the street, power moves, spinning on the ground, graffiti wall background."
    },
    {
      "Prompt Number": 3,
      "Dance Prompt": "Ballet dancer performing elegant pirouettes in a classical theater setting."
    }
  ]
}
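Because the batch script uses "Prompt Number" both to find the image file and to derive the seed (31337 + idx), a malformed row only surfaces after the models have been loaded. A hedged sketch of an up-front check is shown below; `load_prompts` is a helper name invented here, and the rules it enforces (integer number, non-empty prompt string) are assumptions derived from how the script uses the two fields.

```python
import json


def load_prompts(json_text):
    """Split the rows under 'DancePrompts' into valid pairs and problem messages."""
    rows = json.loads(json_text)["DancePrompts"]
    valid, problems = [], []
    for position, row in enumerate(rows, start=1):
        number = row.get("Prompt Number")
        prompt = row.get("Dance Prompt")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(number, int) or isinstance(number, bool):
            problems.append(f"row {position}: 'Prompt Number' must be an integer")
        elif not isinstance(prompt, str) or not prompt.strip():
            problems.append(f"row {position}: 'Dance Prompt' must be a non-empty string")
        else:
            valid.append((number, prompt))
    return valid, problems


sample = """{"DancePrompts": [
  {"Prompt Number": 1, "Dance Prompt": "A girl starts hip-hop dance."},
  {"Prompt Number": "2", "Dance Prompt": "A boy breakdancing."},
  {"Prompt Number": 3}
]}"""
valid, problems = load_prompts(sample)
print(valid)  # [(1, 'A girl starts hip-hop dance.')]
for message in problems:
    print(message)
```

Here the second row is rejected because its number is a string, and the third because it has no prompt text.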

Example layout of the dance_first_frame directory:

Plaintext

FramePack/
│   generate_dance_videos.py
│   framepack_worker.py
│   DancePrompts.json
│   ... (other FramePack files)
│
├───dance_first_frame/
│       1.png
│       2.png
│       3.png
│       ...
│
└───outputs/  (This will be created by the script)
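Each prompt number N needs a matching N.png in dance_first_frame, otherwise that entry is skipped at generation time. The following minimal sketch lists the missing images in advance; `missing_first_frames` is a hypothetical helper, and the demo uses a temporary directory in place of the real folder.

```python
import tempfile
from pathlib import Path


def missing_first_frames(prompt_numbers, image_dir):
    """Return the prompt numbers that have no <number>.png in image_dir."""
    image_dir = Path(image_dir)
    return [n for n in prompt_numbers if not (image_dir / f"{n}.png").is_file()]


# Demo: a throwaway folder containing 1.png and 3.png, but not 2.png.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "1.png").touch()
    (Path(tmp) / "3.png").touch()
    demo_missing = missing_first_frames([1, 2, 3], tmp)
print(demo_missing)  # [2]
```

In the real layout the call would be `missing_first_frames(numbers, "dance_first_frame")`, with `numbers` taken from DancePrompts.json.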

Code for generate_dance_videos.py:

Python

# generate_dance_videos.py
import json
import os
from pathlib import Path
from PIL import Image
import time
import sys

# --- Use the script's own directory as the base ---
SCRIPT_DIR = Path(__file__).parent.resolve()

# --- Settings ---
# Paths are given relative to the script directory
BASE_IMG_DIR = SCRIPT_DIR / "dance_first_frame"  # Input image folder
OUTPUT_DIR   = SCRIPT_DIR / "outputs"            # Output video folder
JSON_PATH    = SCRIPT_DIR / "DancePrompts.json" # Path to the JSON file
MODEL_DIR    = SCRIPT_DIR / "checkpoints/stable-video-diffusion-img2vid-xt-1-1" # Model directory

TOTAL_SEC = 15.0   # Length of each generated video (seconds)
TARGET_FPS = 30    # Target frame rate
STEPS     = 25     # Number of inference steps
LAT_WIN   = 32     # Latent Window Size
CFG       = 1.0
GS        = 10.0   # Guidance Scale
RS        = 0.0    # Revision Scale
GPU_MEM   = 6.0    # GPU memory to reserve (GB); beyond this, data may be moved to the CPU
USE_TEACACHE = True # Whether to use TeaCache
CHUNK_SIZE = 16     # Decode/encode chunk size (use a smaller value such as 8 or 4 on low VRAM)
LOW_VRAM_MODE_INIT = False # Whether to initialize models in low-VRAM mode (e.g. True for < 24 GB)
# --- End of settings ---

# --- Import the framepack_worker module ---
try:
    # Add the script directory to the Python path, then try the import
    sys.path.insert(0, str(SCRIPT_DIR))
    from framepack_worker import init_models, worker
except ImportError:
    print("[ERROR] framepack_worker.py not found or cannot be imported.")
    print(f"Ensure framepack_worker.py is in the directory: {SCRIPT_DIR}")
    sys.exit(1)
except Exception as e:
    print(f"[ERROR] Failed to import from framepack_worker: {e}")
    sys.exit(1)


print("--- FramePack Batch Video Generation ---")
print(f"Script directory: {SCRIPT_DIR}")
print(f"Input image directory: {BASE_IMG_DIR}")
print(f"Output directory: {OUTPUT_DIR}")
print(f"JSON path: {JSON_PATH}")
print(f"Model directory: {MODEL_DIR}")

# 1. Initialize the models (done only once)
print("\nInitializing models...")
init_start_time = time.time()
# Convert the model directory to an absolute path before passing it on
absolute_model_dir = str(MODEL_DIR.resolve())
high_vram, text_encoder, text_encoder_2, tokenizer, tokenizer_2, vae, \
feature_extractor, image_encoder, transformer, pipeline = init_models(
    model_dir=absolute_model_dir, low_vram_mode=LOW_VRAM_MODE_INIT
)

# Check if initialization was successful
if pipeline is None:
    print("\n[FATAL] Model initialization failed. Please check logs and model paths. Exiting.")
    sys.exit(1)

init_end_time = time.time()
print(f"Model initialization took {init_end_time - init_start_time:.2f} seconds.")
print(f"High VRAM mode detected by worker: {high_vram}") # Display the detected mode

# 2. Load the JSON data
if not JSON_PATH.exists():
    print(f"\n[ERROR] JSON file not found at: {JSON_PATH}")
    sys.exit(1)

print(f"\nLoading prompts from {JSON_PATH.name}...")
try:
    with open(JSON_PATH, encoding="utf-8") as f:
        table = json.load(f)["DancePrompts"]
    print(f"Found {len(table)} prompts.")
except KeyError:
     print(f"[ERROR] JSON file must contain a key named 'DancePrompts' with a list of prompts.")
     sys.exit(1)
except Exception as e:
    print(f"[ERROR] Failed to load or parse JSON: {e}")
    sys.exit(1)

# 3. Create the output directory
print(f"\nEnsuring output directory exists: {OUTPUT_DIR}")
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

# 4. Video generation loop
total_videos = len(table)
generated_count = 0
failed_count = 0
total_gen_time = 0

print("\n--- Starting Video Generation Loop ---")
for i, row in enumerate(table):
    item_start_time = time.time()
    print("-" * 60)
    print(f"Processing item {i+1}/{total_videos}")

    try:
        idx = row["Prompt Number"]
        prompt = row["Dance Prompt"]
        print(f"Prompt Number: {idx}")
        print(f"Prompt: {prompt[:80]}...") # Display first 80 chars
    except KeyError as e:
        print(f"[WARN] Skipping row {i+1} due to missing key: {e}")
        failed_count += 1
        continue

    # Convert the image path to an absolute path
    img_path = (BASE_IMG_DIR / f"{idx}.png").resolve()
    print(f"Looking for input image: {img_path}")

    if not img_path.exists():
        print(f"[!] Image not found, skipping.")
        failed_count += 1
        continue
    elif not img_path.is_file():
        print(f"[!] Path exists but is not a file, skipping: {img_path}")
        failed_count += 1
        continue

    try:
        input_image = Image.open(img_path).convert("RGB")
        print(f"Loaded image: {img_path.name} ({input_image.width}x{input_image.height})")
    except Exception as e:
        print(f"[!] Failed to open image {img_path}, skipping: {e}")
        failed_count += 1
        continue

    # --- Call the worker function ---
    print("Calling worker function...")
    gen_start_time = time.time()
    # Pass the output folder as an absolute path as well
    absolute_output_dir = str(OUTPUT_DIR.resolve())
    # Pass all necessary arguments, including the initialized pipeline and transformer
    mp4_temp_path_str = worker(
        input_image=input_image,
        prompt=prompt,
        pipeline=pipeline,          # Pass the initialized pipeline
        transformer=transformer,    # Pass the initialized transformer
        n_prompt="",                # Add negative prompt if needed
        seed=31337 + idx,           # Use a unique seed per video
        total_second_length=TOTAL_SEC,
        target_fps=TARGET_FPS,
        latent_window_size=LAT_WIN,
        steps=STEPS,
        cfg=CFG,
        gs=GS,
        rs=RS,
        gpu_memory_preservation=GPU_MEM,
        use_teacache=USE_TEACACHE,
        chunk_size=CHUNK_SIZE,
        outputs_folder=absolute_output_dir, # Pass absolute path as string
        high_vram=high_vram        # Pass the detected VRAM mode
    )
    gen_end_time = time.time()
    loop_gen_time = gen_end_time - gen_start_time
    total_gen_time += loop_gen_time

    if mp4_temp_path_str:
        mp4_temp_path = Path(mp4_temp_path_str)
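        # Rename to an easy-to-recognize file name (relative to the output directory)
        new_name = OUTPUT_DIR / f"dance_{idx}.mp4"
        try:
            # Rename the file if it exists
            if mp4_temp_path.exists() and mp4_temp_path.is_file():
                 # If the returned path is already the final name, skip renaming
                 if mp4_temp_path.resolve() != new_name.resolve():
                     # replace() overwrites a leftover file from an earlier run;
                     # rename() would raise FileExistsError on Windows in that case.
                     mp4_temp_path.replace(new_name)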
                     print(f" → Renamed to {new_name.name}")
                 else:
                     print(f" → Saved as {new_name.name} (already correct name)")
                 print(f" → Success! (Generation took {loop_gen_time:.2f} seconds)")
                 generated_count += 1
            else:
                 print(f"[!] Worker returned path but file not found or invalid: {mp4_temp_path_str}")
                 print(f" → Failed! (Generation attempt took {loop_gen_time:.2f} seconds)")
                 failed_count += 1

        except Exception as e:
            print(f"[!] Failed to rename {mp4_temp_path.name} to {new_name.name}: {e}")
            print(f" → Original file might be at: {mp4_temp_path_str}") # Show original path
            print(f" → Failed! (Generation attempt took {loop_gen_time:.2f} seconds)")
            failed_count += 1
    else:
        print(f"[!] Video generation failed for prompt {idx}.")
        print(f" → Failed! (Generation attempt took {loop_gen_time:.2f} seconds)")
        failed_count += 1

    item_end_time = time.time()
    print(f"Total time for item {idx}: {item_end_time - item_start_time:.2f} seconds")


print("-" * 60)
print("--- Generation Complete ---")
print(f"Summary:")
print(f"  Total prompts processed: {total_videos}")
print(f"  Successfully generated: {generated_count}")
print(f"  Failed/Skipped: {failed_count}")
if generated_count > 0:
      average_time = total_gen_time / generated_count
      print(f"  Average generation time per successful video: {average_time:.2f} seconds.")
print("-" * 60)

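9. Running FramePack and Checking That It Works
9-1. Running the Demo (Gradio UI)

As a basic check, run the original Gradio demo. Execute the following commands in the FramePack directory.

PowerShell

# Make sure you are in the FramePack directory with the (framepack) environment active
# cd C:\Path\To\Your\FramePack # Move there if needed
conda activate framepack
python demo_gradio.py --inbrowser

If the UI opens in the browser and a video is generated from an image and a prompt, the basic setup is fine. Check that the startup log shows information like the following.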

# Example startup log
PyTorch compiled with CUDA: 12.8
GPU capability: 12.0
Using xformers: True # whether xFormers is enabled
[...]
Running on local URL:  http://127.0.0.1:7860

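9-2. Running the Batch Generation Script

Run the generate_dance_videos.py script prepared earlier. Execute the following commands in the FramePack directory.

PowerShell

# Make sure you are in the FramePack directory with the (framepack) environment active
# cd C:\Path\To\Your\FramePack # Move there if needed
conda activate framepack
python generate_dance_videos.py

Progress is printed to the console. When the run finishes, video files named dance_1.mp4, dance_2.mp4, ... appear in the outputs directory (or wherever the script is configured to write).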

Example results:

On an RTX 5090 (32 GB) with a fast SSD, generation is expected to take roughly 40 to 50 seconds per 15-second (450-frame) video. The output is an H.264 mp4 file, 1024x576 @ 30 fps by default.
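The frame count follows directly from the settings in generate_dance_videos.py, using the same rounding as the worker (math.ceil of seconds times FPS). The short sketch below reproduces that arithmetic and turns the timing expectation into a ratio; the 45-second figure is simply the midpoint of the 40 to 50 second range quoted above, not a measurement, and the function names are made up for this example.

```python
import math


def total_frames(seconds, fps):
    """Frame count as computed in the worker: ceil(seconds * fps)."""
    return math.ceil(seconds * fps)


def slowdown_factor(video_seconds, generation_seconds):
    """How many seconds of compute each second of video costs."""
    return generation_seconds / video_seconds


print(total_frames(15.0, 30))                  # 450
print(total_frames(2.0, 15))                   # 30
print(round(slowdown_factor(15.0, 45.0), 2))   # 3.0
```

The second call corresponds to the short self-test in framepack_worker.py (2 seconds at 15 FPS).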

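Conclusion

You now have a Windows 11 machine with a current GeForce GPU (Blackwell generation) running a PyTorch Nightly build for CUDA 12.8, FramePack, and xFormers (including Flash-Attention) for extra speed, along with scripts for automated video generation.

I hope this guide helps with your FramePack video generation and creative work. Make the most of the new GPU generation and explore what it lets you create!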
