Running CogVideoX on a Mac

CogVideoX

Last updated at 2025-07-23, posted at 2025-07-09

Prerequisites

macOS 12.3 or later (required by the PyTorch MPS backend)
Apple Silicon Mac

Step 1: Set up the Python environment

1.1 Install Homebrew

# Install Homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

1.2 Install Python 3.10

# Install Python 3.10
brew install python@3.10

# Check the version
python3.10 --version

1.3 Create a virtual environment

# Create a project directory
mkdir -p ~/cogvideox-project
cd ~/cogvideox-project

# Create a Python virtual environment
python3.10 -m venv cogvideo_env

# Activate the virtual environment
source cogvideo_env/bin/activate

# Upgrade pip
pip install --upgrade pip
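Before installing anything else, it can help to confirm that the shell really is using the new environment. The following is a minimal sketch, not part of the original setup: the helper names `in_virtualenv` and `python_is_310` are made up for illustration, and the check relies only on the standard `sys` module.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when this interpreter runs inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def python_is_310() -> bool:
    """Return True when the interpreter is Python 3.10.x."""
    return sys.version_info[:2] == (3, 10)


print("virtualenv active:", in_virtualenv())
print("Python 3.10:", python_is_310())
```

Run it with `python` (not `python3.10`) after activating `cogvideo_env`; if the first line prints `False`, the environment is probably not active in that shell.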

Step 2: Install ComfyUI

2.1 Clone ComfyUI

# Clone ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI

# Install dependencies
pip install -r requirements.txt

2.2 Additional dependencies

# Install additional packages for Mac
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install accelerate diffusers transformers
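Once PyTorch is installed, a quick check shows which compute device it can see. This is a hedged sketch: `pick_device` is a hypothetical helper name, and it only uses `torch.backends.mps.is_available()` and `torch.cuda.is_available()`. It returns `"torch-missing"` instead of failing when PyTorch is not importable.

```python
import importlib.util


def pick_device() -> str:
    """Report the best device PyTorch can use, or 'torch-missing'."""
    if importlib.util.find_spec("torch") is None:
        return "torch-missing"
    import torch

    # Older PyTorch builds may not expose torch.backends.mps at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"  # Apple Silicon GPU via Metal
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


print("device:", pick_device())
```

On an Apple Silicon Mac the hoped-for result is `mps`; `cpu` suggests the installed build or macOS version does not support the Metal backend.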

Step 3: Install the CogVideoX wrapper

3.1 Clone the CogVideoX Wrapper

# Move to ComfyUI's custom_nodes directory
cd custom_nodes

# Clone the CogVideoX Wrapper
git clone https://github.com/kijai/ComfyUI-CogVideoXWrapper.git

# Install dependencies
cd ComfyUI-CogVideoXWrapper
pip install -r requirements.txt

3.2 Install FFmpeg

# Install FFmpeg (required for video output)
brew install ffmpeg

# Alternatively, imageio-ffmpeg also works
pip install imageio-ffmpeg

Step 4: Download the required models

4.1 Download the T5 text encoder

# Move to ComfyUI's models/clip directory
cd ~/cogvideox-project/ComfyUI/models/clip

# Download the T5 text encoder
curl -L "https://huggingface.co/google/t5-v1_1-xxl/resolve/main/model.safetensors" -o t5xxl_fp16.safetensors
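A failed download can leave an HTML error page or a truncated file under the `.safetensors` name. The sketch below sanity-checks the file by reading its header, based on the safetensors layout (an 8-byte little-endian header length followed by a JSON header). The function name `read_safetensors_header` is an illustrative assumption, not something the article or ComfyUI provides.

```python
import json
import os
import struct


def read_safetensors_header(path):
    """Parse and return the JSON header of a .safetensors file.

    Raises ValueError if the file does not look like safetensors,
    for example when the download saved an HTML error page instead.
    """
    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        raw_len = f.read(8)
        if len(raw_len) != 8:
            raise ValueError("file is too short to be a safetensors file")
        # First 8 bytes: header length as an unsigned little-endian 64-bit int.
        (header_len,) = struct.unpack("<Q", raw_len)
        if header_len > file_size - 8:
            raise ValueError("header length exceeds file size")
        header_bytes = f.read(header_len)
    # json and UTF-8 decoding errors are both subclasses of ValueError.
    return json.loads(header_bytes.decode("utf-8"))
```

For example, `len(read_safetensors_header("t5xxl_fp16.safetensors"))` gives a rough count of the entries in the file; a `ValueError` means the download should be repeated.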

Step 5: Launch ComfyUI

5.1 Start the ComfyUI server

# Move to the ComfyUI root directory
cd ~/cogvideox-project/ComfyUI

# Activate the virtual environment (if not already active)
source ../cogvideo_env/bin/activate

# Start ComfyUI
python main.py --listen 127.0.0.1 --port 8188

5.2 Open it in a browser

Open http://127.0.0.1:8188 in your browser.
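If the page does not load, a small script can tell whether the server is listening at all. This is a minimal sketch using only the standard library. `is_server_up` is a hypothetical helper, and it requests the root URL from the step above rather than any specific ComfyUI endpoint.

```python
import urllib.error
import urllib.request


def is_server_up(url="http://127.0.0.1:8188", timeout=2.0) -> bool:
    """Return True if an HTTP server answers at the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


print("ComfyUI reachable:", is_server_up())
```

A result of `False` while `python main.py` is still running usually points to a different port or listen address than the one used here.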

Install ComfyUI-Manager

cd custom_nodes

# Remove any existing Manager folder
rm -rf ComfyUI-Manager

# Clone the latest ComfyUI-Manager
git clone https://github.com/ltdrdata/ComfyUI-Manager.git

# Verify the installation
ls -la ComfyUI-Manager/

cd ComfyUI-Manager

# Install dependencies if any (usually not needed)
if [ -f requirements.txt ]; then
    pip install -r requirements.txt
fi

After you restart ComfyUI, a Manager button appears in the top right.

Step 6: Cat video generation workflow

6.1 Set up the workflow

Run ComfyUI Manager > Install Missing Custom Nodes
Install the CogVideoX-related nodes
Restart ComfyUI
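To confirm that the custom nodes landed where ComfyUI looks for them, the folder listing can be checked from Python. This is a sketch under assumptions: the expected folder names are the default `git clone` names, `ComfyUI-VideoHelperSuite` is included because the `VHS_VideoCombine` node comes from that package, and the comparison ignores case because the Manager may use different capitalization.

```python
from pathlib import Path

# Assumed folder names (default clone names); adjust if yours differ.
EXPECTED_NODES = [
    "ComfyUI-CogVideoXWrapper",
    "ComfyUI-Manager",
    "ComfyUI-VideoHelperSuite",
]


def missing_custom_nodes(comfy_root, expected=EXPECTED_NODES):
    """Return the expected custom node folders not found under custom_nodes."""
    custom_nodes = Path(comfy_root).expanduser() / "custom_nodes"
    if not custom_nodes.is_dir():
        return list(expected)
    installed = {p.name.lower() for p in custom_nodes.iterdir() if p.is_dir()}
    return [name for name in expected if name.lower() not in installed]
```

Calling `missing_custom_nodes("~/cogvideox-project/ComfyUI")` returns an empty list when all three folders are present.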

6.2 Basic workflow (JSON)

Drag and drop the following workflow into ComfyUI:

{
  "1": {
    "inputs": {
      "text": "A cute orange tabby cat playing with a ball of yarn in a sunny living room, 3 seconds, high quality, detailed",
      "clip": ["2", 0]
    },
    "class_type": "CLIPTextEncode",
    "_meta": {
      "title": "CLIP Text Encode (Prompt)"
    }
  },
  "2": {
    "inputs": {
      "clip_name": "t5xxl_fp16.safetensors",
      "type": "sd3"
    },
    "class_type": "CLIPLoader",
    "_meta": {
      "title": "Load CLIP"
    }
  },
  "3": {
    "inputs": {
      "model_name": "CogVideoX-2b",
      "fp8_transformer": "disabled",
      "load_device": "auto"
    },
    "class_type": "DownloadAndLoadCogVideoModel",
    "_meta": {
      "title": "Download CogVideo Model"
    }
  },
  "4": {
    "inputs": {
      "model": ["3", 0],
      "positive": ["1", 0],
      "negative": ["5", 0],
      "num_frames": 25,
      "steps": 20,
      "cfg": 7.0,
      "denoise_strength": 1.0,
      "scheduler": "CogVideoXDDIM"
    },
    "class_type": "CogVideoSampler",
    "_meta": {
      "title": "CogVideo Sampler"
    }
  },
  "5": {
    "inputs": {
      "text": "blurry, low quality, pixelated, distorted",
      "clip": ["2", 0]
    },
    "class_type": "CLIPTextEncode",
    "_meta": {
      "title": "CLIP Text Encode (Negative)"
    }
  },
  "6": {
    "inputs": {
      "samples": ["4", 0],
      "vae": ["3", 2]
    },
    "class_type": "VAEDecode",
    "_meta": {
      "title": "VAE Decode"
    }
  },
  "7": {
    "inputs": {
      "images": ["6", 0],
      "filename_prefix": "cat_video",
      "fps": 8,
      "compress_level": 4
    },
    "class_type": "VHS_VideoCombine",
    "_meta": {
      "title": "Video Combine"
    }
  }
}
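In a workflow of this shape, each link is a two-item list `[source node id, output index]`, so a mistyped node id is easy to miss. The sketch below checks that every link points at a node that exists, and shows how the same JSON could be queued over HTTP as an alternative to drag and drop. The helper names are illustrative; the `POST /prompt` request with a `{"prompt": ...}` body follows ComfyUI's bundled API example script, and the server address is the one used in Step 5.

```python
import json
import urllib.request


def find_broken_links(workflow: dict) -> list:
    """Return (node_id, input_name, missing_source_id) for dangling links."""
    broken = []
    for node_id, node in workflow.items():
        for input_name, value in node.get("inputs", {}).items():
            # A link is a two-item list: [source node id, output index].
            is_link = (
                isinstance(value, list)
                and len(value) == 2
                and isinstance(value[0], str)
                and isinstance(value[1], int)
            )
            if is_link and value[0] not in workflow:
                broken.append((node_id, input_name, value[0]))
    return broken


def build_prompt_payload(workflow: dict) -> bytes:
    """Wrap the workflow in the body expected by POST /prompt."""
    return json.dumps({"prompt": workflow}).encode("utf-8")


def queue_workflow(workflow: dict, server="http://127.0.0.1:8188") -> bytes:
    """Send the workflow to a running ComfyUI server and return the raw reply."""
    request = urllib.request.Request(
        f"{server}/prompt",
        data=build_prompt_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

A typical use is to load the JSON with `json.load`, make sure `find_broken_links` returns an empty list, and only then call `queue_workflow`. The check covers node ids only; it does not verify that an output index or node class exists.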

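Step 7: Run the video generation

7.1 Set the prompt

Enter a prompt in the text node, then click the "Queue Prompt" button to start generation.

Recommended prompt examples:

"A cute orange tabby cat playing with a ball of yarn, 3 seconds"
"A fluffy white cat sleeping peacefully on a windowsill, 3 seconds"
"A playful kitten chasing its tail in a garden, 3 seconds"

7.2 Generation settings

Frame count: 25 frames (about 3 seconds at 8 fps)
Steps: 20 (a balance between quality and speed)
CFG scale: 7.0 (how closely the output follows the prompt)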
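The "about 3 seconds" figure follows directly from the frame count and frame rate. A short sketch of the arithmetic, with `clip_duration_seconds` as an illustrative helper name:

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Length of the clip in seconds for a given frame count and frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


print(clip_duration_seconds(25, 8))  # 3.125
```

So the settings above (25 frames, 8 fps) give a clip slightly longer than 3 seconds.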

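Step 8: Optimization and troubleshooting

8.1 Memory optimization

# If you run out of memory, set the following before starting ComfyUI
# (0.0 removes PyTorch's upper limit on MPS memory allocation)
export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0

8.2 Common problems and fixes

Problem 1: FFmpeg errors

# Fix
brew reinstall ffmpeg
pip install imageio-ffmpeg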
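When FFmpeg errors persist, it helps to see which binary Python can actually find. The following is a sketch, assuming either a Homebrew `ffmpeg` on `PATH` or the `imageio-ffmpeg` package from the commands above. `find_ffmpeg` is a hypothetical helper, and `imageio_ffmpeg.get_ffmpeg_exe()` is the package's own lookup function.

```python
import importlib.util
import shutil


def find_ffmpeg():
    """Return the path to an ffmpeg executable, or None if none is found."""
    # Prefer a system-wide binary, such as the one installed by Homebrew.
    path = shutil.which("ffmpeg")
    if path:
        return path
    # Fall back to the binary bundled with imageio-ffmpeg, if installed.
    if importlib.util.find_spec("imageio_ffmpeg") is not None:
        import imageio_ffmpeg

        try:
            return imageio_ffmpeg.get_ffmpeg_exe()
        except RuntimeError:
            return None
    return None


print("ffmpeg:", find_ffmpeg())
```

If this prints `None` inside the activated environment, neither install took effect for that shell.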

Problem 2: Model download fails

# Fix: download manually from Hugging Face
cd ~/cogvideox-project/ComfyUI/models/CogVideo
# Download the model files from https://huggingface.co/THUDM/CogVideoX-2b in a browser
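As an alternative to clicking through the browser, the same repository can be fetched with the `huggingface_hub` package (`pip install huggingface_hub`). This is a sketch under assumptions: the function names are illustrative, and the `subfolder` argument is a placeholder, since the article does not say which folder name the wrapper expects inside `models/CogVideo`.

```python
from pathlib import Path


def cogvideo_model_dir(comfy_root, subfolder) -> Path:
    """Build the target path under ComfyUI's models/CogVideo directory."""
    return Path(comfy_root).expanduser() / "models" / "CogVideo" / subfolder


def download_cogvideox_2b(comfy_root, subfolder) -> Path:
    """Download THUDM/CogVideoX-2b into the chosen folder and return its path."""
    # Imported lazily so the path helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    target = cogvideo_model_dir(comfy_root, subfolder)
    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id="THUDM/CogVideoX-2b", local_dir=str(target))
    return target
```

For example, `download_cogvideox_2b("~/cogvideox-project/ComfyUI", "CogVideoX-2b")` would place the files in a `CogVideoX-2b` folder; that folder name is a guess, so check it against the path the wrapper reports when it loads the model.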

Problem 3: Out of VRAM

Use the CogVideoX-2B model (lighter)
Reduce the frame count (15-20 frames)
Set the batch size to 1

8.3 Improving performance

# For faster inference
pip install xformers  # if available

Step 9: Output and customization

9.1 Output files

Generated videos are saved in the ComfyUI/output/ folder.
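After a few runs the output folder fills up, so a helper that picks the most recent video can be handy. This is a minimal sketch: `newest_video` is an illustrative name, and the list of video extensions is an assumption about which formats the video node may write.

```python
from pathlib import Path

# Assumed set of video extensions; extend it to match your output format.
VIDEO_SUFFIXES = {".mp4", ".webm", ".gif", ".mov"}


def newest_video(output_dir):
    """Return the most recently modified video file, or None if there is none."""
    candidates = [
        p
        for p in Path(output_dir).expanduser().rglob("*")
        if p.is_file() and p.suffix.lower() in VIDEO_SUFFIXES
    ]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

For example, `newest_video("~/cogvideox-project/ComfyUI/output")` returns the path of the latest clip, which can then be opened with `open <path>` in the macOS terminal.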

9.2 Advanced customization

Resolution: can be changed from 720x480 (default) to 1280x720
Frame rate: can be changed from 8 fps (default) to 12 fps or 24 fps
Duration: adjust the frame count to change the length
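Raising the frame rate without adding frames shortens the clip, so the two settings need to be changed together. The sketch below computes how many frames a target duration needs at each of the frame rates listed above. `frames_for_duration` is an illustrative helper, and whether the sampler accepts every resulting frame count is not covered here.

```python
import math


def frames_for_duration(seconds: float, fps: float) -> int:
    """Smallest whole number of frames that covers the requested duration."""
    if seconds <= 0 or fps <= 0:
        raise ValueError("seconds and fps must be positive")
    return math.ceil(seconds * fps)


for fps in (8, 12, 24):
    print(f"{fps} fps -> {frames_for_duration(3, fps)} frames for 3 seconds")
```

For a 3-second clip this gives 24, 36, and 72 frames at 8, 12, and 24 fps respectively, which also shows how quickly memory use and generation time can grow with the frame rate.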

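Summary

With this setup, you can generate high-quality cat videos on a Mac using the CogVideoX-2B model. The first run takes longer because the model has to be downloaded; later runs reuse the downloaded files, so generation starts sooner.

Next steps

Experiment with different prompts
Generate higher-quality videos with the CogVideoX-5B model
Explore the Image-to-Video feature
Train a custom LoRA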

CogVideoX

LTX Video

pip install -U git+https://github.com/huggingface/diffusers
pip install accelerate
pip install "transformers[sentencepiece]"
pip install sentencepiece

I2VGen-XL
