Running ComfyUI + Mochi 1 locally

Last updated at 2025-01-26, posted at 2025-01-16

ComfyUI + Mochi 1

I tried out Mochi 1, so here are the steps I followed and my impressions.

Environment

OS: Ubuntu Server 22.04.5 LTS
- NVIDIA driver: nvidia-driver-565-server
CPU: AMD Ryzen 3700X
RAM: DDR4 64GB
GPU: NVIDIA RTX 3090 FE
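
To record the same GPU details on another machine, a small sketch like the one below could be used. It assumes `nvidia-smi` is on the PATH; the helper names `parse_gpu_csv` and `query_gpus` are made up for this example and are not part of ComfyUI.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi, in this order.
QUERY_FIELDS = ["name", "driver_version", "memory.total"]


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader` output into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        # Skip lines that do not have exactly one value per requested field.
        if len(values) == len(QUERY_FIELDS):
            gpus.append(dict(zip(QUERY_FIELDS, values)))
    return gpus


def query_gpus():
    """Run nvidia-smi and return one dict per GPU (needs an NVIDIA driver)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found")
    else:
        for gpu in query_gpus():
            print(gpu)
```

The parsing is kept separate from the subprocess call so it can be checked on a machine without a GPU.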

Setup

Reference: https://highreso.jp/edgehub/moviegenerationai/mochi1.html

Installing ComfyUI

Official documentation: https://github.com/comfyanonymous/ComfyUI

Clone the repository

git clone https://github.com/comfyanonymous/ComfyUI.git

Install the dependencies with pip (run the requirements.txt command from inside the cloned ComfyUI directory)

pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt

# Note: importing the Mochi custom nodes failed, so I also installed the following
pip install opencv-python imageio-ffmpeg
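
To confirm that the two extra packages are visible to the Python environment that will run ComfyUI, a quick check along these lines may help. It is only a sketch: `missing_packages` is a made-up helper, and the mapping relies on `opencv-python` being imported as `cv2` and `imageio-ffmpeg` as `imageio_ffmpeg`.

```python
import importlib.util

# pip package name -> module name used in `import` statements.
# The two entries correspond to the extra packages installed above.
REQUIRED = {
    "opencv-python": "cv2",
    "imageio-ffmpeg": "imageio_ffmpeg",
}


def missing_packages(required):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing, try: pip install " + " ".join(missing))
    else:
        print("All extra packages are importable.")
```

Run it with the same interpreter that launches `main.py`; otherwise the result describes a different environment.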

Install the Mochi custom nodes

cd /path/to/ComfyUI/custom_nodes/
git clone https://github.com/kijai/ComfyUI-MochiWrapper.git
git clone https://github.com/kijai/ComfyUI-KJNodes.git
git clone https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite.git

Download a suitable CLIP file and place it

cd /path/to/ComfyUI/models/clip/
wget https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp8_e4m3fn.safetensors
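
When downloading from Hugging Face with wget, a URL containing `/blob/` points at the web page for the file, while `/resolve/` serves the file itself. The sketch below rewrites such a URL and checks whether a downloaded file starts like a safetensors file (an 8-byte little-endian header length followed by a JSON header). The function names and the 100 MiB limit are assumptions made for this example.

```python
import json
import struct


def to_resolve_url(url):
    """Turn a Hugging Face web-view URL (/blob/) into a direct-download one (/resolve/)."""
    return url.replace("/blob/", "/resolve/", 1)


def looks_like_safetensors(path):
    """Return True if the file starts with a plausible safetensors header.

    An HTML page saved by mistake fails this check.
    """
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            return False
        (header_len,) = struct.unpack("<Q", prefix)
        # Arbitrary sanity limit: text such as "<!DOCTYP" decodes to a huge number.
        if header_len == 0 or header_len > 100 * 1024 * 1024:
            return False
        header = f.read(header_len)
    if len(header) != header_len:
        return False
    try:
        return isinstance(json.loads(header.decode("utf-8")), dict)
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
```

For example, `looks_like_safetensors("t5xxl_fp8_e4m3fn.safetensors")` returning `False` would suggest the download saved something other than the weights. The check only reads the header, so it does not verify the tensors themselves.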

Download the Mochi-1 workflow file (json) and put it somewhere convenient (it will be uploaded through the WebUI later)

https://openart.ai/workflows/datou/mochi-1-t2v/IL31UAdoK45d4VBi43F9
Start ComfyUI
```
python main.py --listen
```
Drag and drop the workflow file onto the WebUI
Click clip_name on the CLIP node and select t5xxl_fp8_e4m3fn.safetensors
Run



The following error occurred.
```
TypeError: object of type 'float' has no len()
```
According to an issue, the MochiSampler node needs to be recreated.

(Right-click on an empty area) > Add Node > MochiWrapper > Mochi Sampler to create a new node, then reconnect the edges so that it has the same structure as the existing one.
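
Before rebuilding the node by hand, it can help to see which sampler nodes the workflow file contains and what values they store. The sketch below assumes the workflow JSON has a top-level `nodes` list whose entries carry `id`, `type`, and `widgets_values` keys; that layout, the helper names, and the file name `workflow.json` are assumptions for this example, so adjust them to the actual file.

```python
import json


def load_workflow(path):
    """Load a workflow JSON file saved from the WebUI."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def find_nodes(workflow, keyword):
    """Return (id, type, widgets_values) for nodes whose type contains keyword."""
    matches = []
    for node in workflow.get("nodes", []):
        node_type = node.get("type", "")
        # Case-insensitive match so "mochisampler" also finds "MochiSampler".
        if keyword.lower() in node_type.lower():
            matches.append((node.get("id"), node_type, node.get("widgets_values")))
    return matches


if __name__ == "__main__":
    import os

    # "workflow.json" is a hypothetical file name.
    if os.path.exists("workflow.json"):
        for node in find_nodes(load_workflow("workflow.json"), "MochiSampler"):
            print(node)
```

Comparing the stored values of the old node with those of a freshly created one gives a rough idea of what changed between node versions.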

Note: the final workflow ended up as follows.

Results

Generating one video took roughly 10 minutes.

Note: the colors were reduced to 256 while converting the videos to animated GIFs.
(Qiita, please support mp4...)
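
GIF frames are limited to a 256-color palette, so some loss is unavoidable, but generating a palette from the video itself usually reduces banding. The post does not say which tool produced the GIFs; the sketch below assumes ffmpeg is installed and builds a command that uses its `palettegen` and `paletteuse` filters. The file names, frame rate, and width are placeholder values.

```python
import shlex


def gif_command(src, dst, fps=12, width=480):
    """Build an ffmpeg command that converts a video to GIF with a custom palette."""
    filters = (
        f"fps={fps},scale={width}:-1:flags=lanczos,"
        # Split the stream: one copy builds the palette, the other is mapped onto it.
        "split[a][b];[a]palettegen[p];[b][p]paletteuse"
    )
    return ["ffmpeg", "-y", "-i", src, "-vf", filters, dst]


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(gif_command("mochi_output.mp4", "mochi_output.gif")))
```

Lowering `fps` or `width` shrinks the file further at the cost of smoothness and detail.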

Example 1

A cute girl is dancing

Example 2

A woman is singing

Example 3

A site engineer monitoring different aspects of a construction project

Example 4

A voyager conveying his canoe from the Athabasca river bank to Ottawa in Canada

Example 5

A cat and a dog fighting outdoors.

Example 6

A serene, mystical landscape of a bygone era, where towering, ancient trees stretch towards the sky, their canopies a vibrant tapestry of greens, sheltering a vast, sun-drenched plain teeming with life. In the foreground, a gentle Diplodocus roams, its long neck bending as it grazes on the lush, emerald grass, its massive body a testament to the wonders of a long-lost world. Nearby, a pair of majestic Dynasaries, their scales glinting in the soft, ethereal light, move quietly, their eyes watchful, yet peaceful, as they too feed on the abundant flora. The air is alive with the songs of ancient birds, and the sweet scent of blooming wildflowers wafts on the breeze, transporting us to a time when magic was woven into the very fabric of existence.

Impressions

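Compared with AnimateDiff, motion looks more realistic (for example, how clothes and hair follow body movement).
The problem of a person gradually changing into someone else partway through a video also seems to be solved!?
The downside is that generation takes a fair amount of time.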
