
Generating an image with Deep-daze Imagine from the English sentence "The President walking in the garden with his wife" (V100 GPU on Colab+)

Last updated at 2021-08-22. Posted at 2021-08-20.

Results

(Input English text)

The U.S President walking in the garden of the White House with his wife

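(Output image)

Output image at iteration 300 (of 1050) of epoch 19 (of 20)

That is the result obtained this time.
The rest of this article outlines the execution environment, how to run the tool, and the model that was run.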

Execution environment

Google Colab+ (flat monthly fee of 5,243 yen)
GPU: Tesla V100
Python: Python 3.7.11

Code executed

Input sentence: The U.S President walking in the garden of the White House with his wife

pip install deep-daze
imagine "The U.S President walking in the garden of the White House with his wife" --num-layers 32

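Execution results

Image files are written to the current directory at the time of execution.
Because Google Colab+ was used this time, the image files are written to a Google Drive directory mounted from Colab.

Output image at iteration 300 (of 1050) of epoch 19 (of 20)

Earlier output images

Image output right after processing started

What is Deep-daze?

It is a tool that takes an English sentence as input and outputs an image corresponding to it (an image associated with the meaning of the sentence).
It can be installed with pip install deep-daze.

The source code is also published on GitHub.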

(GitHub) lucidrains/deep-daze

(The part that gets executed is shown below)

github.com/lucidrains/deep-daze/blob/main/deep_daze/deep_daze.py

class Imagine(nn.Module):
    def __init__(
            self,
            *,
            text=None,
            img=None,
            clip_encoding=None,
            lr=1e-5,
            batch_size=4,
            gradient_accumulate_every=4,
            save_every=100,
            image_width=512,
            num_layers=16,
            epochs=20,
            iterations=1050,
            save_progress=True,
            seed=None,
            open_folder=True,
            save_date_time=False,
            start_image_path=None,
            start_image_train_iters=10,
            start_image_lr=3e-4,
            theta_initial=None,
            theta_hidden=None,
            model_name="ViT-B/32",
            lower_bound_cutout=0.1, # should be smaller than 0.8
            upper_bound_cutout=1.0,
            saturate_bound=False,
            averaging_weight=0.3,

            create_story=False,
            story_start_words=5,
            story_words_per_epoch=5,
            story_separator=None,
            gauss_sampling=False,
            gauss_mean=0.6,
            gauss_std=0.2,
            do_cutout=True,
            center_bias=False,
            center_focus=2,
            optimizer="AdamP",
            jit=True,
            hidden_size=256,
            save_gif=False,
            save_video=False,
    ):

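Deep Daze's Imagine appears to work by using two components, clip and Siren.

clip : A tool released by OpenAI. Given an image and several words passed as str text, it returns a score indicating which word is closest to the image. It was proposed in the OpenAI paper Learning Transferable Visual Models From Natural Language Supervision.
Siren : A neural network model that is good at capturing the features of signal data such as images, video, and audio. It first appeared in the Stanford University paper titled Implicit Neural Representations with Periodic Activation Functions.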
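
To make the clip description above more concrete, here is a toy sketch of that kind of scoring: cosine similarity between one image vector and several text vectors, followed by a softmax. The vectors, the labels, and the scale factor of 100 are made-up illustration values, and `score_labels` is a hypothetical helper; none of this is the real CLIP model or its API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def score_labels(image_vec, text_vecs, scale=100.0):
    """Return {label: probability} for how well each text vector matches the image vector."""
    labels = list(text_vecs)
    sims = [cosine_similarity(image_vec, text_vecs[label]) for label in labels]
    probs = softmax([scale * s for s in sims])
    return dict(zip(labels, probs))

# Toy embeddings (assumed values, not produced by CLIP).
image_vec = [0.9, 0.1, 0.2]
text_vecs = {
    "garden": [1.0, 0.0, 0.1],
    "ocean": [0.0, 1.0, 0.0],
    "city": [0.1, 0.2, 1.0],
}
scores = score_labels(image_vec, text_vecs)
best_label = max(scores, key=scores.get)
print(best_label, scores)
```

In the real model the vectors would come from the image and text encoders; only the comparison step is sketched here.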

Siren is imported on line 10 of deep_daze.py.

from siren_pytorch import SirenNet, SirenWrapper
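
As a rough illustration of what a Siren layer computes, the following dependency-free sketch applies one sine-activated layer, sin(w0 * (W x + b)), to a 2D pixel coordinate. The weights, the biases, and w0 = 30 are illustrative assumptions, and `sine_layer` is a hypothetical function; this is not the `siren_pytorch` implementation.

```python
import math

def sine_layer(x, weights, biases, w0=30.0):
    """Compute y_j = sin(w0 * (sum_i W[j][i] * x[i] + b[j])) for each output unit j."""
    out = []
    for row, b in zip(weights, biases):
        pre_activation = sum(w * xi for w, xi in zip(row, x)) + b
        out.append(math.sin(w0 * pre_activation))
    return out

# A pixel coordinate normalized to [-1, 1] (assumed input convention).
coord = [0.25, -0.5]
# Small hand-picked weights: 3 output units, 2 inputs.
weights = [[0.1, 0.0], [0.0, 0.1], [0.05, 0.05]]
biases = [0.0, 0.0, 0.0]

features = sine_layer(coord, weights, biases)
print(features)
```

Stacking several such layers and ending with a linear layer that outputs RGB values gives a network that maps coordinates to colors, which is the role Siren plays as the image generator here.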

clip is imported on line 23 of deep_daze.py.

from .clip import load, tokenize

Papers & implementation code

The papers and implementation code are as follows.

| Algorithm | Paper | GitHub | Poster video |
| --- | --- | --- | --- |
| *clip* | Learning Transferable Visual Models From Natural Language Supervision | openai/CLIP | (NA) |
| *Siren* | Implicit Neural Representations with Periodic Activation Functions | vsitzmann/siren | Implicit Neural Representations with Periodic Activation Functions NeurIPS 2020 (Oral) |

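(For comparison) Big sleep

While looking into DeepDaze's Imagine, I found several websites that compare it with something called big sleep. big sleep is also a text-to-image (text2image) tool that outputs an image (associated with the input) from text.

This time, I pass an English sentence to Deep Daze's Imagine and have it generate an image associated with that sentence.
Note that although this process corresponds to inference, it repeats many epochs and iterations (as in a model's training phase) and gradually outputs a better image. (At regular intervals, the image generated up to that point is written to the current directory as a .jpg file.)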
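
The numbering of those periodically saved files can be worked out with a little arithmetic. The sketch below assumes the sequence number is floor((epoch_counter * iterations + iteration) / save_every), with `iterations=1050` and `save_every=100` taken from the `Imagine` defaults shown earlier. The formula is inferred from the file names in the logs later in this article (.000178, .000179, .000202), not confirmed from the library source, and `saved_image_number` is a hypothetical helper.

```python
def saved_image_number(epoch_counter, iteration, iterations_per_epoch=1050, save_every=100):
    """Return the sequence number of the image saved at this step, or None if no image is saved.

    epoch_counter is the completed-epoch count shown by the progress bar (e.g. 17 in "17/20").
    """
    if iteration % save_every != 0:
        return None
    total_iterations = epoch_counter * iterations_per_epoch + iteration
    return total_iterations // save_every

print(saved_image_number(17, 0))    # 178
print(saved_image_number(17, 100))  # 179
print(saved_image_number(19, 300))  # 202
```

Under this assumption a full run of 20 epochs would produce roughly 210 numbered images, which is consistent with the large number of files that accumulate in the output directory.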

How to run

There are the following two ways.

$ pip install deep-daze
$ imagine "a house in the forest"

from deep_daze import Imagine

imagine = Imagine(
    text = 'a house in the forest',
    num_layers = 24,
)
imagine()

deep-daze Release 0.10.1

What is this?

Simple command line tool for text to image generation using OpenAI's CLIP and Siren. Credit goes to Ryan Murdock for the discovery of this technique (and for coming up with the great name)!

As of 2021/8/20, ver 0.10.2 has been released.

An article that translates the official Deep Daze site into Japanese

Deep Daze の使い方 (How to use Deep Daze)

Code executed and results (full)

Launch Google Colab

Check the assigned GPU

gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Select the Runtime > "Change runtime type" menu to enable a GPU accelerator, ')
  print('and then re-execute this cell.')
else:
  print(gpu_info)
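
The `!nvidia-smi` line above is IPython syntax and only works inside a notebook. As a minimal sketch, assuming the same check is wanted from a plain Python script, the standard `subprocess` module can be used instead. `get_gpu_info` is a hypothetical helper name, and treating a missing binary as "no GPU information" is a design choice of this sketch.

```python
import shutil
import subprocess

def get_gpu_info(command="nvidia-smi"):
    """Return the command's text output, or None if the command is missing or fails."""
    if shutil.which(command) is None:
        return None  # binary not on PATH, e.g. no GPU runtime attached
    try:
        result = subprocess.run([command], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Mirror the notebook check: the word "failed" in the output means no usable GPU.
    if result.returncode != 0 or "failed" in result.stdout:
        return None
    return result.stdout

gpu_info = get_gpu_info()
if gpu_info is None:
    print("No GPU information available; enable a GPU runtime and re-run.")
else:
    print(gpu_info)
```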

(Output)

A Tesla V100 was assigned

Thu Aug 19 16:27:46 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P0    25W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Check the Python version

!python --version

(Output)

Python 3.7.11

Mount Google Drive

from google.colab import drive
drive.mount('/content/drive')

!pwd

(Output)

/content

Install the tree command

!sudo apt-get install tree

(Output)

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-460
Use 'sudo apt autoremove' to remove it.
The following NEW packages will be installed:
  tree
0 upgraded, 1 newly installed, 0 to remove and 40 not upgraded.
Need to get 40.7 kB of archives.
After this operation, 105 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 tree amd64 1.7.0-5 [40.7 kB]
Fetched 40.7 kB in 0s (779 kB/s)
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 1.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 
Selecting previously unselected package tree.
(Reading database ... 148486 files and directories currently installed.)
Preparing to unpack .../tree_1.7.0-5_amd64.deb ...
Unpacking tree (1.7.0-5) ...
Setting up tree (1.7.0-5) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...

!tree

(Output)

.
├── drive
│   └── MyDrive
│       └── google_colaboratory_share_folder
│           ├── A_naked_beautiful_girl_without_clothes_standing_in_fromt_of__the_hotel..000001.jpg
│           ├── A_naked_beautiful_girl_without_clothes_standing_in_fromt_of__the_hotel..jpg
│           └── nozomi
│               ├── cvt_nozomi_1.jpg
│               ├── mosaic_nozomi_1.jpg
│               ├── nozomi_1.jpg
│               ├── nozomi_2.jpg
│               ├── nozomi_3.jpg
│               ├── nozomi_4.jpg
│               ├── resized_nozomi_1.jpg
│               ├── resize_nozomi_1.jpg
│               └── trial.txt
└── sample_data
    ├── anscombe.json
    ├── california_housing_test.csv
    ├── california_housing_train.csv
    ├── mnist_test.csv
    ├── mnist_train_small.csv
    └── README.md

5 directories, 17 files

Move to the root directory of Google Drive.
The three-level folder hierarchy content/drive/MyDrive does not exist on the Google Drive side; it is merely the hierarchy that Google Colaboratory (Google Colab, Colab+) shows by default.

%cd drive/MyDrive

(Output)

/content/drive/MyDrive

!ls

(Output)

google_colaboratory_share_folder

!mkdir google_colaboratory_share_folder_2

!tree

(Output)

.
├── google_colaboratory_share_folder
│   ├── A_naked_beautiful_girl_without_clothes_standing_in_fromt_of__the_hotel..000001.jpg
│   ├── A_naked_beautiful_girl_without_clothes_standing_in_fromt_of__the_hotel..000002.jpg
│   ├── A_naked_beautiful_girl_without_clothes_standing_in_fromt_of__the_hotel..jpg
│   └── nozomi
│       ├── cvt_nozomi_1.jpg
│       ├── mosaic_nozomi_1.jpg
│       ├── nozomi_1.jpg
│       ├── nozomi_2.jpg
│       ├── nozomi_3.jpg
│       ├── nozomi_4.jpg
│       ├── resized_nozomi_1.jpg
│       ├── resize_nozomi_1.jpg
│       └── trial.txt
└── google_colaboratory_share_folder_2

3 directories, 12 files

%cd google_colaboratory_share_folder_2

(Output)

/content/drive/My Drive/google_colaboratory_share_folder_2

!pwd

(Output)

/content/drive/My Drive/google_colaboratory_share_folder_2
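
The `!mkdir` and `%cd` steps above can also be done in plain Python, which avoids an error when the folder already exists. This is a minimal sketch: `enter_output_dir` is a hypothetical helper, and a temporary directory stands in for the mounted Drive so the snippet runs outside Colab as well.

```python
import os
import tempfile
from pathlib import Path

def enter_output_dir(base_dir, name):
    """Create base_dir/name if needed, make it the current directory, and return its path."""
    out_dir = Path(base_dir) / name
    out_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    os.chdir(out_dir)
    return out_dir

# In Colab this would be called as (paths taken from the session above):
#   enter_output_dir("/content/drive/MyDrive", "google_colaboratory_share_folder_2")
# Here a temporary directory is used as a stand-in for the mounted Drive.
base = tempfile.mkdtemp()
out_dir = enter_output_dir(base, "google_colaboratory_share_folder_2")
print(Path.cwd())
```

Because the generated images are written to the current directory, changing into a Drive folder first is what makes the images persist after the Colab session ends.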

pip install deep-daze

imagine is a command that is typed into a terminal.
Because it is a command, it must be prefixed with "!" in Colab.

!imagine "The U.S President walking in the garden of the White House with his wife" --num-layers 32
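
When the prompt comes from a Python variable, quoting mistakes are easy to make. The sketch below builds the same command line with `shlex.quote` so that spaces and quotes in the prompt survive the shell. `build_imagine_command` is a hypothetical helper; it only uses the `--num-layers` flag that appears in the command above.

```python
import shlex

def build_imagine_command(text, num_layers=None):
    """Return a shell-safe command line for the imagine CLI."""
    args = ["imagine", text]
    if num_layers is not None:
        args += ["--num-layers", str(num_layers)]
    return " ".join(shlex.quote(arg) for arg in args)

prompt = "The U.S President walking in the garden of the White House with his wife"
command = build_imagine_command(prompt, num_layers=32)
print(command)
```

In a notebook cell the resulting string can then be run with `!{command}`, since IPython expands expressions in curly braces on `!` lines.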

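Execution screen

To keep the session from dropping, I had my local MacBook open the Colab Jupyter notebook URL in a browser at regular intervals (the script shown later sleeps 1800 seconds between accesses), but the run stopped anyway.
It ran up to iteration 300 (of 1050) of epoch 19 (of 20).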
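
The progress bar printed just below reports 653.70 s per epoch with 19 of 20 epochs done. As a quick worked estimate, assuming every epoch takes about that long, the total and remaining run time can be computed as follows. `format_hms` is a hypothetical helper.

```python
def format_hms(seconds):
    """Format a duration in seconds as H:MM:SS, rounded to the nearest second."""
    seconds = int(round(seconds))
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"

seconds_per_epoch = 653.70  # from "653.70s/it" in the progress bar
total_epochs = 20
completed_epochs = 19

estimated_total = seconds_per_epoch * total_epochs
estimated_remaining = seconds_per_epoch * (total_epochs - completed_epochs)

print(format_hms(estimated_total))      # 3:37:54
print(format_hms(estimated_remaining))  # 0:10:54
```

The progress bar itself shows 10:53 remaining; the one-second difference is presumably down to rounding. Either way, a full run needs a session that stays alive for well over three and a half hours on this GPU.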

image updated at "./The_U.S_President_walking_in_the_garden_of_the_White_House_with_his_wife.000202.jpg"
epochs: 95% 19/20 [3:30:09<10:53, 653.70s/it]
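
The log excerpts in this article are pasted from the notebook's JSON output, so each line is a quoted string that still carries a literal `\u001b[A` cursor escape and `\n`. As a minimal sketch for reading them, the following strips that debris and extracts (iteration, loss) pairs with a regular expression. `parse_loss_lines` and the pattern are assumptions based only on the line format visible in these excerpts.

```python
import re

# Matches e.g.: loss: -46.16:   0% 1/1050
LOSS_LINE = re.compile(r"loss: (-?\d+(?:\.\d+)?):\s+(\d+)% (\d+)/(\d+)")

def parse_loss_lines(lines):
    """Return sorted (iteration, loss) pairs, keeping the last loss seen per iteration count."""
    latest = {}
    for line in lines:
        # Remove the literal escape text left over from the notebook JSON.
        cleaned = line.replace("\\u001b[A", "").replace("\\n", "")
        match = LOSS_LINE.search(cleaned)
        if match:
            loss = float(match.group(1))
            iteration = int(match.group(3))
            latest[iteration] = loss
    return sorted(latest.items())

sample = [
    r'"loss: -46.16:   0% 1/1050 [00:00<11:54,  1.47it/s]\u001b[A\n",',
    r'"loss: -42.15:   0% 1/1050 [00:01<11:54,  1.47it/s]\u001b[A\n",',
    r'"loss: -42.15:   0% 2/1050 [00:01<11:16,  1.55it/s]\u001b[A\n",',
]
print(parse_loss_lines(sample))  # [(1, -42.15), (2, -42.15)]
```

Each iteration appears twice in the raw log because the bar is redrawn once when the loss text changes and once when the counter advances, which is why the sketch keeps only the last value per iteration.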

    (omitted)
    
            "loss: -47.58:  99% 1043/1050 [10:47<00:04,  1.61it/s]\u001b[A\n",
            "loss: -50.82:  99% 1043/1050 [10:48<00:04,  1.61it/s]\u001b[A\n",
            "loss: -50.82:  99% 1044/1050 [10:48<00:03,  1.61it/s]\u001b[A\n",
            "loss: -48.14:  99% 1044/1050 [10:48<00:03,  1.61it/s]\u001b[A\n",
            "loss: -48.14: 100% 1045/1050 [10:48<00:03,  1.61it/s]\u001b[A\n",
            "loss: -49.14: 100% 1045/1050 [10:49<00:03,  1.61it/s]\u001b[A\n",
            "loss: -49.14: 100% 1046/1050 [10:49<00:02,  1.61it/s]\u001b[A\n",
            "loss: -50.57: 100% 1046/1050 [10:49<00:02,  1.61it/s]\u001b[A\n",
            "loss: -50.57: 100% 1047/1050 [10:49<00:01,  1.61it/s]\u001b[A\n",
            "loss: -49.27: 100% 1047/1050 [10:50<00:01,  1.61it/s]\u001b[A\n",
            "loss: -49.27: 100% 1048/1050 [10:50<00:01,  1.61it/s]\u001b[A\n",
            "loss: -48.08: 100% 1048/1050 [10:51<00:01,  1.61it/s]\u001b[A\n",
            "loss: -48.08: 100% 1049/1050 [10:51<00:00,  1.61it/s]\u001b[A\n",
            "loss: -47.26: 100% 1049/1050 [10:51<00:00,  1.61it/s]\u001b[A\n",
            "loss: -47.26: 100% 1050/1050 [10:51<00:00,  1.61it/s]\n",
            "epochs:  85% 17/20 [3:05:12<32:37, 652.52s/it]\n",
            "                                              \n",
            "\u001b[Aimage updated at \"./The_U.S_President_walking_in_the_garden_of_the_White_House_with_his_wife.000178.jpg\"\n",
            "epochs:  85% 17/20 [3:05:12<32:37, 652.52s/it]\n",
            "iteration:   0% 0/1050 [00:00<?, ?it/s]\u001b[A\n",
            "loss: -46.16:   0% 0/1050 [00:00<?, ?it/s]\u001b[A\n",
            "loss: -46.16:   0% 1/1050 [00:00<11:54,  1.47it/s]\u001b[A\n",
            "loss: -42.15:   0% 1/1050 [00:01<11:54,  1.47it/s]\u001b[A\n",
            "loss: -42.15:   0% 2/1050 [00:01<11:16,  1.55it/s]\u001b[A\n",
            "loss: -46.09:   0% 2/1050 [00:01<11:16,  1.55it/s]\u001b[A\n",
            "loss: -46.09:   0% 3/1050 [00:01<11:02,  1.58it/s]\u001b[A\n",
            "loss: -47.67:   0% 3/1050 [00:02<11:02,  1.58it/s]\u001b[A\n",
            "loss: -47.67:   0% 4/1050 [00:02<10:57,  1.59it/s]\u001b[A\n",
            "loss: -47.12:   0% 4/1050 [00:03<10:57,  1.59it/s]\u001b[A\n",
            "loss: -47.12:   0% 5/1050 [00:03<10:55,  1.59it/s]\u001b[A\n",
            "loss: -48.26:   0% 5/1050 [00:03<10:55,  1.59it/s]\u001b[A\n",
            "loss: -48.26:   1% 6/1050 [00:03<10:54,  1.60it/s]\u001b[A\n",
            "loss: -49.60:   1% 6/1050 [00:04<10:54,  1.60it/s]\u001b[A\n",
            "loss: -49.60:   1% 7/1050 [00:04<10:51,  1.60it/s]\u001b[A\n",
            "loss: -50.31:   1% 7/1050 [00:05<10:51,  1.60it/s]\u001b[A\n",
            "loss: -50.31:   1% 8/1050 [00:05<10:49,  1.60it/s]\u001b[A\n",
            "loss: -47.57:   1% 8/1050 [00:05<10:49,  1.60it/s]\u001b[A\n",
            "loss: -47.57:   1% 9/1050 [00:05<10:48,  1.60it/s]\u001b[A\n",

    (omitted)
    			
            "loss: -46.42:   9% 90/1050 [00:56<09:55,  1.61it/s]\u001b[A\n",
            "loss: -48.60:   9% 90/1050 [00:56<09:55,  1.61it/s]\u001b[A\n",
            "loss: -48.60:   9% 91/1050 [00:56<09:54,  1.61it/s]\u001b[A\n",
            "loss: -48.60:   9% 91/1050 [00:57<09:54,  1.61it/s]\u001b[A\n",
            "loss: -48.60:   9% 92/1050 [00:57<09:54,  1.61it/s]\u001b[A\n",
            "loss: -42.84:   9% 92/1050 [00:57<09:54,  1.61it/s]\u001b[A\n",
            "loss: -42.84:   9% 93/1050 [00:57<09:53,  1.61it/s]\u001b[A\n",
            "loss: -47.95:   9% 93/1050 [00:58<09:53,  1.61it/s]\u001b[A\n",
            "loss: -47.95:   9% 94/1050 [00:58<09:53,  1.61it/s]\u001b[A\n",
            "loss: -47.64:   9% 94/1050 [00:59<09:53,  1.61it/s]\u001b[A\n",
            "loss: -47.64:   9% 95/1050 [00:59<09:52,  1.61it/s]\u001b[A\n",
            "loss: -50.80:   9% 95/1050 [00:59<09:52,  1.61it/s]\u001b[A\n",
            "loss: -50.80:   9% 96/1050 [00:59<09:51,  1.61it/s]\u001b[A\n",
            "loss: -45.83:   9% 96/1050 [01:00<09:51,  1.61it/s]\u001b[A\n",
            "loss: -45.83:   9% 97/1050 [01:00<09:51,  1.61it/s]\u001b[A\n",
            "loss: -50.28:   9% 97/1050 [01:00<09:51,  1.61it/s]\u001b[A\n",
            "loss: -50.28:   9% 98/1050 [01:00<09:51,  1.61it/s]\u001b[A\n",
            "loss: -48.49:   9% 98/1050 [01:01<09:51,  1.61it/s]\u001b[A\n",
            "loss: -48.49:   9% 99/1050 [01:01<09:52,  1.61it/s]\u001b[A\n",
            "loss: -47.48:   9% 99/1050 [01:02<09:52,  1.61it/s]\u001b[A\n",
            "                                              \n",
            "\u001b[Aimage updated at \"./The_U.S_President_walking_in_the_garden_of_the_White_House_with_his_wife.000179.jpg\"\n",
            "epochs:  85% 17/20 [3:06:15<32:37, 652.52s/it]\n",
            "loss: -47.48:  10% 100/1050 [01:02<09:51,  1.61it/s]\u001b[A\n",
            "loss: -43.83:  10% 100/1050 [01:02<09:51,  1.61it/s]\u001b[A\n",
            "loss: -43.83:  10% 101/1050 [01:02<10:04,  1.57it/s]\u001b[A\n",
            "loss: -48.02:  10% 101/1050 [01:03<10:04,  1.57it/s]\u001b[A\n",
            "loss: -48.02:  10% 102/1050 [01:03<09:59,  1.58it/s]\u001b[A\n",

Scheduled access from the local Terminal to the Colab Jupyter notebook site

The "（削除）" ("deleted") part was masked out when pasting the code into this article.

Terminal (local machine)

electron@diynoMacBook-Pro ~ % vi ./access_colab.sh
electron@diynoMacBook-Pro ~ % cat ./access_colab.sh
#!/bin/bash

for i in `seq 0 12`
do
  echo "[$i]" ` date '+%y/%m/%d %H:%M:%S'` "connected."
  open "https://colab.research.google.com/drive/18nRt（削除）umgd_sSwOauE"
  sleep 1800
done
electron@diynoMacBook-Pro ~ % ./access_colab.sh   
[0] 21/08/20 12:58:35 connected.
[1] 21/08/20 13:33:47 connected.
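
For reference, the same loop can be sketched in Python with the standard `webbrowser` and `time` modules. The function name `keep_alive`, the injectable `opener` and `sleeper` parameters, and the example.com URL are assumptions of this sketch; the repeat count (13, matching `seq 0 12`) and the 1800-second interval come from the shell script above. As the results in this article show, opening the page periodically did not keep the session alive, so this is an illustration of the script rather than a recommended fix.

```python
import time
import webbrowser

def keep_alive(url, repeats=13, interval_seconds=1800,
               opener=webbrowser.open, sleeper=time.sleep):
    """Open url `repeats` times, sleeping interval_seconds after each access."""
    for i in range(repeats):
        stamp = time.strftime("%y/%m/%d %H:%M:%S")
        print(f"[{i}] {stamp} connected.")
        opener(url)
        sleeper(interval_seconds)

# Dry run with stand-in functions so that no browser opens and nothing sleeps.
opened = []
keep_alive(
    "https://example.com/notebook",  # placeholder URL, not the real notebook
    repeats=3,
    opener=opened.append,
    sleeper=lambda seconds: None,
)
print(opened)
```

Calling `keep_alive(url)` with the defaults would open the page 13 times over about six and a half hours.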

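A large number of image files are output

Because Google Colab+ was used this time, the image files are written to a Google Drive directory mounted from Colab.

When I checked a few hours later, the session had been disconnected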

            "loss: -51.63:  10% 102/1050 [01:04<09:59,  1.58it/s]\u001b[A\n",
            "loss: -51.63:  10% 103/1050 [01:04<09:55,  1.59it/s]\u001b[A\n",

    (omitted)

               "loss: -45.88:  28% 298/1050 [03:06<07:49,  1.60it/s]\u001b[A\n",
            "loss: -45.88:  28% 299/1050 [03:06<07:49,  1.60it/s]\u001b[A\n",
            "loss: -49.29:  28% 299/1050 [03:07<07:49,  1.60it/s]\u001b[A\n",
            "                                              \n",
            "\u001b[Aimage updated at \"./The_U.S_President_walking_in_the_garden_of_the_White_House_with_his_wife.000202.jpg\"\n",
            "epochs:  95% 19/20 [3:30:09<10:53, 653.70s/it]\n",
            "loss: -49.29:  29% 300/1050 [03:07<07:48,  1.60it/s]\u001b[A\n",
            "loss: -43.48:  29% 300/1050 [03:07<07:48,  1.60it/s]\u001b[A\n",
            "loss: -43.48:  29% 301/1050 [03:07<08:00,  1.56it/s]\u001b[A\n",
            "loss: -47.35:  29% 301/1050 [03:08<08:00,  1.56it/s]\u001b[A\n",
            "loss: -47.35:  29% 302/1050 [03:08<07:55,  1.57it/s]\u001b[A\n",
            "loss: -48.24:  29% 302/1050 [03:09<07:55,  1.57it/s]\u001b[A\n",
            "loss: -48.24:  29% 303/1050 [03:09<07:51,  1.58it/s]\u001b[A\n",
            "loss: -47.17:  29% 303/1050 [03:09<07:51,  1.58it/s]\u001b[A\n",
            "loss: -47.17:  29% 304/1050 [03:09<07:49,  1.59it/s]\u001b[A\n",
            "loss: -43.97:  29% 304/1050 [03:10<07:49,  1.59it/s]\u001b[A\n",
            "loss: -43.97:  29% 305/1050 [03:10<07:48,  1.59it/s]\u001b[A\n",
            "loss: -46.13:  29% 305/1050 [03:10<07:48,  1.59it/s]\u001b[A\n",
            "loss: -46.13:  29% 306/1050 [03:10<07:46,  1.59it/s]\u001b[A\n",
            "loss: -44.92:  29% 306/1050 [03:11<07:46,  1.59it/s]\u001b[A\n",
            "loss: -44.92:  29% 307/1050 [03:11<07:44,  1.60it/s]\u001b[A\n",
            "loss: -43.60:  29% 307/1050 [03:12<07:44,  1.60it/s]\u001b[A\n",
            "loss: -43.60:  29% 308/1050 [03:12<07:42,  1.60it/s]\u001b[A\n",
            "loss: -46.96:  29% 308/1050 [03:12<07:42,  1.60it/s]\u001b[A\n",
            "loss: -46.96:  29% 309/1050 [03:12<07:41,  1.60it/s]\u001b[A\n",
            "loss: -49.73:  29% 309/1050 [03:13<07:41,  1.60it/s]\u001b[A\n",
            "loss: -49.73:  30% 310/1050 [03:13<07:40,  1.61it/s]\u001b[A"

gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Select the Runtime > "Change runtime type" menu to enable a GPU accelerator, ')
  print('and then re-execute this cell.')
else:
  print(gpu_info)

(Output)

Pop-up dialog

Too many sessions
A new session cannot be created because there are too many active sessions. To create a new session, terminate an existing session.
