
Generating an image associated with the English sentence "the President walking in the garden with his wife" using Deep-daze Imagine (V100 GPU @ Colab+)

Posted at 2021-08-20

Results

( Input English text )

The U.S President walking in the garden of the White House with his wife

( Output image )

  • Output image at iteration 300 (of 1,050) of the 20th and final epoch (the epochs progress bar reads 19/20, i.e. 19 epochs completed)

The_U S_President_walking_in_the_garden_of_the_White_House_with_his_wife

That is all for the results this time.
Below, I outline the execution environment, how I ran it, and the model I used.


Execution environment

  • Google Colab+ (flat monthly fee: 5,243 yen)
  • GPU: Tesla V100
  • Python: Python 3.7.11

Code executed

  • Input sentence: The U.S President walking in the garden of the White House with his wife
pip install deep-daze
imagine "The U.S President walking in the garden of the White House with his wife" --num-layers 32

Result

  • Image files are written to the current directory of the running process.
  • Since I used Google Colab+ this time, the image files were written to a Google Drive directory mounted from Colab.
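The file names in this article suggest a simple naming rule: spaces in the prompt become underscores, intermediate snapshots get a zero-padded six-digit number, and the latest image is saved without a number. The helper below is a minimal sketch of that rule, inferred only from the file names shown here (not taken from deep-daze's source, which may also shorten very long prompts); `output_filename` is a hypothetical name.

```python
from typing import Optional


def output_filename(prompt: str, seq: Optional[int] = None, ext: str = "jpg") -> str:
    """Guess the image file name written for a prompt.

    Assumption: inferred from the file names in this article only.
    Spaces become underscores; snapshots carry a zero-padded number.
    """
    stem = prompt.replace(" ", "_")
    if seq is None:
        return f"{stem}.{ext}"  # latest image, e.g. "<stem>.jpg"
    return f"{stem}.{seq:06d}.{ext}"  # snapshot, e.g. "<stem>.000202.jpg"


prompt = "The U.S President walking in the garden of the White House with his wife"
print(output_filename(prompt, 202))
```

This makes it easy to locate a specific snapshot in the Drive folder, for example with `os.path.exists(output_filename(prompt, 202))`.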

Screenshot 2021-08-20 13 52 16

  • Output image at iteration 300 (of 1,050) of the 20th and final epoch (the epochs progress bar reads 19/20, i.e. 19 epochs completed)

The_U S_President_walking_in_the_garden_of_the_White_House_with_his_wife

An earlier output image

The_U S_President_walking_in_the_garden_of_the_White_House_with_his_wife 000050

Image output right after processing started

The_U S_President_walking_in_the_garden_of_the_White_House_with_his_wife 000001 (1)

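What is Deep-daze?

A tool that takes an English sentence as input and outputs an image corresponding to it (an image associated with the meaning of the sentence).
It can be installed with pip install deep-daze.

The source code is also published on GitHub.

( The part that gets executed is the following )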

github.com/lucidrains/deep-daze/blob/main/deep_daze/deep_daze.py
import torch.nn as nn  # Imagine subclasses torch.nn.Module


class Imagine(nn.Module):
    def __init__(
            self,
            *,
            text=None,
            img=None,
            clip_encoding=None,
            lr=1e-5,
            batch_size=4,
            gradient_accumulate_every=4,
            save_every=100,
            image_width=512,
            num_layers=16,
            epochs=20,
            iterations=1050,
            save_progress=True,
            seed=None,
            open_folder=True,
            save_date_time=False,
            start_image_path=None,
            start_image_train_iters=10,
            start_image_lr=3e-4,
            theta_initial=None,
            theta_hidden=None,
            model_name="ViT-B/32",
            lower_bound_cutout=0.1, # should be smaller than 0.8
            upper_bound_cutout=1.0,
            saturate_bound=False,
            averaging_weight=0.3,

            create_story=False,
            story_start_words=5,
            story_words_per_epoch=5,
            story_separator=None,
            gauss_sampling=False,
            gauss_mean=0.6,
            gauss_std=0.2,
            do_cutout=True,
            center_bias=False,
            center_focus=2,
            optimizer="AdamP",
            jit=True,
            hidden_size=256,
            save_gif=False,
            save_video=False,
    ):
        ...  # constructor body omitted in this excerpt; see deep_daze.py at the URL above

Deep Daze's Imagine appears to work by combining two components: CLIP and Siren.

  • clip: a model released by OpenAI. Given an image and several texts (passed as str), it returns scores indicating which of the texts the image is closest to. It was proposed in the OpenAI paper Learning Transferable Visual Models From Natural Language Supervision.

  • Siren: a neural network model that excels at accurately representing signal data such as images, video, and audio. It was first presented in the Stanford University paper Implicit Neural Representations with Periodic Activation Functions.

Siren is imported on line 10 of deep_daze.py.

from siren_pytorch import SirenNet, SirenWrapper
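SirenNet, imported above, is a multilayer perceptron whose layers compute sin(w0 · (Wx + b)). In deep-daze the network maps a pixel coordinate (x, y) to an RGB value, so the image is represented by the network's weights. The pure-Python sketch below shows a forward pass with random weights, using the initialization scheme from the SIREN paper (first layer U(-1/n, 1/n), later layers U(-sqrt(6/n)/w0, sqrt(6/n)/w0), w0 = 30). `TinySiren`, the layer sizes, and the final rescaling to [0, 1] are illustrative assumptions, not the siren_pytorch or deep-daze code.

```python
import math
import random
from typing import List


def init_layer(n_in: int, n_out: int, w0: float, first: bool, rng: random.Random):
    # SIREN-style init: U(-1/n, 1/n) for the first layer, U(-sqrt(6/n)/w0, sqrt(6/n)/w0) after that.
    bound = 1.0 / n_in if first else math.sqrt(6.0 / n_in) / w0
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    bias = [rng.uniform(-bound, bound) for _ in range(n_out)]
    return weights, bias


def linear(x: List[float], weights, bias) -> List[float]:
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class TinySiren:
    """Hypothetical toy SIREN: (x, y) coordinate -> RGB in [0, 1]."""

    def __init__(self, hidden: int = 16, num_layers: int = 3, w0: float = 30.0, seed: int = 0):
        rng = random.Random(seed)
        self.w0 = w0
        dims = [2] + [hidden] * num_layers
        self.layers = [init_layer(dims[i], dims[i + 1], w0, i == 0, rng) for i in range(num_layers)]
        self.out = init_layer(hidden, 3, w0, False, rng)  # final linear layer -> 3 channels

    def __call__(self, x: float, y: float) -> List[float]:
        h = [x, y]
        for weights, bias in self.layers:
            h = [math.sin(self.w0 * v) for v in linear(h, weights, bias)]  # sine activation
        rgb = linear(h, *self.out)
        # Assumed post-processing: map roughly [-1, 1] to [0, 1] and clamp.
        return [min(max((c + 1) / 2, 0.0), 1.0) for c in rgb]


net = TinySiren()
# Render a 4x4 "image" by evaluating the network on a coordinate grid in [-1, 1].
grid = [i / 3 * 2 - 1 for i in range(4)]
image = [[net(x, y) for x in grid] for y in grid]
print(len(image), len(image[0]), len(image[0][0]))
```

Generating a picture then means adjusting these weights until the rendered image matches the text, which is where CLIP comes in.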

clip is imported on line 23 of deep_daze.py.

from .clip import load, tokenize
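The CLIP scoring described above boils down to comparing embeddings: the image and each candidate text are encoded into the same vector space, and their cosine similarities, multiplied by a scale factor and passed through a softmax over the texts, become the scores. The sketch below shows only that last step in plain Python. The 3-dimensional vectors are made-up stand-ins for real CLIP embeddings, `clip_scores` is a hypothetical helper, and the factor 100 follows the zero-shot example in the openai/CLIP README.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def clip_scores(image_emb: List[float], text_embs: Dict[str, List[float]],
                scale: float = 100.0) -> Dict[str, float]:
    """Softmax over scaled cosine similarities (CLIP-style zero-shot scoring)."""
    logits = {t: scale * cosine(image_emb, e) for t, e in text_embs.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    total = sum(exps.values())
    return {t: v / total for t, v in exps.items()}


# Hypothetical toy embeddings, NOT real CLIP outputs.
image = [0.9, 0.1, 0.2]
texts = {
    "a president walking in a garden": [0.8, 0.2, 0.1],
    "a house in the forest": [0.1, 0.9, 0.3],
    "a bowl of fruit": [0.0, 0.2, 0.9],
}
scores = clip_scores(image, texts)
print(max(scores, key=scores.get))
```

Deep-daze uses the similarity in the other direction: instead of choosing among texts, it adjusts the image so that its CLIP embedding moves closer to the prompt's embedding. Since a similarity to be maximized is presumably minimized as a negated loss, this would fit the negative loss values in the logs further down.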

Papers & implementation code

The papers and implementation code are as follows.

Algorithm | Paper | GitHub | Poster video
clip | Learning Transferable Visual Models From Natural Language Supervision | openai/CLIP | (NA)
Siren | Implicit Neural Representations with Periodic Activation Functions | vsitzmann/siren | Implicit Neural Representations with Periodic Activation Functions, NeurIPS 2020 (Oral)

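(For comparison) Big Sleep

While researching Deep Daze's Imagine, I noticed that several websites compare it with something called Big Sleep. Big Sleep is another text-to-image (text2image) tool that outputs an image associated with a text; it also uses CLIP, but with BigGAN instead of Siren as the image generator.

In this article, I pass an English sentence to Deep Daze's Imagine and have it generate an image associated with that sentence.
Note that although this is generation (inference) from the user's point of view, the process runs many epochs and iterations, just like model training, and the image gradually improves. Strictly speaking it is training: the weights of a Siren network are optimized from scratch for each prompt, guided by CLIP's judgment of how well the image matches the text. (At fixed intervals, the image generated at that point is written to the current directory as a jpg file.)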
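Which file corresponds to which point in the run? The numbers in the log are consistent with a simple rule: the file number is the total number of iterations so far, integer-divided by `save_every` (default 100 in the signature above), with a 0-based epoch index. For example, epoch index 17, iteration 0 gives 000178, and epoch index 19 (that is, the 20th epoch), iteration 300 gives 000202. The sketch below encodes that rule; it is inferred from the logs in this article rather than copied from deep-daze, and the function names are hypothetical.

```python
ITERATIONS = 1050  # default `iterations` (per epoch) in the Imagine signature
SAVE_EVERY = 100   # default `save_every` in the Imagine signature
EPOCHS = 20        # default `epochs` in the Imagine signature


def sequence_number(epoch: int, iteration: int) -> int:
    """Snapshot number for a 0-based epoch index and iteration (inferred from the logs)."""
    return (epoch * ITERATIONS + iteration) // SAVE_EVERY


def progress(epoch: int, iteration: int) -> float:
    """Fraction of all EPOCHS * ITERATIONS steps completed."""
    return (epoch * ITERATIONS + iteration) / (EPOCHS * ITERATIONS)


# Data points taken from the log in this article.
for epoch, iteration in [(17, 0), (17, 100), (19, 300)]:
    print(f"epoch {epoch}, iteration {iteration}: {sequence_number(epoch, iteration):06d}")
print(f"{progress(19, 300):.1%}")
```

By the same rule, the earlier snapshot 000050 would come from epoch index 4, iteration 800, and the run shown here had completed about 96.4% of its steps when it stopped.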

How to run

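There are two ways:

(1) From the command line

$ pip install deep-daze
$ imagine "a house in the forest"

(2) From Python

from deep_daze import Imagine

# Same prompt as above; num_layers sets the depth of the Siren network
imagine = Imagine(
    text = 'a house in the forest',
    num_layers = 24,
)
imagine()  # starts generation; snapshots are saved to the current directory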

What is this?

Simple command line tool for text to image generation using OpenAI's CLIP and Siren. Credit goes to Ryan Murdock for the discovery of this technique (and for coming up with the great name)!

As of 2021/8/20, version 0.10.2 has been released.

(Related article) A Japanese translation of the official Deep Daze site

Full code executed and results

Launch Google Colab

Screenshot 2021-08-20 14 01 08

Check the assigned GPU

gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Select the Runtime > "Change runtime type" menu to enable a GPU accelerator, ')
  print('and then re-execute this cell.')
else:
  print(gpu_info)

( Output )

  • A Tesla V100 was assigned.
Thu Aug 19 16:27:46 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P0    25W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
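If you want the notebook to check the GPU programmatically rather than reading the table, `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader` prints one CSV line per GPU. The sketch below runs it through `subprocess` when the command is available and parses the result. `gpu_summary` and `parse_gpu_csv` are hypothetical helpers, and on machines without an NVIDIA driver the function simply returns an empty list.

```python
import shutil
import subprocess
from typing import List, Tuple


def parse_gpu_csv(text: str) -> List[Tuple[str, str]]:
    """Parse `name, memory.total` CSV lines into (name, memory) tuples."""
    rows = []
    for line in text.strip().splitlines():
        if line.strip():
            name, memory = (field.strip() for field in line.split(",", 1))
            rows.append((name, memory))
    return rows


def gpu_summary() -> List[Tuple[str, str]]:
    """Return [(name, total memory)] per GPU, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []


print(gpu_summary())
```

A check such as `any("V100" in name for name, _ in gpu_summary())` could then guard a long run before it starts.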

Check the Python version

!python --version

( Output )

Python 3.7.11

Mount Google Drive

from google.colab import drive
drive.mount('/content/drive')
!pwd

( Output )

/content
  • Install the tree command
!sudo apt-get install tree

( Output )

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-460
Use 'sudo apt autoremove' to remove it.
The following NEW packages will be installed:
  tree
0 upgraded, 1 newly installed, 0 to remove and 40 not upgraded.
Need to get 40.7 kB of archives.
After this operation, 105 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 tree amd64 1.7.0-5 [40.7 kB]
Fetched 40.7 kB in 0s (779 kB/s)
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 1.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 
Selecting previously unselected package tree.
(Reading database ... 148486 files and directories currently installed.)
Preparing to unpack .../tree_1.7.0-5_amd64.deb ...
Unpacking tree (1.7.0-5) ...
Setting up tree (1.7.0-5) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
!tree

( Output )

.
├── drive
│   └── MyDrive
│       └── google_colaboratory_share_folder
│           ├── A_naked_beautiful_girl_without_clothes_standing_in_fromt_of__the_hotel..000001.jpg
│           ├── A_naked_beautiful_girl_without_clothes_standing_in_fromt_of__the_hotel..jpg
│           └── nozomi
│               ├── cvt_nozomi_1.jpg
│               ├── mosaic_nozomi_1.jpg
│               ├── nozomi_1.jpg
│               ├── nozomi_2.jpg
│               ├── nozomi_3.jpg
│               ├── nozomi_4.jpg
│               ├── resized_nozomi_1.jpg
│               ├── resize_nozomi_1.jpg
│               └── trial.txt
└── sample_data
    ├── anscombe.json
    ├── california_housing_test.csv
    ├── california_housing_train.csv
    ├── mnist_test.csv
    ├── mnist_train_small.csv
    └── README.md

5 directories, 17 files
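  • Move to the root directory of Google Drive.
  • The three folder levels content/drive/MyDrive do not exist in Google Drive itself. /content/drive is the mount point that drive.mount('/content/drive') creates on the Colab (Google Colaboratory, Colab+) machine, and MyDrive corresponds to the root of My Drive.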
%cd drive/MyDrive

( Output )

/content/drive/MyDrive
!ls

( Output )

google_colaboratory_share_folder
!mkdir google_colaboratory_share_folder_2
!tree

( Output )

.
├── google_colaboratory_share_folder
│   ├── A_naked_beautiful_girl_without_clothes_standing_in_fromt_of__the_hotel..000001.jpg
│   ├── A_naked_beautiful_girl_without_clothes_standing_in_fromt_of__the_hotel..000002.jpg
│   ├── A_naked_beautiful_girl_without_clothes_standing_in_fromt_of__the_hotel..jpg
│   └── nozomi
│       ├── cvt_nozomi_1.jpg
│       ├── mosaic_nozomi_1.jpg
│       ├── nozomi_1.jpg
│       ├── nozomi_2.jpg
│       ├── nozomi_3.jpg
│       ├── nozomi_4.jpg
│       ├── resized_nozomi_1.jpg
│       ├── resize_nozomi_1.jpg
│       └── trial.txt
└── google_colaboratory_share_folder_2

3 directories, 12 files
%cd google_colaboratory_share_folder_2

( Output )

/content/drive/My Drive/google_colaboratory_share_folder_2
!pwd

( Output )

/content/drive/My Drive/google_colaboratory_share_folder_2
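The outputs above show the same Drive folder under both /content/drive/MyDrive and /content/drive/My Drive. A path under either one corresponds to a path relative to the root of My Drive; the sketch below does that translation with `pathlib`. `MOUNT_ROOTS` and `drive_relative` are illustrative names, and the two roots are simply the spellings that appear in this article.

```python
from pathlib import PurePosixPath

# Both spellings appear in the Colab outputs above.
MOUNT_ROOTS = (
    PurePosixPath("/content/drive/MyDrive"),
    PurePosixPath("/content/drive/My Drive"),
)


def drive_relative(colab_path: str) -> str:
    """Translate a Colab path into a path relative to the root of My Drive."""
    path = PurePosixPath(colab_path)
    for root in MOUNT_ROOTS:
        try:
            return path.relative_to(root).as_posix()
        except ValueError:
            continue  # not under this root; try the next spelling
    raise ValueError(f"{colab_path} is not under a Google Drive mount")


print(drive_relative("/content/drive/My Drive/google_colaboratory_share_folder_2"))
```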
pip install deep-daze
  • imagine is a command meant to be typed in a terminal.
  • Because it is a shell command, prefix it with "!" in Colab.
!imagine "The U.S President walking in the garden of the White House with his wife" --num-layers 32
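The "!" prefix hands the line to the shell. From plain Python (for example a script rather than a notebook cell), the same invocation can be made with `subprocess`; passing the arguments as a list avoids shell-quoting problems with the prompt. This is a sketch that assumes only the `imagine` command installed by `pip install deep-daze` and the `--num-layers` option used above; `build_imagine_cmd` and the `RUN` flag are hypothetical.

```python
import shutil
import subprocess
from typing import List

PROMPT = "The U.S President walking in the garden of the White House with his wife"
RUN = False  # set to True to actually start generation (the run in this article took hours)


def build_imagine_cmd(prompt: str, num_layers: int = 32) -> List[str]:
    """Argument list equivalent to: imagine "<prompt>" --num-layers <num_layers>."""
    return ["imagine", prompt, "--num-layers", str(num_layers)]


cmd = build_imagine_cmd(PROMPT)
print(cmd)
if RUN and shutil.which("imagine"):  # only run when deep-daze's CLI is installed
    subprocess.run(cmd, check=True)
```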

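Execution screen

  • To keep the session from timing out, I had my local MacBook periodically open the Colab Jupyter notebook URL (the script below sleeps 1800 seconds between accesses, i.e. every 30 minutes), but the run stopped anyway.

  • Execution got as far as iteration 300 (of 1,050) of the 20th and final epoch (the epochs progress bar reads 19/20, i.e. 19 epochs completed).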

image updated at "./The_U.S_President_walking_in_the_garden_of_the_White_House_with_his_wife.000202.jpg"
epochs: 95% 19/20 [3:30:09<10:53, 653.70s/it]

                ( omitted )

            "loss: -47.58:  99% 1043/1050 [10:47<00:04,  1.61it/s]\u001b[A\n",
            "loss: -50.82:  99% 1043/1050 [10:48<00:04,  1.61it/s]\u001b[A\n",
            "loss: -50.82:  99% 1044/1050 [10:48<00:03,  1.61it/s]\u001b[A\n",
            "loss: -48.14:  99% 1044/1050 [10:48<00:03,  1.61it/s]\u001b[A\n",
            "loss: -48.14: 100% 1045/1050 [10:48<00:03,  1.61it/s]\u001b[A\n",
            "loss: -49.14: 100% 1045/1050 [10:49<00:03,  1.61it/s]\u001b[A\n",
            "loss: -49.14: 100% 1046/1050 [10:49<00:02,  1.61it/s]\u001b[A\n",
            "loss: -50.57: 100% 1046/1050 [10:49<00:02,  1.61it/s]\u001b[A\n",
            "loss: -50.57: 100% 1047/1050 [10:49<00:01,  1.61it/s]\u001b[A\n",
            "loss: -49.27: 100% 1047/1050 [10:50<00:01,  1.61it/s]\u001b[A\n",
            "loss: -49.27: 100% 1048/1050 [10:50<00:01,  1.61it/s]\u001b[A\n",
            "loss: -48.08: 100% 1048/1050 [10:51<00:01,  1.61it/s]\u001b[A\n",
            "loss: -48.08: 100% 1049/1050 [10:51<00:00,  1.61it/s]\u001b[A\n",
            "loss: -47.26: 100% 1049/1050 [10:51<00:00,  1.61it/s]\u001b[A\n",
            "loss: -47.26: 100% 1050/1050 [10:51<00:00,  1.61it/s]\n",
            "epochs:  85% 17/20 [3:05:12<32:37, 652.52s/it]\n",
            "                                              \n",
            "\u001b[Aimage updated at \"./The_U.S_President_walking_in_the_garden_of_the_White_House_with_his_wife.000178.jpg\"\n",
            "epochs:  85% 17/20 [3:05:12<32:37, 652.52s/it]\n",
            "iteration:   0% 0/1050 [00:00<?, ?it/s]\u001b[A\n",
            "loss: -46.16:   0% 0/1050 [00:00<?, ?it/s]\u001b[A\n",
            "loss: -46.16:   0% 1/1050 [00:00<11:54,  1.47it/s]\u001b[A\n",
            "loss: -42.15:   0% 1/1050 [00:01<11:54,  1.47it/s]\u001b[A\n",
            "loss: -42.15:   0% 2/1050 [00:01<11:16,  1.55it/s]\u001b[A\n",
            "loss: -46.09:   0% 2/1050 [00:01<11:16,  1.55it/s]\u001b[A\n",
            "loss: -46.09:   0% 3/1050 [00:01<11:02,  1.58it/s]\u001b[A\n",
            "loss: -47.67:   0% 3/1050 [00:02<11:02,  1.58it/s]\u001b[A\n",
            "loss: -47.67:   0% 4/1050 [00:02<10:57,  1.59it/s]\u001b[A\n",
            "loss: -47.12:   0% 4/1050 [00:03<10:57,  1.59it/s]\u001b[A\n",
            "loss: -47.12:   0% 5/1050 [00:03<10:55,  1.59it/s]\u001b[A\n",
            "loss: -48.26:   0% 5/1050 [00:03<10:55,  1.59it/s]\u001b[A\n",
            "loss: -48.26:   1% 6/1050 [00:03<10:54,  1.60it/s]\u001b[A\n",
            "loss: -49.60:   1% 6/1050 [00:04<10:54,  1.60it/s]\u001b[A\n",
            "loss: -49.60:   1% 7/1050 [00:04<10:51,  1.60it/s]\u001b[A\n",
            "loss: -50.31:   1% 7/1050 [00:05<10:51,  1.60it/s]\u001b[A\n",
            "loss: -50.31:   1% 8/1050 [00:05<10:49,  1.60it/s]\u001b[A\n",
            "loss: -47.57:   1% 8/1050 [00:05<10:49,  1.60it/s]\u001b[A\n",
            "loss: -47.57:   1% 9/1050 [00:05<10:48,  1.60it/s]\u001b[A\n",

                ( omitted )

            "loss: -46.42:   9% 90/1050 [00:56<09:55,  1.61it/s]\u001b[A\n",
            "loss: -48.60:   9% 90/1050 [00:56<09:55,  1.61it/s]\u001b[A\n",
            "loss: -48.60:   9% 91/1050 [00:56<09:54,  1.61it/s]\u001b[A\n",
            "loss: -48.60:   9% 91/1050 [00:57<09:54,  1.61it/s]\u001b[A\n",
            "loss: -48.60:   9% 92/1050 [00:57<09:54,  1.61it/s]\u001b[A\n",
            "loss: -42.84:   9% 92/1050 [00:57<09:54,  1.61it/s]\u001b[A\n",
            "loss: -42.84:   9% 93/1050 [00:57<09:53,  1.61it/s]\u001b[A\n",
            "loss: -47.95:   9% 93/1050 [00:58<09:53,  1.61it/s]\u001b[A\n",
            "loss: -47.95:   9% 94/1050 [00:58<09:53,  1.61it/s]\u001b[A\n",
            "loss: -47.64:   9% 94/1050 [00:59<09:53,  1.61it/s]\u001b[A\n",
            "loss: -47.64:   9% 95/1050 [00:59<09:52,  1.61it/s]\u001b[A\n",
            "loss: -50.80:   9% 95/1050 [00:59<09:52,  1.61it/s]\u001b[A\n",
            "loss: -50.80:   9% 96/1050 [00:59<09:51,  1.61it/s]\u001b[A\n",
            "loss: -45.83:   9% 96/1050 [01:00<09:51,  1.61it/s]\u001b[A\n",
            "loss: -45.83:   9% 97/1050 [01:00<09:51,  1.61it/s]\u001b[A\n",
            "loss: -50.28:   9% 97/1050 [01:00<09:51,  1.61it/s]\u001b[A\n",
            "loss: -50.28:   9% 98/1050 [01:00<09:51,  1.61it/s]\u001b[A\n",
            "loss: -48.49:   9% 98/1050 [01:01<09:51,  1.61it/s]\u001b[A\n",
            "loss: -48.49:   9% 99/1050 [01:01<09:52,  1.61it/s]\u001b[A\n",
            "loss: -47.48:   9% 99/1050 [01:02<09:52,  1.61it/s]\u001b[A\n",
            "                                              \n",
            "\u001b[Aimage updated at \"./The_U.S_President_walking_in_the_garden_of_the_White_House_with_his_wife.000179.jpg\"\n",
            "epochs:  85% 17/20 [3:06:15<32:37, 652.52s/it]\n",
            "loss: -47.48:  10% 100/1050 [01:02<09:51,  1.61it/s]\u001b[A\n",
            "loss: -43.83:  10% 100/1050 [01:02<09:51,  1.61it/s]\u001b[A\n",
            "loss: -43.83:  10% 101/1050 [01:02<10:04,  1.57it/s]\u001b[A\n",
            "loss: -48.02:  10% 101/1050 [01:03<10:04,  1.57it/s]\u001b[A\n",
            "loss: -48.02:  10% 102/1050 [01:03<09:59,  1.58it/s]\u001b[A\n",

Scheduled access from the local Terminal to the Colab Jupyter notebook page

  • The "(削除)" ("deleted") part was masked out when pasting the code into this article.
Terminal (local machine)
electron@diynoMacBook-Pro ~ % vi ./access_colab.sh
electron@diynoMacBook-Pro ~ % cat ./access_colab.sh
#!/bin/bash

for i in `seq 0 12`
do
  echo "[$i]" ` date '+%y/%m/%d %H:%M:%S'` "connected."
  open "https://colab.research.google.com/drive/18nRt(削除)umgd_sSwOauE"
  sleep 1800
done
electron@diynoMacBook-Pro ~ % ./access_colab.sh   
[0] 21/08/20 12:58:35 connected.
[1] 21/08/20 13:33:47 connected.
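A rough check of how long each side needed to last: the epochs bar reports about 653.70 s per epoch, so 20 epochs take roughly 3.6 hours, while the loop above (`seq 0 12` gives 13 passes, each followed by `sleep 1800`) spans about 6.5 hours. This is only arithmetic on the numbers shown in this article; it does not explain why the session still ended.

```python
SECONDS_PER_EPOCH = 653.70  # from the tqdm epochs bar: "653.70s/it" (one "it" = one epoch)
EPOCHS = 20                 # default `epochs` in the Imagine signature
LOOP_PASSES = 13            # `seq 0 12` yields 0..12, i.e. 13 values
SLEEP_SECONDS = 1800        # `sleep 1800` in access_colab.sh


def hours(seconds: float) -> float:
    return seconds / 3600


generation = EPOCHS * SECONDS_PER_EPOCH
keep_alive = LOOP_PASSES * SLEEP_SECONDS
print(f"generation ~ {hours(generation):.2f} h, keep-alive loop ~ {hours(keep_alive):.2f} h")
```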

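A large number of image files are written

  • Since I used Google Colab+ this time, the image files were written to a Google Drive directory mounted from Colab.

Screenshot 2021-08-20 13 52 16

Screenshot 2021-08-20 13 52 03

When I checked a few hours later, the connection had dropped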

            "loss: -51.63:  10% 102/1050 [01:04<09:59,  1.58it/s]\u001b[A\n",
            "loss: -51.63:  10% 103/1050 [01:04<09:55,  1.59it/s]\u001b[A\n",

                ( omitted )

            "loss: -45.88:  28% 298/1050 [03:06<07:49,  1.60it/s]\u001b[A\n",
            "loss: -45.88:  28% 299/1050 [03:06<07:49,  1.60it/s]\u001b[A\n",
            "loss: -49.29:  28% 299/1050 [03:07<07:49,  1.60it/s]\u001b[A\n",
            "                                              \n",
            "\u001b[Aimage updated at \"./The_U.S_President_walking_in_the_garden_of_the_White_House_with_his_wife.000202.jpg\"\n",
            "epochs:  95% 19/20 [3:30:09<10:53, 653.70s/it]\n",
            "loss: -49.29:  29% 300/1050 [03:07<07:48,  1.60it/s]\u001b[A\n",
            "loss: -43.48:  29% 300/1050 [03:07<07:48,  1.60it/s]\u001b[A\n",
            "loss: -43.48:  29% 301/1050 [03:07<08:00,  1.56it/s]\u001b[A\n",
            "loss: -47.35:  29% 301/1050 [03:08<08:00,  1.56it/s]\u001b[A\n",
            "loss: -47.35:  29% 302/1050 [03:08<07:55,  1.57it/s]\u001b[A\n",
            "loss: -48.24:  29% 302/1050 [03:09<07:55,  1.57it/s]\u001b[A\n",
            "loss: -48.24:  29% 303/1050 [03:09<07:51,  1.58it/s]\u001b[A\n",
            "loss: -47.17:  29% 303/1050 [03:09<07:51,  1.58it/s]\u001b[A\n",
            "loss: -47.17:  29% 304/1050 [03:09<07:49,  1.59it/s]\u001b[A\n",
            "loss: -43.97:  29% 304/1050 [03:10<07:49,  1.59it/s]\u001b[A\n",
            "loss: -43.97:  29% 305/1050 [03:10<07:48,  1.59it/s]\u001b[A\n",
            "loss: -46.13:  29% 305/1050 [03:10<07:48,  1.59it/s]\u001b[A\n",
            "loss: -46.13:  29% 306/1050 [03:10<07:46,  1.59it/s]\u001b[A\n",
            "loss: -44.92:  29% 306/1050 [03:11<07:46,  1.59it/s]\u001b[A\n",
            "loss: -44.92:  29% 307/1050 [03:11<07:44,  1.60it/s]\u001b[A\n",
            "loss: -43.60:  29% 307/1050 [03:12<07:44,  1.60it/s]\u001b[A\n",
            "loss: -43.60:  29% 308/1050 [03:12<07:42,  1.60it/s]\u001b[A\n",
            "loss: -46.96:  29% 308/1050 [03:12<07:42,  1.60it/s]\u001b[A\n",
            "loss: -46.96:  29% 309/1050 [03:12<07:41,  1.60it/s]\u001b[A\n",
            "loss: -49.73:  29% 309/1050 [03:13<07:41,  1.60it/s]\u001b[A\n",
            "loss: -49.73:  30% 310/1050 [03:13<07:40,  1.61it/s]\u001b[A"
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Select the Runtime > "Change runtime type" menu to enable a GPU accelerator, ')
  print('and then re-execute this cell.')
else:
  print(gpu_info)

( Output )

(Dialog)
Too many sessions
You cannot create a new session because there are too many active sessions. To create a new session, terminate an existing one.

Screenshot 2021-08-20 13 50 48
