Results
( Input English text )
The U.S President walking in the garden of the White House with his wife
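( Output image )
- Output image at iteration 300 (of 1050) of epoch 19 (of 20)
That is the result obtained this time.
Below is an outline of the execution environment, how the tool was run, and the model that was used.
Execution environment
- Google Colab+ (flat monthly fee of 5,243 yen)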
- GPU: Tesla V100
- Python: Python 3.7.11
Code executed
- Input sentence: The U.S President walking in the garden of the White House with his wife
pip install deep-daze
imagine "The U.S President walking in the garden of the White House with his wife" --num-layers 32
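Execution result
- Image files are written to the current directory at the time of execution
- Since Google Colab+ was used this time, the image files are written to the Google Drive directory mounted from Colab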

- Output image at iteration 300 (of 1050) of epoch 19 (of 20)
Earlier output images
Image output right after processing started
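What is Deep-daze?
It is a tool that takes an English sentence as input and outputs an image corresponding to that sentence (an image evoked by its meaning).
It can be installed with __pip install deep-daze__.
The source code is also published on GitHub.
( The part that gets executed is shown below )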
class Imagine(nn.Module):
    def __init__(
            self,
            *,
            text=None,
            img=None,
            clip_encoding=None,
            lr=1e-5,
            batch_size=4,
            gradient_accumulate_every=4,
            save_every=100,
            image_width=512,
            num_layers=16,
            epochs=20,
            iterations=1050,
            save_progress=True,
            seed=None,
            open_folder=True,
            save_date_time=False,
            start_image_path=None,
            start_image_train_iters=10,
            start_image_lr=3e-4,
            theta_initial=None,
            theta_hidden=None,
            model_name="ViT-B/32",
            lower_bound_cutout=0.1,  # should be smaller than 0.8
            upper_bound_cutout=1.0,
            saturate_bound=False,
            averaging_weight=0.3,
            create_story=False,
            story_start_words=5,
            story_words_per_epoch=5,
            story_separator=None,
            gauss_sampling=False,
            gauss_mean=0.6,
            gauss_std=0.2,
            do_cutout=True,
            center_bias=False,
            center_focus=2,
            optimizer="AdamP",
            jit=True,
            hidden_size=256,
            save_gif=False,
            save_video=False,
    ):
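__Imagine__ in Deep Daze appears to work by combining two components, __clip__ and __Siren__.
- clip : A tool released by OpenAI. Given an image and several words passed as str text, it returns a score indicating which word the image is closest to. It was proposed in the OpenAI paper Learning Transferable Visual Models From Natural Language Supervision.
- Siren : A neural network model that is good at accurately capturing the features of signal data such as images, video, and audio. It first appeared in the Stanford University paper titled Implicit Neural Representations with Periodic Activation Functions.
__Siren__ is imported on __line 10__ of __deep_daze.py__.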
from siren_pytorch import SirenNet, SirenWrapper
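To get a feel for what "periodic activation functions" means in the Siren paper title, here is a minimal sketch of a single sine-activated layer in plain Python. It is not the siren_pytorch implementation: the helper names (sine_layer, make_layer), the weight range, and the input values are assumptions chosen only for illustration. The only idea taken from the paper is that the activation is sin(w0 * (Wx + b)).

```python
import math
import random


def sine_layer(x, weights, bias, w0=30.0):
    """Compute y_j = sin(w0 * (sum_i W[j][i] * x[i] + b[j])) for each output j."""
    return [
        math.sin(w0 * (sum(w * xi for w, xi in zip(row, x)) + b))
        for row, b in zip(weights, bias)
    ]


def make_layer(n_in, n_out, rng):
    """Create small random weights and biases (illustrative range, an assumption)."""
    bound = 1.0 / n_in
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    bias = [rng.uniform(-bound, bound) for _ in range(n_out)]
    return weights, bias


rng = random.Random(0)
weights, bias = make_layer(n_in=2, n_out=4, rng=rng)

# Treat (0.25, -0.5) as a pixel coordinate and map it to 4 hidden features.
features = sine_layer([0.25, -0.5], weights, bias)
print(features)
```

Because the activation is a sine, every output stays in the range -1 to 1 regardless of the input coordinate.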
__clip__ is imported on __line 23__ of __deep_daze.py__.
from .clip import load, tokenize
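The "which word is the image closest to" scoring described for clip can be sketched with toy vectors. The following is only an illustration under assumptions: the three-dimensional vectors, the labels, and the scale value are made up, and the real model produces much larger embeddings from its own image and text encoders. What it shows is the general idea of comparing one image vector with several text vectors by cosine similarity and turning the similarities into scores with a softmax.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score_texts(image_vec, text_vecs, scale=100.0):
    """Return softmax scores over the candidate texts (scale is an assumed value)."""
    sims = {label: cosine(image_vec, vec) for label, vec in text_vecs.items()}
    top = max(sims.values())
    # Subtract the maximum before exponentiating for numerical stability.
    exps = {label: math.exp(scale * (s - top)) for label, s in sims.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Toy embeddings (hypothetical values, not produced by any real model).
image_vec = [0.9, 0.1, 0.2]
text_vecs = {
    "a dog": [1.0, 0.0, 0.1],
    "a cat": [0.0, 1.0, 0.1],
    "a car": [0.1, 0.1, 1.0],
}

probs = score_texts(image_vec, text_vecs)
best = max(probs, key=probs.get)
print(best, probs)
```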
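Papers & implementation code
The papers and implementation code are as follows.
(For comparison) Big sleep
While looking into __Imagine__ in __Deep Daze__, I found several websites comparing it with something called __big sleep__. big sleep is also a __text-to-image (text2image)__ tool that outputs an image (evoked by) a text.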
- (GitHub)lucidrains/big-sleep
- A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN
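- How to use Big Sleep
- Generating images from text with BigGAN+CLIP
This time, I pass an English sentence to __Imagine in Deep Daze__ and have it generate an image evoked by that sentence.
Note that this process, which corresponds to the inference step, also runs through many epochs and iterations (like the training step of a model) and gradually outputs better images. (At regular intervals, the image generated at that point is written to the current directory as an image file; in this run the files were .jpg.)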
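As a rough aid for reading the numbered image files, the sketch below relates an epoch and iteration to the number in the file name. The formula is an assumption inferred from the file names in this run's log (for example .000178, .000179, .000202), not something confirmed from the library source. The default values iterations=1050 and save_every=100 come from the Imagine signature shown earlier, and the function name is hypothetical.

```python
def saved_sequence_number(epoch_index, iteration, iterations=1050, save_every=100):
    """Number embedded in the saved file name (assumed formula, inferred from the log).

    epoch_index is the count shown on the left of the "epochs: N/20" progress bar.
    """
    return (epoch_index * iterations + iteration) // save_every


# Cases taken from the log in this article.
print(saved_sequence_number(17, 0))    # file ...000178.jpg
print(saved_sequence_number(17, 100))  # file ...000179.jpg
print(saved_sequence_number(19, 300))  # file ...000202.jpg
```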
How to run
There are two ways, shown below: from the command line, or from Python.
$ pip install deep-daze
$ imagine "a house in the forest"
from deep_daze import Imagine
imagine = Imagine(
    text = 'a house in the forest',
    num_layers = 24,
)
imagine()
What is this?
Simple command line tool for text to image generation using OpenAI's CLIP and Siren. Credit goes to Ryan Murdock for the discovery of this technique (and for coming up with the great name)!
As of 2021/8/20, ver 0.10.2 has been released.
An article that translates the official Deep daze site into Japanese
Executed code and results (full text)
Launch Google Colab

Check the assigned GPU
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
    print('Select the Runtime > "Change runtime type" menu to enable a GPU accelerator, ')
    print('and then re-execute this cell.')
else:
    print(gpu_info)
( Execution result )
- A __Tesla V100__ was assigned
Thu Aug 19 16:27:46 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... Off | 00000000:00:04.0 Off | 0 |
| N/A 34C P0 25W / 300W | 0MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
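The GPU check above relies on the notebook-only `!nvidia-smi` syntax. As a sketch for running the same check from a plain Python script, the version below uses subprocess instead. The function names are hypothetical, and treating an empty result (nvidia-smi not found) as "no GPU" is my own assumption rather than part of the original cell.

```python
import shutil
import subprocess

HINT = (
    'Select the Runtime > "Change runtime type" menu to enable a GPU accelerator, '
    "and then re-execute this cell."
)


def query_gpu():
    """Return the nvidia-smi output, or an empty string if the command is missing."""
    if shutil.which("nvidia-smi") is None:
        return ""
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return result.stdout


def summarize_gpu(smi_output):
    """Return the hint when no usable GPU is reported, else the raw output."""
    if not smi_output or "failed" in smi_output:
        return HINT
    return smi_output


print(summarize_gpu(query_gpu()))
```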
Check the Python version
!python --version
( Execution result )
Python 3.7.11
Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')
!pwd
( Execution result )
/content
- Install the tree command
!sudo apt-get install tree
( Execution result )
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following package was automatically installed and is no longer required:
libnvidia-common-460
Use 'sudo apt autoremove' to remove it.
The following NEW packages will be installed:
tree
0 upgraded, 1 newly installed, 0 to remove and 40 not upgraded.
Need to get 40.7 kB of archives.
After this operation, 105 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 tree amd64 1.7.0-5 [40.7 kB]
Fetched 40.7 kB in 0s (779 kB/s)
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 1.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin:
Selecting previously unselected package tree.
(Reading database ... 148486 files and directories currently installed.)
Preparing to unpack .../tree_1.7.0-5_amd64.deb ...
Unpacking tree (1.7.0-5) ...
Setting up tree (1.7.0-5) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
!tree
( Execution result )
.
├── drive
│ └── MyDrive
│ └── google_colaboratory_share_folder
│ ├── A_naked_beautiful_girl_without_clothes_standing_in_fromt_of__the_hotel..000001.jpg
│ ├── A_naked_beautiful_girl_without_clothes_standing_in_fromt_of__the_hotel..jpg
│ └── nozomi
│ ├── cvt_nozomi_1.jpg
│ ├── mosaic_nozomi_1.jpg
│ ├── nozomi_1.jpg
│ ├── nozomi_2.jpg
│ ├── nozomi_3.jpg
│ ├── nozomi_4.jpg
│ ├── resized_nozomi_1.jpg
│ ├── resize_nozomi_1.jpg
│ └── trial.txt
└── sample_data
├── anscombe.json
├── california_housing_test.csv
├── california_housing_train.csv
├── mnist_test.csv
├── mnist_train_small.csv
└── README.md
5 directories, 17 files
- Move to the root directory of Google Drive.
- The three folder levels content/drive/MyDrive do not exist on the Google Drive side; they are merely the hierarchy shown by default on the Google Colaboratory (Google Colab, Colab+) side.
%cd drive/MyDrive
( Execution result )
/content/drive/MyDrive
!ls
( Execution result )
google_colaboratory_share_folder
!mkdir google_colaboratory_share_folder_2
!tree
( Execution result )
.
├── google_colaboratory_share_folder
│ ├── A_naked_beautiful_girl_without_clothes_standing_in_fromt_of__the_hotel..000001.jpg
│ ├── A_naked_beautiful_girl_without_clothes_standing_in_fromt_of__the_hotel..000002.jpg
│ ├── A_naked_beautiful_girl_without_clothes_standing_in_fromt_of__the_hotel..jpg
│ └── nozomi
│ ├── cvt_nozomi_1.jpg
│ ├── mosaic_nozomi_1.jpg
│ ├── nozomi_1.jpg
│ ├── nozomi_2.jpg
│ ├── nozomi_3.jpg
│ ├── nozomi_4.jpg
│ ├── resized_nozomi_1.jpg
│ ├── resize_nozomi_1.jpg
│ └── trial.txt
└── google_colaboratory_share_folder_2
3 directories, 12 files
%cd google_colaboratory_share_folder_2
( Execution result )
/content/drive/My Drive/google_colaboratory_share_folder_2
!pwd
( Execution result )
/content/drive/My Drive/google_colaboratory_share_folder_2
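The mkdir and %cd steps above can also be done from Python, which may be handy when the notebook is turned into a script. This is a minimal sketch: the function name enter_output_dir is hypothetical, and the paths in the usage comment are simply the ones that appear in this article.

```python
import os
from pathlib import Path


def enter_output_dir(root, name):
    """Create root/name if needed and make it the current working directory."""
    out_dir = Path(root) / name
    out_dir.mkdir(parents=True, exist_ok=True)
    os.chdir(out_dir)
    return out_dir


# Usage on Colab after mounting Google Drive (not executed here):
# enter_output_dir("/content/drive/MyDrive", "google_colaboratory_share_folder_2")
```

Since the images are written to the current directory, changing into a folder under the mounted Drive is what makes them end up on Google Drive.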
pip install deep-daze
- imagine is a command that is typed into a Terminal.
- Because it is a shell command, prefix it with "!" in Colab
!imagine "The U.S President walking in the garden of the White House with his wife" --num-layers 32
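As an alternative to the "!" prefix, the same command line can be assembled in Python and handed to subprocess. This is a sketch under assumptions: build_imagine_command is a hypothetical helper, and it only covers the --num-layers option used in this article. The subprocess call is left commented out because it needs deep-daze installed and a GPU runtime.

```python
import shlex
import subprocess


def build_imagine_command(text, num_layers=None):
    """Build the argument list for the imagine CLI as used in this article."""
    cmd = ["imagine", text]
    if num_layers is not None:
        cmd += ["--num-layers", str(num_layers)]
    return cmd


cmd = build_imagine_command(
    "The U.S President walking in the garden of the White House with his wife",
    num_layers=32,
)

# Show the command in a shell-safe form.
print(" ".join(shlex.quote(part) for part in cmd))

# Uncomment in an environment where deep-daze is installed:
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids having to worry about quoting the sentence, which contains spaces.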
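Execution screen
- To keep the session from disconnecting, I had my local MacBook periodically open the Colab Jupyter notebook URL (the script shown later sleeps 1800 seconds, i.e. 30 minutes, between accesses), but the run stopped anyway
- It ran up to iteration 300 (of 1050) of epoch 19 (of 20)
image updated at "./The_U.S_President_walking_in_the_garden_of_the_White_House_with_his_wife.000202.jpg"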
epochs: 95% 19/20 [3:30:09<10:53, 653.70s/it]
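The numbers in the progress log give a rough idea of the total run time. The sketch below is just arithmetic on values that appear in the log of this article (653.70 s per epoch, 1.61 it/s, 1050 iterations, 20 epochs); the helper name format_hms is hypothetical, and the result is an estimate, not a measured time.

```python
def format_hms(seconds):
    """Format a duration in seconds as H:MM:SS."""
    seconds = int(round(seconds))
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


epochs = 20
iterations = 1050
seconds_per_epoch = 653.70   # from "653.70s/it" on the epochs bar
iterations_per_second = 1.61  # from "1.61it/s" on the iteration bar

# Estimated time for all 20 epochs.
print(format_hms(epochs * seconds_per_epoch))  # 3:37:54

# Cross-check: time for one epoch derived from the iteration speed.
print(format_hms(iterations / iterations_per_second))
```

So a full run at this speed takes a little over three and a half hours, which is why keeping the Colab session alive mattered.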
( omitted )
"loss: -47.58: 99% 1043/1050 [10:47<00:04, 1.61it/s]\u001b[A\n",
"loss: -50.82: 99% 1043/1050 [10:48<00:04, 1.61it/s]\u001b[A\n",
"loss: -50.82: 99% 1044/1050 [10:48<00:03, 1.61it/s]\u001b[A\n",
"loss: -48.14: 99% 1044/1050 [10:48<00:03, 1.61it/s]\u001b[A\n",
"loss: -48.14: 100% 1045/1050 [10:48<00:03, 1.61it/s]\u001b[A\n",
"loss: -49.14: 100% 1045/1050 [10:49<00:03, 1.61it/s]\u001b[A\n",
"loss: -49.14: 100% 1046/1050 [10:49<00:02, 1.61it/s]\u001b[A\n",
"loss: -50.57: 100% 1046/1050 [10:49<00:02, 1.61it/s]\u001b[A\n",
"loss: -50.57: 100% 1047/1050 [10:49<00:01, 1.61it/s]\u001b[A\n",
"loss: -49.27: 100% 1047/1050 [10:50<00:01, 1.61it/s]\u001b[A\n",
"loss: -49.27: 100% 1048/1050 [10:50<00:01, 1.61it/s]\u001b[A\n",
"loss: -48.08: 100% 1048/1050 [10:51<00:01, 1.61it/s]\u001b[A\n",
"loss: -48.08: 100% 1049/1050 [10:51<00:00, 1.61it/s]\u001b[A\n",
"loss: -47.26: 100% 1049/1050 [10:51<00:00, 1.61it/s]\u001b[A\n",
"loss: -47.26: 100% 1050/1050 [10:51<00:00, 1.61it/s]\n",
"epochs: 85% 17/20 [3:05:12<32:37, 652.52s/it]\n",
" \n",
"\u001b[Aimage updated at \"./The_U.S_President_walking_in_the_garden_of_the_White_House_with_his_wife.000178.jpg\"\n",
"epochs: 85% 17/20 [3:05:12<32:37, 652.52s/it]\n",
"iteration: 0% 0/1050 [00:00<?, ?it/s]\u001b[A\n",
"loss: -46.16: 0% 0/1050 [00:00<?, ?it/s]\u001b[A\n",
"loss: -46.16: 0% 1/1050 [00:00<11:54, 1.47it/s]\u001b[A\n",
"loss: -42.15: 0% 1/1050 [00:01<11:54, 1.47it/s]\u001b[A\n",
"loss: -42.15: 0% 2/1050 [00:01<11:16, 1.55it/s]\u001b[A\n",
"loss: -46.09: 0% 2/1050 [00:01<11:16, 1.55it/s]\u001b[A\n",
"loss: -46.09: 0% 3/1050 [00:01<11:02, 1.58it/s]\u001b[A\n",
"loss: -47.67: 0% 3/1050 [00:02<11:02, 1.58it/s]\u001b[A\n",
"loss: -47.67: 0% 4/1050 [00:02<10:57, 1.59it/s]\u001b[A\n",
"loss: -47.12: 0% 4/1050 [00:03<10:57, 1.59it/s]\u001b[A\n",
"loss: -47.12: 0% 5/1050 [00:03<10:55, 1.59it/s]\u001b[A\n",
"loss: -48.26: 0% 5/1050 [00:03<10:55, 1.59it/s]\u001b[A\n",
"loss: -48.26: 1% 6/1050 [00:03<10:54, 1.60it/s]\u001b[A\n",
"loss: -49.60: 1% 6/1050 [00:04<10:54, 1.60it/s]\u001b[A\n",
"loss: -49.60: 1% 7/1050 [00:04<10:51, 1.60it/s]\u001b[A\n",
"loss: -50.31: 1% 7/1050 [00:05<10:51, 1.60it/s]\u001b[A\n",
"loss: -50.31: 1% 8/1050 [00:05<10:49, 1.60it/s]\u001b[A\n",
"loss: -47.57: 1% 8/1050 [00:05<10:49, 1.60it/s]\u001b[A\n",
"loss: -47.57: 1% 9/1050 [00:05<10:48, 1.60it/s]\u001b[A\n",
( omitted )
"loss: -46.42: 9% 90/1050 [00:56<09:55, 1.61it/s]\u001b[A\n",
"loss: -48.60: 9% 90/1050 [00:56<09:55, 1.61it/s]\u001b[A\n",
"loss: -48.60: 9% 91/1050 [00:56<09:54, 1.61it/s]\u001b[A\n",
"loss: -48.60: 9% 91/1050 [00:57<09:54, 1.61it/s]\u001b[A\n",
"loss: -48.60: 9% 92/1050 [00:57<09:54, 1.61it/s]\u001b[A\n",
"loss: -42.84: 9% 92/1050 [00:57<09:54, 1.61it/s]\u001b[A\n",
"loss: -42.84: 9% 93/1050 [00:57<09:53, 1.61it/s]\u001b[A\n",
"loss: -47.95: 9% 93/1050 [00:58<09:53, 1.61it/s]\u001b[A\n",
"loss: -47.95: 9% 94/1050 [00:58<09:53, 1.61it/s]\u001b[A\n",
"loss: -47.64: 9% 94/1050 [00:59<09:53, 1.61it/s]\u001b[A\n",
"loss: -47.64: 9% 95/1050 [00:59<09:52, 1.61it/s]\u001b[A\n",
"loss: -50.80: 9% 95/1050 [00:59<09:52, 1.61it/s]\u001b[A\n",
"loss: -50.80: 9% 96/1050 [00:59<09:51, 1.61it/s]\u001b[A\n",
"loss: -45.83: 9% 96/1050 [01:00<09:51, 1.61it/s]\u001b[A\n",
"loss: -45.83: 9% 97/1050 [01:00<09:51, 1.61it/s]\u001b[A\n",
"loss: -50.28: 9% 97/1050 [01:00<09:51, 1.61it/s]\u001b[A\n",
"loss: -50.28: 9% 98/1050 [01:00<09:51, 1.61it/s]\u001b[A\n",
"loss: -48.49: 9% 98/1050 [01:01<09:51, 1.61it/s]\u001b[A\n",
"loss: -48.49: 9% 99/1050 [01:01<09:52, 1.61it/s]\u001b[A\n",
"loss: -47.48: 9% 99/1050 [01:02<09:52, 1.61it/s]\u001b[A\n",
" \n",
"\u001b[Aimage updated at \"./The_U.S_President_walking_in_the_garden_of_the_White_House_with_his_wife.000179.jpg\"\n",
"epochs: 85% 17/20 [3:06:15<32:37, 652.52s/it]\n",
"loss: -47.48: 10% 100/1050 [01:02<09:51, 1.61it/s]\u001b[A\n",
"loss: -43.83: 10% 100/1050 [01:02<09:51, 1.61it/s]\u001b[A\n",
"loss: -43.83: 10% 101/1050 [01:02<10:04, 1.57it/s]\u001b[A\n",
"loss: -48.02: 10% 101/1050 [01:03<10:04, 1.57it/s]\u001b[A\n",
"loss: -48.02: 10% 102/1050 [01:03<09:59, 1.58it/s]\u001b[A\n",
Scheduled access from the local Terminal to the Colab Jupyter notebook site
- The "(削除)" (deleted) part was masked out when pasting the code into this article.
electron@diynoMacBook-Pro ~ % vi ./access_colab.sh
electron@diynoMacBook-Pro ~ % cat ./access_colab.sh
# !/bin/bash
for i in `seq 0 12`
do
echo "[$i]" ` date '+%y/%m/%d %H:%M:%S'` "connected."
open "https://colab.research.google.com/drive/18nRt(削除)umgd_sSwOauE"
sleep 1800
done
electron@diynoMacBook-Pro ~ % ./access_colab.sh
[0] 21/08/20 12:58:35 connected.
[1] 21/08/20 13:33:47 connected.
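For reference, the same periodic access can be sketched in Python with the standard webbrowser module. This is an assumption-laden equivalent, not what was actually run: the function name keep_alive and its parameters are hypothetical, the URL in the usage comment is a placeholder, and the opener and sleep functions are injectable only so that the loop can be tried without waiting. The count of 13 matches `seq 0 12` and the 1800 seconds matches the sleep in the shell script.

```python
import time
import webbrowser
from datetime import datetime


def keep_alive(url, repeats=13, interval_s=1800,
               open_url=webbrowser.open, sleep=time.sleep):
    """Open url in the default browser repeats times, waiting interval_s between."""
    for i in range(repeats):
        stamp = datetime.now().strftime("%y/%m/%d %H:%M:%S")
        print(f"[{i}] {stamp} connected.")
        open_url(url)
        sleep(interval_s)


# Usage (placeholder URL, replace with your own notebook URL):
# keep_alive("https://colab.research.google.com/drive/<notebook-id>")
```

As the rest of this article shows, opening the page periodically did not prevent the session from ending, so treat this as a convenience rather than a guarantee.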
A large number of image files are output
- Since Google Colab+ was used this time, the image files are written to the Google Drive directory mounted from Colab


When I checked a few hours later, the session had been disconnected
"loss: -51.63: 10% 102/1050 [01:04<09:59, 1.58it/s]\u001b[A\n",
"loss: -51.63: 10% 103/1050 [01:04<09:55, 1.59it/s]\u001b[A\n",
( omitted )
"loss: -45.88: 28% 298/1050 [03:06<07:49, 1.60it/s]\u001b[A\n",
"loss: -45.88: 28% 299/1050 [03:06<07:49, 1.60it/s]\u001b[A\n",
"loss: -49.29: 28% 299/1050 [03:07<07:49, 1.60it/s]\u001b[A\n",
" \n",
"\u001b[Aimage updated at \"./The_U.S_President_walking_in_the_garden_of_the_White_House_with_his_wife.000202.jpg\"\n",
"epochs: 95% 19/20 [3:30:09<10:53, 653.70s/it]\n",
"loss: -49.29: 29% 300/1050 [03:07<07:48, 1.60it/s]\u001b[A\n",
"loss: -43.48: 29% 300/1050 [03:07<07:48, 1.60it/s]\u001b[A\n",
"loss: -43.48: 29% 301/1050 [03:07<08:00, 1.56it/s]\u001b[A\n",
"loss: -47.35: 29% 301/1050 [03:08<08:00, 1.56it/s]\u001b[A\n",
"loss: -47.35: 29% 302/1050 [03:08<07:55, 1.57it/s]\u001b[A\n",
"loss: -48.24: 29% 302/1050 [03:09<07:55, 1.57it/s]\u001b[A\n",
"loss: -48.24: 29% 303/1050 [03:09<07:51, 1.58it/s]\u001b[A\n",
"loss: -47.17: 29% 303/1050 [03:09<07:51, 1.58it/s]\u001b[A\n",
"loss: -47.17: 29% 304/1050 [03:09<07:49, 1.59it/s]\u001b[A\n",
"loss: -43.97: 29% 304/1050 [03:10<07:49, 1.59it/s]\u001b[A\n",
"loss: -43.97: 29% 305/1050 [03:10<07:48, 1.59it/s]\u001b[A\n",
"loss: -46.13: 29% 305/1050 [03:10<07:48, 1.59it/s]\u001b[A\n",
"loss: -46.13: 29% 306/1050 [03:10<07:46, 1.59it/s]\u001b[A\n",
"loss: -44.92: 29% 306/1050 [03:11<07:46, 1.59it/s]\u001b[A\n",
"loss: -44.92: 29% 307/1050 [03:11<07:44, 1.60it/s]\u001b[A\n",
"loss: -43.60: 29% 307/1050 [03:12<07:44, 1.60it/s]\u001b[A\n",
"loss: -43.60: 29% 308/1050 [03:12<07:42, 1.60it/s]\u001b[A\n",
"loss: -46.96: 29% 308/1050 [03:12<07:42, 1.60it/s]\u001b[A\n",
"loss: -46.96: 29% 309/1050 [03:12<07:41, 1.60it/s]\u001b[A\n",
"loss: -49.73: 29% 309/1050 [03:13<07:41, 1.60it/s]\u001b[A\n",
"loss: -49.73: 30% 310/1050 [03:13<07:40, 1.61it/s]\u001b[A"
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
    print('Select the Runtime > "Change runtime type" menu to enable a GPU accelerator, ')
    print('and then re-execute this cell.')
else:
    print(gpu_info)
( Execution result )
Too many sessions
A new session cannot be created because there are too many active sessions. To create a new session, terminate an existing session.
