
Visualizing My Inner Self (Image-Generation AI)

Posted at 2024-03-31

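My inner self

In everyday life, there are moments when what my inner voice says and what I actually say out loud do not match. If that inner self stepped into the real world, the way it sometimes does in anime, what would it look like?

In this post, I used image-generation AI to visualize what my inner self looks like.

I assumed that the inner self reflects both my appearance and my personality, so I treated an image generated from a photo of myself and my personality test (MBTI) results as my inner self.

The image below is my generated inner self.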

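Development environment

Google Colab
Python 3.10.12

Image generation

I performed image-to-image (img2img) generation: a source image and a text prompt are fed into a pretrained Japanese Stable Diffusion model, which generates a new image. The model is loaded from Hugging Face.

Hugging Face is a widely used platform in the AI field for sharing and using AI models and datasets. Many models and datasets are published there, and developers can access and use them easily.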

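Prompt

I used my personality test (MBTI) results as the prompt.
My MBTI result is shown in the figure below. I placed the type name 「幹部」 (Executive) first, followed by the remaining trait words in descending order of percentage, and set the prompt to
「幹部, 論理的, 計画的, 外交的, 自己主張的, 現実的」 (Executive, logical, organized, extraverted, assertive, realistic). The prompt stays in Japanese because the model is the Japanese version of Stable Diffusion.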
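The ordering rule described above can be sketched in a few lines of Python. This is only an illustration: the percentages appear only in the MBTI figure, so the scores below are placeholder values chosen to reproduce the same order, and `build_prompt` is a hypothetical helper name.

```python
# Placeholder scores: the real percentages appear only in the MBTI figure,
# so these numbers are illustrative assumptions, not the author's results.
type_name = "幹部"
trait_scores = {
    "現実的": 55,
    "外交的": 65,
    "論理的": 80,
    "自己主張的": 60,
    "計画的": 70,
}


def build_prompt(type_name, trait_scores):
    """Put the type name first, then the traits in descending score order."""
    ordered = sorted(trait_scores, key=trait_scores.get, reverse=True)
    return ", ".join([type_name, *ordered])


prompt = build_prompt(type_name, trait_scores)
print(prompt)
```

Building the string this way makes it easy to regenerate the prompt if the test results change.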

Implementation

Installing packages

pip install diffusers

pip install transformers

pip install git+https://github.com/rinnakk/japanese-stable-diffusion

Importing libraries and setting the device

Import the required libraries and modules.
Check which device (GPU or CPU) is available and select it.

import os

import matplotlib.pyplot as plt
import torch
from PIL import Image
from torch import autocast

from japanese_stable_diffusion import JapaneseStableDiffusionImg2ImgPipeline
# preprocess() turns a PIL image into the NumPy array that is passed to the pipeline later
from diffusers.pipelines.stable_diffusion.pipeline_onnx_stable_diffusion_img2img import preprocess

# Use the GPU when one is available; otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using device:", device)

Preparing the access token and files

Set the access token required for API access.
Specify the file path of the source image.
Specify the ID of the model to use.

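# Placeholder: replace with the access token issued by Hugging Face
access_tokens = "access token issued by Hugging Face"

# Placeholder: replace with the path to the source image
filename = "path to the source image"

model_id = "rinna/japanese-stable-diffusion"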
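Hard-coding a token in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the token could instead be read from an environment variable. The variable name `HF_TOKEN` and the helper `load_hf_token` are assumptions for illustration and are not part of the original code.

```python
import os


def load_hf_token(env=os.environ, var_name="HF_TOKEN"):
    """Return the token stored in `var_name`, or raise if it is missing."""
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return token


# Demonstration with a fake environment mapping instead of a real token
demo_token = load_hf_token(env={"HF_TOKEN": "hf_dummy_value"})
print(len(demo_token))
```

In the notebook, the result of `load_hf_token()` would take the place of the string literal assigned to `access_tokens`.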

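Loading the model and preparing the image

Load the specified model and set up the image-generation pipeline.
Load the source image and convert it to RGB if necessary.
Resize the source image and preprocess it as model input.
Display the resized source image.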

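pipe = JapaneseStableDiffusionImg2ImgPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision; assumes a GPU is in use
    use_auth_token=access_tokens
).to(device)

init_image = Image.open(filename)
# The pipeline expects a 3-channel image, so convert alpha or palette modes
if init_image.mode != "RGB":
    init_image = init_image.convert("RGB")
# Note: this fixed size does not preserve the source aspect ratio
resize_image = init_image.resize((512, 762))
input_img = preprocess(resize_image)

plt.imshow(resize_image)
plt.axis("off")
plt.show()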
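The resize step above uses a fixed size of (512, 762). Stable Diffusion v1 style models typically work on latents downsampled by a factor of 8, so image sides are usually expected to be multiples of 8, and some preprocessing helpers round down further. The sketch below shows one way to pick a size that keeps the aspect ratio and snaps to such a multiple. `fit_size` and its defaults are hypothetical and are not taken from the original code or from any library.

```python
def fit_size(src_width, src_height, target_width=512, multiple=8):
    """Scale to target_width, keep the aspect ratio, snap sides down to `multiple`."""
    if src_width <= 0 or src_height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target_width / src_width
    width = target_width - target_width % multiple
    height = round(src_height * scale)
    height -= height % multiple
    # Never return a side smaller than one block
    return max(width, multiple), max(height, multiple)


print(fit_size(1024, 1524))               # portrait photo
print(fit_size(3000, 2000))               # landscape photo
print(fit_size(1024, 1524, multiple=32))  # stricter rounding
```

The returned tuple can be passed directly to `Image.resize`.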

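Specifying the prompt and generating the image

Specify the prompt.
Set the random seed used for image generation.
Using automatic mixed precision (autocast), prepare the input data and pass it to the model to generate an image.
Save the generated image to the specified location and display it.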

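prompt = "幹部, 論理的, 計画的, 外交的, 自己主張的, 現実的"

# Fix the seed so that the same inputs reproduce the same image
generator = torch.Generator(device).manual_seed(12)

with autocast(device):
    # Convert the preprocessed NumPy array into a tensor for the pipeline
    x = torch.from_numpy(input_img).clone()
    output = pipe(
        prompt=prompt,
        init_image=x,
        strength=0.75,
        guidance_scale=7.5,
        num_inference_steps=50,
        generator=generator,
    )
    images = output["images"]

# Placeholder: replace with the path where the image should be saved
images[0].save("path to save the generated image")

# In a notebook, the last expression in a cell is displayed
images[0]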
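The `strength` argument decides how far the result may drift from the source photo. In diffusers-style img2img pipelines it typically controls how much noise is added to the source image, and therefore how many of the scheduled denoising steps actually run. The sketch below uses a simplified version of that rule as an assumption. The exact rule depends on the library version, and `denoising_steps` is a hypothetical helper.

```python
def denoising_steps(num_inference_steps, strength):
    """Approximate number of denoising steps that run for a given strength."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)


# Compare a few strengths for the 50-step schedule used in this article
for s in (0.25, 0.5, 0.75, 1.0):
    print(s, denoising_steps(50, s))
```

Under this assumption, lower values stay closer to the source photo, and values near 1.0 let the prompt dominate.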

Summary

The code above, combined into one listing:

# Import libraries and set the device
import os

import matplotlib.pyplot as plt
import torch
from PIL import Image
from torch import autocast

from japanese_stable_diffusion import JapaneseStableDiffusionImg2ImgPipeline
from diffusers.pipelines.stable_diffusion.pipeline_onnx_stable_diffusion_img2img import preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using device:", device)


# Prepare the access token and files (the two strings below are placeholders)
access_tokens = "access token issued by Hugging Face"
filename = "path to the source image"
model_id = "rinna/japanese-stable-diffusion"


# Load the model and prepare the image
pipe = JapaneseStableDiffusionImg2ImgPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision; assumes a GPU is in use
    use_auth_token=access_tokens
).to(device)

init_image = Image.open(filename)
# The pipeline expects a 3-channel image, so convert alpha or palette modes
if init_image.mode != "RGB":
    init_image = init_image.convert("RGB")
resize_image = init_image.resize((512, 762))
input_img = preprocess(resize_image)

plt.imshow(resize_image)
plt.axis("off")
plt.show()


# Specify the prompt and generate the image
prompt = "幹部, 論理的, 計画的, 外交的, 自己主張的, 現実的"

# Fix the seed so that the same inputs reproduce the same image
generator = torch.Generator(device).manual_seed(12)

with autocast(device):
    # Convert the preprocessed NumPy array into a tensor for the pipeline
    x = torch.from_numpy(input_img).clone()
    output = pipe(
        prompt=prompt,
        init_image=x,
        strength=0.75,
        guidance_scale=7.5,
        num_inference_steps=50,
        generator=generator,
    )
    images = output["images"]

# Placeholder: replace with the path where the image should be saved
images[0].save("path to save the generated image")

# In a notebook, the last expression in a cell is displayed
images[0]

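Thoughts

This was my first time using Hugging Face. Besides image generation, it hosts many other models and datasets, including ones for video generation, so I would like to try some of them as well.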
