How much does prompt enhancement help Google's image-generation AI "Imagen3"?

Last updated at 2025-02-05 / Posted at 2025-02-05

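Introduction

Google's Imagen3 currently ranks first on an image-generation AI model arena leaderboard.
Given how high-quality its images are, that result is no surprise.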

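Imagen3 can be used on Google Cloud's Vertex AI.
A feature called "Prompt enhancement using prompt rewriter" has now been added to it.
It rewrites your prompt into something better suited to producing high-quality images.
It appears to be applied only when the original prompt is shorter than 30 words.
At the moment it is available only with imagen-3.0-generate-002.

I tried it out to see what difference it makes.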
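The two conditions above (fewer than 30 words, and only on imagen-3.0-generate-002) can be written as a small client-side check. This is a minimal sketch, assuming that "words" means whitespace-separated tokens; how Vertex AI actually counts words is not stated in the documentation quoted here, and the names `count_words`, `enhancement_applies`, and `SUPPORTED_MODELS` are hypothetical.

```python
# Hypothetical client-side check mirroring the conditions described above.
# Assumption: words are counted by splitting on whitespace; Vertex AI's
# actual counting rule may differ.

SUPPORTED_MODELS = {"imagen-3.0-generate-002"}
MAX_WORDS = 30  # enhancement reportedly applies only below this length


def count_words(prompt: str) -> int:
    """Count whitespace-separated words (an assumed approximation)."""
    return len(prompt.split())


def enhancement_applies(prompt: str, model: str) -> bool:
    """Return True if prompt rewriting would be expected to kick in."""
    return model in SUPPORTED_MODELS and count_words(prompt) < MAX_WORDS


if __name__ == "__main__":
    for p in ["A dog and a cat are dancing in snowy Tokyo",
              "dog, cat, dance, snow, Tokyo"]:
        print(count_words(p), enhancement_applies(p, "imagen-3.0-generate-002"))
    # 10 True
    # 5 True
```

Both prompts used in the tests below fall well under the limit, so enhancement would be expected to apply to each.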

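How to run it

The feature can be called through the REST API.
Send a request like the one below.
Set enhancePrompt to true or false to enable or disable prompt enhancement.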

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d '{
      "instances": [
        {
          "prompt": "{PROMPT}"
        }
      ],
      "parameters": {
        "sampleCount": 1,
        "enhancePrompt": true
      }
    }' \
     "https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/{MODEL_VERSION}:predict"

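For reference, the same request can be sketched in Python using only the standard library. This assumes the gcloud CLI is installed and authenticated, exactly as the curl example does. The default location `us-central1` and the function names `build_predict_request` and `predict` are hypothetical choices for illustration, not part of any Google SDK.

```python
import json
import subprocess
import urllib.request


def build_predict_request(prompt, project_id, location="us-central1",
                          model="imagen-3.0-generate-002",
                          enhance=True, sample_count=1):
    """Build the URL and JSON body for the :predict call shown in curl above."""
    url = (f"https://{location}-aiplatform.googleapis.com/v1/projects/"
           f"{project_id}/locations/{location}/publishers/google/models/"
           f"{model}:predict")
    body = {
        "instances": [{"prompt": prompt}],
        # enhancePrompt toggles the prompt rewriter on or off
        "parameters": {"sampleCount": sample_count, "enhancePrompt": enhance},
    }
    return url, body


def predict(prompt, project_id, **kwargs):
    """Send the request; requires the gcloud CLI and network access."""
    url, body = build_predict_request(prompt, project_id, **kwargs)
    # Same token source as the curl example: gcloud auth print-access-token
    token = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `predict("A dog and a cat are dancing in snowy Tokyo", "my-project")`, with the hypothetical `my-project` replaced by your own project ID, would return the parsed JSON response. Splitting out `build_predict_request` keeps the request shape inspectable without making a network call.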
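For each generated image, the response returns the rewritten prompt and the image data as bytesBase64Encoded. (The example below has two predictions, as you would get with sampleCount set to 2.)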

  {
    "predictions": [
      {
        "mimeType": "MIME_TYPE",
        "prompt": "ENHANCED_PROMPT_1",
        "bytesBase64Encoded": "BASE64_IMG_BYTES_1"
      },
      {
        "mimeType": "MIME_TYPE",
        "prompt": "ENHANCED_PROMPT_2",
        "bytesBase64Encoded": "BASE64_IMG_BYTES_2"
      }
    ]
  }
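Because each prediction carries its own `prompt`, the rewritten prompts can be pulled out of the response before looking at the images. A minimal sketch; the helper name `enhanced_prompts` is hypothetical, and what the response contains when enhancePrompt is false is not shown in this article, so a prediction without a `prompt` field simply yields `None`.

```python
import json


def enhanced_prompts(response):
    """Return the rewritten prompt of each prediction, in order.

    A prediction without a "prompt" field yields None.
    """
    return [pred.get("prompt") for pred in response.get("predictions", [])]


# The sample response above, with the placeholders kept as-is
sample = json.loads("""
{
  "predictions": [
    {"mimeType": "MIME_TYPE", "prompt": "ENHANCED_PROMPT_1",
     "bytesBase64Encoded": "BASE64_IMG_BYTES_1"},
    {"mimeType": "MIME_TYPE", "prompt": "ENHANCED_PROMPT_2",
     "bytesBase64Encoded": "BASE64_IMG_BYTES_2"}
  ]
}
""")

for i, p in enumerate(enhanced_prompts(sample)):
    print(i, p)
# 0 ENHANCED_PROMPT_1
# 1 ENHANCED_PROMPT_2
```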

Example: turning bytesBase64Encoded into an image

Redirect the response's standard output to a file.

curl  -X POST ~ > response.json

Then convert the file into an image with Python.

import base64
import json
from io import BytesIO

from PIL import Image  # provided by the Pillow package

# Load the response saved by curl
with open("response.json", encoding="utf-8") as f:
    d = json.load(f)


def base64_to_image(base64_string, output_path="output.png"):
    try:
        # 1. Decode the base64 string
        image_data = base64.b64decode(base64_string)

        # 2. Load the decoded bytes as an image
        image = Image.open(BytesIO(image_data))

        # 3. Save it to a file
        image.save(output_path)
        print(f"Image saved to {output_path}.")

    except Exception as e:
        print(f"An error occurred: {e}")


# Example usage: save the first generated image
base64_to_image(d["predictions"][0]["bytesBase64Encoded"], "output.png")
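The script above saves only the first prediction. When sampleCount is greater than 1, every image can be written out instead. The sketch below decodes each bytesBase64Encoded value and writes the raw bytes directly, so Pillow is not needed. The function name `save_all_predictions`, the `output_<n>` file naming, and the mimeType-to-extension mapping (only `image/png` and `image/jpeg`; anything else is saved as `.bin`) are hypothetical choices.

```python
import base64
from pathlib import Path

# Assumed mapping; unknown MIME types fall back to ".bin"
EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg"}


def save_all_predictions(response, out_dir=".", stem="output"):
    """Decode every prediction's bytesBase64Encoded and write it to disk.

    The decoded bytes are written as-is (no re-encoding).
    Returns the list of written paths.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, pred in enumerate(response.get("predictions", [])):
        ext = EXTENSIONS.get(pred.get("mimeType", ""), ".bin")
        path = out / f"{stem}_{i}{ext}"
        path.write_bytes(base64.b64decode(pred["bytesBase64Encoded"]))
        paths.append(path)
    return paths
```

With `d` loaded from response.json as in the script above, `save_all_predictions(d)` would write `output_0.png`, `output_1.png`, and so on into the current directory.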

Effect of prompt enhancement

Test 1

As the prompt, I entered A dog and a cat are dancing in snowy Tokyo.

With prompt enhancement disabled

With prompt enhancement enabled

The prompt was rewritten as follows.
That is quite an enhancement.

A Shiba Inu dog and a fluffy white Persian cat are merrily dancing in the heart of a snowy Tokyo street, surrounded by charming traditional architecture. 
The vibrant scene is captured in a high-quality, cinematic photograph, with a wide-angle lens and a depth of field that draws the eye to the playful pair. 
The backdrop features snow-covered buildings in the Shibuya district, showcasing traditional Japanese elements mixed with modern city architecture. 
The duo is captured mid-step, their bodies slightly blurred, creating a sense of movement and energy. Warm ambient lighting illuminates the scene, adding a whimsical charm to the wintery tableau, with the cold blue tones of the snow complementing the warmth of the lighting, and a few fallen leaves in the foreground that add a touch of vibrant colour to the scene.

The quality is considerably higher!

Test 2

Next, instead of a sentence, I tried a prompt made of comma-separated words:
dog, cat, dance, snow, Tokyo

With prompt enhancement disabled

With prompt enhancement enabled

A whimsical, high-quality photo of a fluffy white Shiba Inu dog and a sleek black cat, dancing in the falling snow in Shinjuku Gyoen National Garden, Tokyo.  
The scene is filled with a joyful, playful energy, the two animals gracefully moving in sync to an imaginary melody, their paws lightly touching the pristine white ground.
The background is a blur of frosted trees and traditional Japanese architecture, the serene landscape adding a touch of peace and tranquility to the moment.  
The photo captures a fleeting, delightful instant of unexpected companionship, creating a charming and magical impression of a winter's day in Tokyo. 
The photo is taken with a Canon EOS R5 camera using a wide angle lens to capture the scope of the scene, and to maintain focus on both animals.

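The result has a much more elaborate background.
It seems to have associated "Tokyo" with Shinjuku Gyoen.

Summary

I tried out Imagen3's prompt enhancement.
With it enabled, I could see a clear difference in the quality of the generated images.
Even from a bare list of words it produces a prompt enriched with background and scene details, so it is a handy feature when you cannot think of a good prompt.
Another plus is that, where I previously had to pass the prompt through a separate generative AI to get a good one, prompt enhancement now happens in a single direct call to Imagen3.
(Internally it seems to use an LLM to enhance the prompt, but you save an extra API request.)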

- プロンプト → LLM → プロンプト強化 → Imagen3 → 画像
+ プロンプト → Imagen3 → 画像
