Playing with Amazon Titan Image Generator G1 V2 (Gradio recommended)

Last updated at 2024-08-15 / Posted at 2024-08-15

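It's been a while!

Titan Image Generator G1, available on Bedrock, now has a V2!

That's V2 of G1! Confusing, I know.

You can try text-to-image generation in the playground, but the new V2 features are not available there.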

a jar of salad dressing in a rustic kitchen surrounded by fresh vegetables with studio lighting

Titan Image Generator G1 features

Here are the features of Titan Image Generator G1. Everything V1 can do, V2 can do as well, and a few features are available only in V2.

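Task type	V2 only	Description
TEXT_IMAGE		Generates an image from a text prompt.
TEXT_IMAGE	○	Takes an additional conditioning image along with the text prompt and generates an image that follows the layout and composition of the conditioning image.
INPAINTING		Modifies an image by changing the inside of a mask to match the surrounding background.
OUTPAINTING		Modifies an image by seamlessly extending the region defined by the mask.
IMAGE_VARIATION		Modifies an image by producing variations of the original image.
COLOR_GUIDED_GENERATION	○	Takes a list of hex color codes along with a text prompt and generates an image that follows the color palette.
BACKGROUND_REMOVAL	○	Modifies an image by identifying multiple objects and removing the background, outputting an image with a transparent background.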

Reference: https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-titan-image.html#model-parameters-titan-image-code-examples
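Each task type in the table pairs with its own parameter block in the request body. As a rough sketch, the mapping below follows my reading of the AWS documentation linked above. Only textToImageParams, colorGuidedGenerationParams and backgroundRemovalParams are used later in this article, so treat the other key names as assumptions to verify. build_body is a hypothetical helper, not part of boto3.

```python
import json

# Task type -> name of the parameter block in the request body.
# Assumption: key names as listed in the AWS documentation; verify before use.
PARAM_KEYS = {
    "TEXT_IMAGE": "textToImageParams",
    "INPAINTING": "inPaintingParams",
    "OUTPAINTING": "outPaintingParams",
    "IMAGE_VARIATION": "imageVariationParams",
    "COLOR_GUIDED_GENERATION": "colorGuidedGenerationParams",
    "BACKGROUND_REMOVAL": "backgroundRemovalParams",
}


def build_body(task_type, params, config=None):
    """Return a JSON request body for the given task type (hypothetical helper)."""
    if task_type not in PARAM_KEYS:
        raise ValueError(f"Unknown task type: {task_type}")
    body = {"taskType": task_type, PARAM_KEYS[task_type]: params}
    # The config block is attached only when one is passed in, since the
    # BACKGROUND_REMOVAL request in this article is sent without it.
    if config is not None:
        body["imageGenerationConfig"] = config
    return json.dumps(body)
```

For example, build_body("TEXT_IMAGE", {"text": "a red apple"}, {"numberOfImages": 1}) returns a string that could be passed as the body argument of invoke_model.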

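Introducing Gradio

Gradio is a framework that makes it easy to build prototypes of generative AI apps.

Forms such as text inputs, as well as image input and output, are easy to handle.
You can also host your app on Hugging Face, and this works even on the free plan.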

Building a TEXT_IMAGE app with Gradio

This section shows the most basic approach: pass a text prompt and generate an image.

First, install the required libraries.

pip install boto3 gradio

Then write the minimum amount of Python code.

app.py

import gradio as gr


def text_image_fn():
    pass


demo = gr.Interface(fn=text_image_fn, inputs=[gr.TextArea()], outputs=[gr.Image()])
demo.launch()

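As you can probably tell, gr.Interface() creates an instance and demo.launch() starts the app.

Three parameters of gr.Interface() are used:

fn: the function that receives the inputs and returns the outputs
inputs: the input-side components. In this example, a text area
outputs: the output-side components. In this example, an image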

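With just this code in place, you can run the app with "python app.py". Open the URL printed to the console and you will see a screen like the following.

The left side is the input and the right side is the output.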
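To make the relationship between fn, inputs and outputs concrete, here is a small sketch that needs no AWS access. count_chars is a made-up function for illustration: Gradio passes the values of the inputs components to fn positionally, in order, and displays the return value in the outputs component. Gradio is imported inside build_demo so that the function itself can be tried without it.

```python
def count_chars(prompt: str, suffix: str) -> str:
    """Receive two input values and return one output value."""
    return f"{len(prompt)} characters{suffix}"


def build_demo():
    """Wire the function to two text inputs and one text output."""
    import gradio as gr  # imported lazily; requires `pip install gradio`

    return gr.Interface(
        fn=count_chars,
        # The first component feeds `prompt`, the second feeds `suffix`.
        inputs=[gr.TextArea(), gr.Textbox(value="!")],
        outputs=[gr.Textbox()],
    )
```

Calling build_demo().launch() would start this demo in the same way as app.py.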

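Adding processing to the function

Following the sample in the official documentation, we add processing to the function.

The generate_image function is brought over almost as-is. It calls Bedrock using boto3.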

def generate_image(model_id, body):
    """
    Generate an image using Amazon Titan Image Generator G1 model on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        image_bytes (bytes): The image generated by the model.
    """

    logger.info(
        "Generating image with Amazon Titan Image Generator G1 model %s", model_id
    )

    bedrock = boto3.client(service_name="bedrock-runtime")

    accept = "application/json"
    content_type = "application/json"

    response = bedrock.invoke_model(
        body=body, modelId=model_id, accept=accept, contentType=content_type
    )
    response_body = json.loads(response.get("body").read())

    # Check for a model-side error before reading the image payload.
    finish_reason = response_body.get("error")

    if finish_reason is not None:
        raise ImageError(f"Image generation error. Error is {finish_reason}")

    base64_image = response_body.get("images")[0]
    base64_bytes = base64_image.encode("utf-8")
    image_bytes = base64.b64decode(base64_bytes)

    logger.info(
        "Successfully generated image with Amazon Titan Image Generator G1 model %s",
        model_id,
    )

    return image_bytes
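The response handling can be tried offline with a hand-made dictionary. The sketch below assumes the response shape used by generate_image above (an images list of base64 strings and an optional error field). decode_first_image and the fake response are illustrative only. It checks error before reading images, so a failed request surfaces as a clear message.

```python
import base64


def decode_first_image(response_body: dict) -> bytes:
    """Return the first image as raw bytes, or raise if the model reported an error."""
    error = response_body.get("error")
    if error is not None:
        raise RuntimeError(f"Image generation error. Error is {error}")
    images = response_body.get("images") or []
    if not images:
        raise RuntimeError("The response contained no images")
    return base64.b64decode(images[0])


# A fake response standing in for what Bedrock would return.
fake_response = {"images": [base64.b64encode(b"not-a-real-png").decode("utf-8")]}
image_bytes = decode_first_image(fake_response)
```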

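Next, we fill in the text_image_fn function.
The parameter is received as a str. Gradio passes input values positionally, so the parameter name can be anything.

The parameter is used to build the request body for Bedrock, which is then passed to the generate_image function shown earlier.
The returned bytes are converted to a PIL image and returned.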

def text_image_fn(prompt: str) -> Image.Image:

    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

    body = json.dumps(
        {
            "taskType": "TEXT_IMAGE",
            "textToImageParams": {"text": prompt},
            "imageGenerationConfig": {
                "numberOfImages": 1,
                "height": 1024,
                "width": 1024,
                "cfgScale": 8.0,
                "seed": 0,
            },
        }
    )

    image_bytes = generate_image(model_id=model_id, body=body)

    return Image.open(io.BytesIO(image_bytes))
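Because seed is fixed at 0 above, repeating a prompt should tend to reproduce the same image. If you would rather get a different image on each run, one option is to pick the seed at random. This is only a sketch: build_text_image_body is a hypothetical helper, and the upper bound of 2147483646 is my recollection of the documented seed range, so check it against the AWS reference.

```python
import json
import random

MAX_SEED = 2147483646  # assumption: documented upper bound for "seed"


def build_text_image_body(prompt: str, seed=None) -> str:
    """Build a TEXT_IMAGE request body, choosing a random seed when none is given."""
    if seed is None:
        seed = random.randint(0, MAX_SEED)
    return json.dumps(
        {
            "taskType": "TEXT_IMAGE",
            "textToImageParams": {"text": prompt},
            "imageGenerationConfig": {
                "numberOfImages": 1,
                "height": 1024,
                "width": 1024,
                "cfgScale": 8.0,
                "seed": seed,
            },
        }
    )
```

Passing an explicit seed keeps a result reproducible, while omitting it varies the output between calls.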

Full source code

app.py

import base64
import io
import json
import logging

import boto3
import gradio as gr
from PIL import Image


class ImageError(Exception):
    "Custom exception for errors returned by Amazon Titan Image Generator G1"

    def __init__(self, message):
        self.message = message


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


model_id = "amazon.titan-image-generator-v2:0"


def generate_image(model_id, body):
    """
    Generate an image using Amazon Titan Image Generator G1 model on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        image_bytes (bytes): The image generated by the model.
    """

    logger.info(
        "Generating image with Amazon Titan Image Generator G1 model %s", model_id
    )

    bedrock = boto3.client(service_name="bedrock-runtime")

    accept = "application/json"
    content_type = "application/json"

    response = bedrock.invoke_model(
        body=body, modelId=model_id, accept=accept, contentType=content_type
    )
    response_body = json.loads(response.get("body").read())

    # Check for a model-side error before reading the image payload.
    finish_reason = response_body.get("error")

    if finish_reason is not None:
        raise ImageError(f"Image generation error. Error is {finish_reason}")

    base64_image = response_body.get("images")[0]
    base64_bytes = base64_image.encode("utf-8")
    image_bytes = base64.b64decode(base64_bytes)

    logger.info(
        "Successfully generated image with Amazon Titan Image Generator G1 model %s",
        model_id,
    )

    return image_bytes


def text_image_fn(prompt: str) -> Image.Image:

    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

    body = json.dumps(
        {
            "taskType": "TEXT_IMAGE",
            "textToImageParams": {"text": prompt},
            "imageGenerationConfig": {
                "numberOfImages": 1,
                "height": 1024,
                "width": 1024,
                "cfgScale": 8.0,
                "seed": 0,
            },
        }
    )

    image_bytes = generate_image(model_id=model_id, body=body)

    return Image.open(io.BytesIO(image_bytes))


demo = gr.Interface(fn=text_image_fn, inputs=[gr.TextArea()], outputs=[gr.Image()])
demo.launch()

After modifying the Python script, Gradio has to be restarted: stop "python app.py" with Ctrl + C and run "python app.py" again.

Doing this every time is tedious, so a hot reload feature is also provided.

gradio app.py

That is all it takes to complete an image generation app!

TEXT_IMAGE (Conditioned Image Generation)

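Let's add one more feature. Conditioned Image Generation, added in V2, lets you pass a reference image along with the prompt and generates an image that preserves the layout of the reference image. (The AWS blog introduces it under the name "image conditioning".)

We define a new function. In addition to the prompt, it takes an image and a control mode.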

def text_image_fn_v2(image: Image.Image, prompt: str, control_mode: str) -> Image.Image:

    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

    body = json.dumps(
        {
            "taskType": "TEXT_IMAGE",
            "textToImageParams": {
                "text": prompt,
                "conditionImage": base64.b64encode(image_to_byte(image)).decode("utf8"),
                "controlMode": control_mode,  # CANNY_EDGE | SEGMENTATION
            },
            "imageGenerationConfig": {
                "numberOfImages": 1,
                "height": 1024,
                "width": 1024,
            },
        }
    )

    image_bytes = generate_image(model_id=model_id, body=body)

    return Image.open(io.BytesIO(image_bytes))
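The conditionImage field carries the reference image as a base64 string, and controlMode takes one of the two values noted in the comment above. Here is a minimal sketch of that encoding step, using only the standard library and placeholder bytes instead of a real PNG. build_condition_params is a hypothetical helper name.

```python
import base64

CONTROL_MODES = ("CANNY_EDGE", "SEGMENTATION")


def build_condition_params(prompt: str, image_bytes: bytes, control_mode: str) -> dict:
    """Return the textToImageParams block for conditioned image generation."""
    if control_mode not in CONTROL_MODES:
        raise ValueError(f"control_mode must be one of {CONTROL_MODES}")
    return {
        "text": prompt,
        # JSON cannot carry raw bytes, so they are base64-encoded and decoded to str.
        "conditionImage": base64.b64encode(image_bytes).decode("utf8"),
        "controlMode": control_mode,
    }


params = build_condition_params("a bear", b"placeholder-bytes", "CANNY_EDGE")
# Decoding the field gives back the original bytes.
assert base64.b64decode(params["conditionImage"]) == b"placeholder-bytes"
```

Rejecting an unknown control mode locally gives a clearer message than waiting for the API to refuse the request.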

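Since there are more inputs, the inputs parameter of gr.Interface now lists several components.

gr.Image() was used for the output so far, and the same component is specified for the input as well. Nice and easy to follow.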

gr.Interface(
    fn=text_image_fn_v2,
    inputs=[
        gr.Image(type="pil"),
        gr.TextArea(),
        gr.Dropdown(choices=["CANNY_EDGE", "SEGMENTATION"], value="CANNY_EDGE"),
    ],
    outputs=[gr.Image()],
)

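Gradio has a handy "tabs" feature. It lets you switch between the "generate an image from text" feature built first and the "generate an image based on a reference image" feature by tab. The implementation looks like this.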

tab1 = gr.Interface(
    fn=text_image_fn,
    inputs=[gr.TextArea()],
    outputs=[gr.Image()],
)

tab2 = gr.Interface(
    fn=text_image_fn_v2,
    inputs=[
        gr.Image(type="pil"),
        gr.TextArea(),
        gr.Dropdown(choices=["CANNY_EDGE", "SEGMENTATION"], value="CANNY_EDGE"),
    ],
    outputs=[gr.Image()],
)

demo = gr.TabbedInterface(
    [tab1, tab2], ["Text-to-image generation", "Conditioned Image Generation"]
)
demo.launch()
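gr.TabbedInterface takes two parallel lists, so the interfaces and their titles have to stay in the same order. As the number of tabs grows, one way to keep them aligned is to hold them in a single dictionary and split it at the end. split_tabs is a hypothetical helper, and the strings below stand in for the gr.Interface objects.

```python
def split_tabs(tabs: dict):
    """Split {title: interface} into (interfaces, titles), preserving insertion order."""
    titles = list(tabs.keys())
    interfaces = [tabs[title] for title in titles]
    return interfaces, titles


# Placeholders; in app.py these would be the tab1 and tab2 objects.
tabs = {
    "Text-to-image generation": "tab1",
    "Conditioned Image Generation": "tab2",
}
interfaces, titles = split_tabs(tabs)
```

In app.py the result could then be passed along as gr.TabbedInterface(interfaces, titles).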

Full source code

app.py

import base64
import io
import json
import logging

import boto3
import gradio as gr
from PIL import Image


class ImageError(Exception):
    "Custom exception for errors returned by Amazon Titan Image Generator G1"

    def __init__(self, message):
        self.message = message


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


model_id = "amazon.titan-image-generator-v2:0"


def generate_image(model_id, body):
    """
    Generate an image using Amazon Titan Image Generator G1 model on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        image_bytes (bytes): The image generated by the model.
    """

    logger.info(
        "Generating image with Amazon Titan Image Generator G1 model %s", model_id
    )

    bedrock = boto3.client(service_name="bedrock-runtime")

    accept = "application/json"
    content_type = "application/json"

    response = bedrock.invoke_model(
        body=body, modelId=model_id, accept=accept, contentType=content_type
    )
    response_body = json.loads(response.get("body").read())

    # Check for a model-side error before reading the image payload.
    finish_reason = response_body.get("error")

    if finish_reason is not None:
        raise ImageError(f"Image generation error. Error is {finish_reason}")

    base64_image = response_body.get("images")[0]
    base64_bytes = base64_image.encode("utf-8")
    image_bytes = base64.b64decode(base64_bytes)

    logger.info(
        "Successfully generated image with Amazon Titan Image Generator G1 model %s",
        model_id,
    )

    return image_bytes


def image_to_byte(image: Image.Image):
    img_bytes = io.BytesIO()
    image.save(img_bytes, format="PNG")
    return img_bytes.getvalue()


def text_image_fn(prompt: str) -> Image.Image:

    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

    body = json.dumps(
        {
            "taskType": "TEXT_IMAGE",
            "textToImageParams": {"text": prompt},
            "imageGenerationConfig": {
                "numberOfImages": 1,
                "height": 1024,
                "width": 1024,
                "cfgScale": 8.0,
                "seed": 0,
            },
        }
    )

    image_bytes = generate_image(model_id=model_id, body=body)

    return Image.open(io.BytesIO(image_bytes))


def text_image_fn_v2(image: Image.Image, prompt: str, control_mode: str) -> Image.Image:

    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

    body = json.dumps(
        {
            "taskType": "TEXT_IMAGE",
            "textToImageParams": {
                "text": prompt,
                "conditionImage": base64.b64encode(image_to_byte(image)).decode("utf8"),
                "controlMode": control_mode,  # CANNY_EDGE | SEGMENTATION
            },
            "imageGenerationConfig": {
                "numberOfImages": 1,
                "height": 1024,
                "width": 1024,
            },
        }
    )

    image_bytes = generate_image(model_id=model_id, body=body)

    return Image.open(io.BytesIO(image_bytes))


tab1 = gr.Interface(
    fn=text_image_fn,
    inputs=[gr.TextArea()],
    outputs=[gr.Image()],
)

tab2 = gr.Interface(
    fn=text_image_fn_v2,
    inputs=[
        gr.Image(type="pil"),
        gr.TextArea(),
        gr.Dropdown(choices=["CANNY_EDGE", "SEGMENTATION"], value="CANNY_EDGE"),
    ],
    outputs=[gr.Image()],
)

demo = gr.TabbedInterface(
    [tab1, tab2], ["Text-to-image generation", "Conditioned Image Generation"]
)
demo.launch()

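Here is what it looks like when run. The shark has turned into a bear while roughly keeping the layout.

There are two modes, CANNY_EDGE and SEGMENTATION. With SEGMENTATION, it could even turn a rough hand-drawn sketch into something decent.

(Though, well, only roughly.)

Other features

I tried the other features as well, so I will just leave the screenshots here.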

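Color Guided Content (image guidance with a color palette)

You can generate an image by specifying its color palette. (Because the task type differs from the Conditioned Image Generation above, it may not be possible to follow a reference image's layout and control the colors at the same time.)

Warm colors specified
Cool colors specified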
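The colors field takes hex color codes such as #ff8080. As a sketch, the helper below validates the codes before they are sent and leaves out negativeText when it is empty. Whether the API rejects an empty negativeText is an assumption on my part, and build_color_params is a hypothetical name.

```python
import re

HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def build_color_params(text: str, colors, negative_text: str = "") -> dict:
    """Return a colorGuidedGenerationParams block after validating the hex codes."""
    for color in colors:
        if not HEX_COLOR.fullmatch(color):
            raise ValueError(f"Not a #RRGGBB color code: {color!r}")
    params = {"text": text, "colors": list(colors)}
    # Only send negativeText when the user actually typed one.
    if negative_text.strip():
        params["negativeText"] = negative_text
    return params
```

A reference image could be added to the returned dictionary under referenceImage, as the full source code in this article does.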

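Background Removal

This one is self-explanatory: it removes the background.

Only the gap between the right hand (?) and the right foot (?) is a near miss. Other than that, I think it is perfect!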
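To check programmatically that a returned image really carries transparency, you can look at the PNG header. This sketch assumes the model returns PNG bytes, which I have not confirmed, and it only inspects the color type in the IHDR chunk, so palette images that use a tRNS chunk would not be detected. png_has_alpha_channel and fake_png_header are hypothetical helpers; with Pillow, comparing image.mode to "RGBA" would serve the same purpose.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_has_alpha_channel(data: bytes) -> bool:
    """Return True if the PNG header declares an alpha channel (color type 4 or 6)."""
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("Not a PNG file")
    # IHDR layout: width (4), height (4), bit depth (1), color type (1), ...
    color_type = data[25]
    return color_type in (4, 6)


def fake_png_header(color_type: int) -> bytes:
    """Build just enough of a PNG (signature + IHDR) to exercise the check."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, color_type, 0, 0, 0)
    return PNG_SIGNATURE + struct.pack(">I", len(ihdr)) + b"IHDR" + ihdr
```

In the app, the bytes returned by generate_image could be passed to png_has_alpha_channel before opening them with Pillow.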

Source code

import base64
import io
import json
import logging

import boto3
import gradio as gr
from PIL import Image


class ImageError(Exception):
    "Custom exception for errors returned by Amazon Titan Image Generator G1"

    def __init__(self, message):
        self.message = message


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


model_id = "amazon.titan-image-generator-v2:0"


def generate_image(model_id, body):
    """
    Generate an image using Amazon Titan Image Generator G1 model on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        image_bytes (bytes): The image generated by the model.
    """

    logger.info(
        "Generating image with Amazon Titan Image Generator G1 model %s", model_id
    )

    bedrock = boto3.client(service_name="bedrock-runtime")

    accept = "application/json"
    content_type = "application/json"

    response = bedrock.invoke_model(
        body=body, modelId=model_id, accept=accept, contentType=content_type
    )
    response_body = json.loads(response.get("body").read())

    # Check for a model-side error before reading the image payload.
    finish_reason = response_body.get("error")

    if finish_reason is not None:
        raise ImageError(f"Image generation error. Error is {finish_reason}")

    base64_image = response_body.get("images")[0]
    base64_bytes = base64_image.encode("utf-8")
    image_bytes = base64.b64decode(base64_bytes)

    logger.info(
        "Successfully generated image with Amazon Titan Image Generator G1 model %s",
        model_id,
    )

    return image_bytes


def image_to_byte(image: Image.Image):
    img_bytes = io.BytesIO()
    image.save(img_bytes, format="PNG")
    return img_bytes.getvalue()


def text_image_fn(prompt: str) -> Image.Image:

    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

    body = json.dumps(
        {
            "taskType": "TEXT_IMAGE",
            "textToImageParams": {"text": prompt},
            "imageGenerationConfig": {
                "numberOfImages": 1,
                "height": 1024,
                "width": 1024,
                "cfgScale": 8.0,
                "seed": 0,
            },
        }
    )

    image_bytes = generate_image(model_id=model_id, body=body)

    return Image.open(io.BytesIO(image_bytes))


def conditioned_image_generation_fn(
    image: Image.Image, prompt: str, control_mode: str
) -> Image.Image:

    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

    body = json.dumps(
        {
            "taskType": "TEXT_IMAGE",
            "textToImageParams": {
                "text": prompt,
                "conditionImage": base64.b64encode(image_to_byte(image)).decode("utf8"),
                "controlMode": control_mode,  # CANNY_EDGE | SEGMENTATION
            },
            "imageGenerationConfig": {
                "numberOfImages": 1,
                "height": 1024,
                "width": 1024,
            },
        }
    )

    image_bytes = generate_image(model_id=model_id, body=body)

    return Image.open(io.BytesIO(image_bytes))


def color_guided_content_fn(
    image: Image.Image,
    text: str,
    negative_text: str,
    color1: str,
    color2: str,
    color3: str,
    color4: str,
):
    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

    input_image = base64.b64encode(image_to_byte(image)).decode("utf8")

    body = json.dumps(
        {
            "taskType": "COLOR_GUIDED_GENERATION",
            "colorGuidedGenerationParams": {
                "text": text,
                "negativeText": negative_text,
                "referenceImage": input_image,
                "colors": [color1, color2, color3, color4],
            },
            "imageGenerationConfig": {
                "numberOfImages": 1,
                "height": 1024,
                "width": 1024,
            },
        }
    )

    image_bytes = generate_image(model_id=model_id, body=body)
    response_image = Image.open(io.BytesIO(image_bytes))

    return response_image


def background_removal_fn(image: Image.Image):
    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

    input_image = base64.b64encode(image_to_byte(image)).decode("utf8")

    body = json.dumps(
        {
            "taskType": "BACKGROUND_REMOVAL",
            "backgroundRemovalParams": {
                "image": input_image,
            },
        }
    )

    image_bytes = generate_image(model_id=model_id, body=body)
    response_image = Image.open(io.BytesIO(image_bytes))

    return response_image


tab1 = gr.Interface(
    fn=text_image_fn,
    inputs=[gr.TextArea()],
    outputs=gr.Image(format="png"),
)

tab2 = gr.Interface(
    fn=conditioned_image_generation_fn,
    inputs=[
        gr.Image(type="pil"),
        gr.TextArea(),
        gr.Dropdown(choices=["CANNY_EDGE", "SEGMENTATION"], value="CANNY_EDGE"),
    ],
    outputs=gr.Image(format="png"),
)

tab3 = gr.Interface(
    fn=color_guided_content_fn,
    inputs=[
        gr.Image(type="pil"),
        gr.TextArea(label="text"),
        # Negative prompt; fills the negative_text parameter of the function.
        gr.TextArea(label="negative_text"),
        gr.ColorPicker(value="#ff8080"),
        gr.ColorPicker(value="#ffb280"),
        gr.ColorPicker(value="#ffe680"),
        gr.ColorPicker(value="#ffe680"),
    ],
    outputs=gr.Image(format="png"),
)

tab4 = gr.Interface(
    fn=background_removal_fn,
    inputs=[
        gr.Image(type="pil"),
    ],
    outputs=gr.Image(format="png"),
)

demo = gr.TabbedInterface(
    [tab1, tab2, tab3, tab4],
    [
        "Text-to-image generation",
        "Conditioned Image Generation",
        "Color Guided Content",
        "Background Removal",
    ],
)
demo.launch()
