Please describe an experiment in generating visuals of "Hatsune Miku" with NovelAI.

Posted at 2025-09-03

ChatGPT:
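Certainly. Here is an overview of an experiment in generating "Hatsune Miku" visuals with NovelAI, organized into background, method, and caveats.

Background
Hatsune Miku is a character from Crypton Future Media and is known worldwide as a virtual singer.
NovelAI is a service that generates high-quality illustrations and anime-style images from text prompts; it is also used for fan art and experimental visual work.
The goal of the experiment is to explore how prompt design affects the reproduction of character features (hairstyle, outfit, colors, pose, and so on).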

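Preparation
NovelAI account registration
  A subscription plan is required
Access through a web browser or the NovelAI API
  The web UI supports both "Text to Image" and "Image to Image"
Prompt design
  Include keywords that identify Hatsune Miku
  State the hairstyle (twin tails), outfit (Miku-style uniform), and colors (blue and green tones) explicitly
  Style tags can be added as well (e.g. anime, high detail, full body, dynamic pose)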
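
The prompt-design points above can be sketched as a small helper. This is a minimal illustration, assuming a comma-separated tag prompt like the ones used in this article; `build_prompt` and its parameters are hypothetical names, not part of NovelAI.

```python
def build_prompt(character, features, style_tags=()):
    """Join a character name, feature tags, and style tags into one prompt.

    Hypothetical helper: blank entries are dropped and duplicate tags
    (compared case-insensitively) are kept only once, in first-seen order.
    """
    seen = set()
    tags = []
    for tag in (character, *features, *style_tags):
        cleaned = tag.strip()
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            tags.append(cleaned)
    return ", ".join(tags)


prompt = build_prompt(
    "Hatsune Miku",
    ["turquoise twin tails", "futuristic outfit"],
    ["anime style", "full body", "anime style"],
)
print(prompt)
```

Keeping the character name first and the style tags last mirrors the order of the example prompts in this article; whether tag order matters for a given model is something to test rather than assume.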

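Generation procedure (example experiment)
A. Text to Image
  Open the NovelAI web UI
  Enter a prompt in the "Prompt" field, for example:
  Hatsune Miku, anime style, turquoise twin tails, futuristic outfit, holding microphone, dynamic pose, highly detailed, full body, vibrant colors
  Click the generate button and review several variations
  Save the images you like
B. Refinement with Image to Image
  Provide a source Hatsune Miku illustration as input
  Change the style or background and regenerate
  Adjust the Strength parameter to set how much of the source illustration is preserved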
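
To build intuition for the Strength parameter: in open-source Stable Diffusion img2img implementations such as Hugging Face diffusers, strength decides how many of the scheduled denoising steps are actually run on the noised source image. Whether NovelAI behaves identically is an assumption, and `effective_steps` is a hypothetical name used only for this sketch.

```python
def effective_steps(num_inference_steps, strength):
    """Number of denoising steps an img2img run actually performs.

    Mirrors the rule used by diffusers-style img2img pipelines:
    min(int(steps * strength), steps). NovelAI's internals are not
    documented here, so treat this only as an approximation.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return min(int(num_inference_steps * strength), num_inference_steps)


for s in (0.0, 0.3, 0.7, 1.0):
    print(s, effective_steps(28, s))
```

Read this way, a low strength leaves most of the source illustration intact because few steps are spent repainting it, while a strength near 1.0 is close to generating from scratch.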

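Caveats
Copyright and commercial use
  Hatsune Miku is protected by trademark and copyright, so take care with any commercial use of generated images
  Limiting use to personal experiments and non-commercial purposes is the safer choice
Prompt design
  The name "Hatsune Miku" alone may not produce a recognizable result
  Adding concrete attributes such as hairstyle, outfit, and colors tends to improve recognition
Exploring variations
  NovelAI output is random to a degree, so generating several times and comparing the results lets you assess how reproducibly the character features appear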

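Applications of the experiment
  Prototyping original outfits and poses
  Creating visual material for fan art
  Comparative studies with other AI generation tools (such as Stable Diffusion)
💡 Key points
  NovelAI is strong at generating anime-style characters
  It is important to state Hatsune Miku's distinctive features explicitly in the prompt
  Image to Image can also modify and refine existing illustrations
  Take care with copyright and commercial use
If you like, I can present several concrete prompt examples and summarize them as an experiment report that compares the generated results.
Shall I make it?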

You:
Yes, please.

ChatGPT:

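Understood. Here are prompt examples for a comparison experiment on generating Hatsune Miku visuals with NovelAI, written up as a report that organizes the differences between the results.

NovelAI Hatsune Miku generation experiment: sample report

Purpose

Generate "Hatsune Miku" visuals with NovelAI and check how differences in the prompt affect the output.

Vary conditions such as hairstyle, outfit, pose, background, and style to explore the best prompt design.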

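Experimental conditions
| Condition | Prompt example | Notes |
| --- | --- | --- |
| 1 | Hatsune Miku, anime style, turquoise twin tails, futuristic outfit | Baseline prompt, standard anime style |
| 2 | Hatsune Miku, full body, holding microphone, dynamic pose, high detail, vibrant colors | Pose and prop stated explicitly |
| 3 | Hatsune Miku, winter outfit, snowy background, detailed lighting | Outfit plus background condition |
| 4 | Hatsune Miku, close-up, expression smiling, soft lighting, anime style | Close-up with emphasis on expression |
| 5 | Hatsune Miku, cyberpunk outfit, neon city background, cinematic lighting | Major style change (near-future look) |

Generate all conditions at the same resolution and step count

Generate each condition 3 to 5 times and compare the variations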
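
Holding resolution and steps fixed while repeating each condition several times can be planned up front. The sketch below builds such a run plan. The `seed` field is an assumption (the payloads shown in this article do not include one); it is added so that repeated runs stay comparable and reproducible if the backend accepts a seed.

```python
import random

CONDITIONS = {
    1: "Hatsune Miku, anime style, turquoise twin tails, futuristic outfit",
    2: "Hatsune Miku, full body, holding microphone, dynamic pose, high detail, vibrant colors",
}
FIXED = {"steps": 28, "width": 512, "height": 512}  # identical for every run


def build_run_plan(conditions, runs_per_condition=3, base_seed=1234):
    """Return one dict per planned generation (condition x repetition)."""
    rng = random.Random(base_seed)  # deterministic seed sequence
    seeds = [rng.randrange(2**32) for _ in range(runs_per_condition)]
    plan = []
    for number, prompt in sorted(conditions.items()):
        for run, seed in enumerate(seeds, start=1):
            # The same seeds are reused across conditions so that only
            # the prompt differs between matching runs.
            plan.append({"condition": number, "run": run, "seed": seed,
                         "prompt": prompt, **FIXED})
    return plan


plan = build_run_plan(CONDITIONS)
print(len(plan), "planned generations")
```

Reusing the same seeds for every condition is a design choice: it makes run 1 of condition 1 directly comparable with run 1 of condition 2.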

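Procedure

Open the NovelAI web UI and select "Text to Image"

Enter the prompt for each condition above

Click the generate button and review several variations

Organize screenshots or output images for saving and comparison

If needed, refine with "Image to Image" (fine-tune the background or outfit)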

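What to observe

Hairstyle and color reproduction

Check how often the twin tails and hair color are recognized

Outfit and prop reproduction

How the output changes with tags such as "futuristic outfit" or "cyberpunk outfit"

Pose and expression

Stating a dynamic pose or a smile explicitly makes it more likely to be picked up

Integration with the background

Use "snowy background" or "neon city background" to assess how well the character blends with the scene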

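Results (illustrative)
| Condition | Observation | Comment |
| --- | --- | --- |
| 1 | Hair color and twin tails are mostly reproduced; the outfit is basic | Even the initial prompt yields the basic look |
| 2 | The microphone-holding pose is mostly reproduced; pose angle varies | Specifying props is effective |
| 3 | Winter outfit and snowy background are recognized, but the character outline is slightly distorted | Adding background conditions makes generation harder |
| 4 | The close-up face is highly detailed; hair detail and expression are sharp | Close-up prompts improve fine detail |
| 5 | Cyberpunk outfit and neon background can be generated; outfit detail and background integration vary | A style change gives more freedom but also more randomness |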
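
Observations like those in the table can be made more comparable by tallying manual checks per image. The sketch below assumes you label each generated image yourself; the feature names and the sample labels are made up for illustration and are not results from this experiment.

```python
from collections import defaultdict


def reproduction_rates(labels):
    """Compute, per condition and feature, the share of images where it appeared.

    `labels` is a list of (condition, {feature: bool}) pairs, one per image.
    """
    hits = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for condition, features in labels:
        totals[condition] += 1
        for name, present in features.items():
            hits[condition][name] += int(present)
    return {
        condition: {name: count / totals[condition]
                    for name, count in feature_hits.items()}
        for condition, feature_hits in hits.items()
    }


# Made-up labels for three images of condition 1 (illustration only)
sample = [
    (1, {"twin_tails": True, "turquoise_hair": True}),
    (1, {"twin_tails": True, "turquoise_hair": False}),
    (1, {"twin_tails": False, "turquoise_hair": True}),
]
rates = reproduction_rates(sample)
print(rates)
```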
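Discussion and summary

Detailed prompts matter

Stating the character's features (hairstyle, colors, outfit) explicitly improves reproducibility

Adding background conditions makes generation harder

Outfit and background integration in particular tends to vary

Specifying pose and props is effective

Conditions such as "holding microphone" and "dynamic pose" improve reproducibility

Refinement with Image to Image

Feeding in a source illustration and adjusting the style or background gets closer to the intended result

💡 Example applications

Hatsune Miku fan art and design prototyping

Comparative studies with other AI tools (such as Stable Diffusion)

Building up prompt-design know-how (writing character generation guidelines)

If you like, I can also write sample code that automates this experiment in a Python script, generating and comparing images for multiple prompts in one batch.
Shall I make it?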

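You:
Yes, please.
ChatGPT:

Understood. Here is an example Python script that runs NovelAI Text-to-Image for multiple prompts automatically and saves the images together for comparison.

Note: NovelAI does not provide an official Python API, so generation goes through an unofficial API or plain HTTP requests. The script below follows a generic HTTP request pattern; the endpoint, payload fields, and response format are assumptions and should be checked against the API you actually use.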

Example script: generating and comparing Hatsune Miku images from multiple prompts
import base64
from pathlib import Path

import requests

# Output directory
output_dir = Path("novelai_miku")
output_dir.mkdir(exist_ok=True)

# NovelAI API settings (unofficial usage); replace the key with your own.
# The endpoint, payload fields, and response shape are assumptions: the real
# service may differ (for example, it may return an archive instead of JSON).
API_URL = "https://api.novelai.net/ai/generate-image"
API_KEY = "YOUR_NOVELAI_API_KEY"

# Prompts to compare
prompts = [
    "Hatsune Miku, anime style, turquoise twin tails, futuristic outfit",
    "Hatsune Miku, full body, holding microphone, dynamic pose, high detail",
    "Hatsune Miku, winter outfit, snowy background, detailed lighting",
    "Hatsune Miku, close-up, expression smiling, soft lighting, anime style",
    "Hatsune Miku, cyberpunk outfit, neon city background, cinematic lighting",
]


def generate_novelai_image(prompt, index):
    """Generate one image for `prompt` and save it as miku_<index + 1>.png."""
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": "nai-diffusion",  # generation model
        "prompt": prompt,
        "negativePrompt": "",  # a negative prompt can be set here
        "steps": 28,
        "width": 512,
        "height": 512,
        "cfgScale": 7.5,
    }
    response = requests.post(API_URL, json=payload, headers=headers, timeout=120)
    response.raise_for_status()
    data = response.json()

    # Decode the Base64 image (dropping any "data:...;base64," prefix) and save it
    img_b64 = data["images"][0].split(",")[-1]
    img_bytes = base64.b64decode(img_b64)
    img_path = output_dir / f"miku_{index + 1}.png"
    img_path.write_bytes(img_bytes)
    print(f"Saved: {img_path}")
    return img_path


# Generate one image per prompt
for i, prompt in enumerate(prompts):
    generate_novelai_image(prompt, i)

print("All images generated!")

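✅ Script features

Batch generation from multiple prompts

Each condition can produce a different outfit, background, or pose

Automatic saving

Images are saved in order in the novelai_miku directory

Customizable

steps (number of generation steps), cfgScale (how strongly the prompt is followed), and width/height (resolution) can be adjusted

Convenient for comparison

Results are collected in one folder, so variations can be compared and evaluated later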
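
Two details of the script are worth isolating: reading the key from the environment instead of hard-coding it, and decoding an image field that may or may not carry a `data:image/png;base64,` prefix. The helper names and the `NOVELAI_API_KEY` variable name below are assumptions for illustration.

```python
import base64
import os


def load_api_key(env_var="NOVELAI_API_KEY"):
    """Read the API key from an environment variable (name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


def decode_image_field(value):
    """Decode a Base64 string, with or without a data-URI prefix, to bytes."""
    # "data:image/png;base64,AAAA" -> "AAAA"; a bare "AAAA" is left as is.
    payload = value.split(",", 1)[-1]
    return base64.b64decode(payload)


raw = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG signature, used as sample data
encoded = base64.b64encode(raw).decode()
assert decode_image_field(encoded) == raw
assert decode_image_field("data:image/png;base64," + encoded) == raw
```

Splitting on the first comma is safe here because the Base64 alphabet contains no commas.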

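💡 Applications

Line up the generated images and evaluate the "feature reproduction rate by prompt"

Combine background, outfit, and expression conditions to search for the ideal Hatsune Miku visual

Add Image-to-Image refinement on top to improve the result further

If you like, I can extend this script into a version that refines with Image-to-Image, automating background changes while keeping the character from the source image.
Shall I make it?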

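You:
Yes, please.
ChatGPT:

Understood. Here is an example Python script that takes the Hatsune Miku images generated with NovelAI and automates Image-to-Image refinement (changing the outfit or background, adjusting the style).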

Example script: automated Image-to-Image refinement with NovelAI
import base64
from pathlib import Path

import requests

# Output directory
output_dir = Path("novelai_miku_refined")
output_dir.mkdir(exist_ok=True)

# NovelAI API settings (unofficial usage); replace the key with your own.
# The endpoint, payload fields, and response shape are assumptions.
API_URL = "https://api.novelai.net/ai/generate-image"
API_KEY = "YOUR_NOVELAI_API_KEY"

# Source images (generated by the Text-to-Image script)
input_dir = Path("novelai_miku")
input_images = sorted(input_dir.glob("*.png"))

# Refinement prompts
refine_prompts = [
    "Add snow background, winter outfit, highly detailed",
    "Add stage lighting, dynamic pose, microphone, highly detailed",
    "Close-up portrait, soft lighting, expression smiling",
    "Futuristic neon background, cyberpunk outfit, cinematic lighting",
]


def refine_image(image_path, prompt, index):
    """Refine one source image with `prompt`; save as refined_<index + 1>.png."""
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    # Encode the source image as Base64
    img_b64 = base64.b64encode(Path(image_path).read_bytes()).decode()

    payload = {
        "model": "nai-diffusion",
        "prompt": prompt,
        "init_image": f"data:image/png;base64,{img_b64}",  # Image-to-Image input
        "strength": 0.7,  # change strength: 0.0 = keep source, 1.0 = replace fully
        "steps": 28,
        "width": 512,
        "height": 512,
        "cfgScale": 7.5,
    }

    response = requests.post(API_URL, json=payload, headers=headers, timeout=120)
    response.raise_for_status()
    data = response.json()

    # Decode and save the returned Base64 image
    img_out_b64 = data["images"][0].split(",")[-1]
    out_path = output_dir / f"refined_{index + 1}.png"
    out_path.write_bytes(base64.b64decode(img_out_b64))
    print(f"Saved refined image: {out_path}")
    return out_path


# Pair each source image with one refinement prompt (prompts are reused cyclically)
for i, img_path in enumerate(input_images):
    prompt = refine_prompts[i % len(refine_prompts)]
    refine_image(img_path, prompt, i)

print("All images refined!")

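✅ Script features

Refines images generated with Text-to-Image

Keeps the original character's features while changing outfit, background, lighting, or style

Automation

Loops over the source image directory and processes everything in one batch

Strength control

The strength parameter controls how much of the source image is preserved

Multiple prompts

Useful for generating variations and comparing conditions

💡 Example applications

Compare variations of Hatsune Miku with different outfits, poses, and backgrounds

Evaluate results across several conditions to find the most natural style

Chain Text-to-Image and Image-to-Image to balance character preservation and creative freedom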
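
The refine loop pairs each source image with one prompt by cycling through the prompt list (`i % len(refine_prompts)`). If the goal is to try every prompt on every image, a full cross product is needed instead. A minimal sketch of both pairings, using placeholder file names and shortened prompts:

```python
from itertools import product

images = ["miku_1.png", "miku_2.png", "miku_3.png"]  # placeholder names
refine_prompts = ["snow background", "stage lighting"]


def cyclic_pairs(images, prompts):
    """One prompt per image, reusing prompts in order when they run out."""
    return [(img, prompts[i % len(prompts)]) for i, img in enumerate(images)]


def all_pairs(images, prompts):
    """Every image combined with every prompt (len(images) * len(prompts))."""
    return list(product(images, prompts))


print(cyclic_pairs(images, refine_prompts))
print(len(all_pairs(images, refine_prompts)))
```

The cross product grows quickly, so it is worth checking its length against any usage limits before sending requests.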

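If you like, I can next show an example that wraps this whole Text-to-Image + Image-to-Image flow in a GUI, as a web app that is easy to operate in the browser.
Shall I make it?

You:
How would I use Haruhi Suzumiya instead of Hatsune Miku?
ChatGPT:

I see. To switch from Hatsune Miku to Haruhi Suzumiya, you mainly need to cover the points below. The same approach applies to NovelAI and other text-to-image AI tools.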

Replacing the prompt

Replace the keywords that described Hatsune Miku's features with Haruhi Suzumiya's features

Example:

| Original (Hatsune Miku) | Replacement (Haruhi Suzumiya) |
| --- | --- |
| Hatsune Miku | Haruhi Suzumiya |
| turquoise twin tails | short brown hair |
| futuristic outfit | North High School uniform (yellow ribbon) |
| holding microphone | confident pose, school bag optional |
| anime style | anime style, expressive facial expression |

Example prompt for Text-to-Image:

Haruhi Suzumiya, anime style, short brown hair, North High School uniform with yellow ribbon, dynamic confident pose, highly detailed, full body

Refining with Image-to-Image

To push a source image toward Haruhi Suzumiya, upload the source image and use a prompt like this:

Refine to Haruhi Suzumiya style, confident pose, North High School uniform, expressive facial expression, anime style, high detail

Use the strength parameter to control how far the result departs from the source image (around 0.5 to 0.7 tends to look natural)

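Caveats

Copyright and commercial use

Haruhi Suzumiya is protected by trademark and copyright, so take care with any commercial use of generated images

Limiting use to personal experiments, study, and fan art is the safe choice

Prompt specificity

Stating hairstyle, outfit, expression, and pose clearly makes the character easier for the AI to recognize

Adding background conditions and props can also help the result match the intent

Try multiple prompts

Compare variations through the Text-to-Image → Image-to-Image flow

Iterate toward the Haruhi Suzumiya image you want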

Changes to the Text-to-Image → Image-to-Image automation

Replacing the prompts list in the earlier script with Haruhi Suzumiya prompts is enough to run the same flow

Example:

prompts = [
    "Haruhi Suzumiya, anime style, short brown hair, North High School uniform",
    "Haruhi Suzumiya, full body, confident pose, expressive facial expression",
    "Haruhi Suzumiya, winter uniform, snowy background, detailed lighting",
    "Haruhi Suzumiya, close-up, smiling expression, anime style",
    "Haruhi Suzumiya, cyberpunk style, neon city background, cinematic lighting",
]

Replace the Image-to-Image refinement prompts in the same way

💡 Key points:

Change the character name

Reflect hairstyle, outfit, expression, and pose concretely in the prompt

The Text-to-Image → Image-to-Image flow can be used as is

Generate under several conditions and look for a natural-looking Haruhi
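
Since switching characters only changes the identifying tags, the prompt list can be derived from a small character profile plus character-independent scene tags. This is a sketch under that assumption; `CharacterProfile`, `SCENES`, and `prompts_for` are hypothetical names, and the tags are taken from the prompts in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterProfile:
    """Hypothetical container for the tags that identify a character."""
    name: str
    hair: str
    outfit: str


SCENES = [  # character-independent parts of each prompt
    "anime style, full body",
    "snowy background, detailed lighting",
    "close-up, smiling expression, soft lighting",
]

MIKU = CharacterProfile("Hatsune Miku", "turquoise twin tails", "futuristic outfit")
HARUHI = CharacterProfile("Haruhi Suzumiya", "short brown hair",
                          "North High School uniform")


def prompts_for(profile, scenes=SCENES):
    """Build one prompt per scene for the given character."""
    return [f"{profile.name}, {profile.hair}, {profile.outfit}, {scene}"
            for scene in scenes]


for line in prompts_for(HARUHI):
    print(line)
```

With this layout, adding another character means adding one profile, and adding a scene extends the list for every character at once.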

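If you like, I can also write a "Text-to-Image + Image-to-Image automated generation flow" script fully converted for Haruhi Suzumiya.
Shall I make it?

You:
Yes, please.
ChatGPT:

Understood. Here is the Text-to-Image + Image-to-Image automated flow, with the earlier Hatsune Miku script fully converted for Haruhi Suzumiya.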

NovelAI Haruhi Suzumiya generation and automated refinement script
import base64
from pathlib import Path

import requests

# -----------------------------
# Directories
# -----------------------------
text2img_dir = Path("novelai_haruhi")
refine_dir = Path("novelai_haruhi_refined")
text2img_dir.mkdir(exist_ok=True)
refine_dir.mkdir(exist_ok=True)

# -----------------------------
# API settings (unofficial usage; endpoint, payload fields,
# and response shape are assumptions)
# -----------------------------
API_URL = "https://api.novelai.net/ai/generate-image"
API_KEY = "YOUR_NOVELAI_API_KEY"
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

# -----------------------------
# Text-to-Image prompts (Haruhi)
# -----------------------------
text2img_prompts = [
    "Haruhi Suzumiya, anime style, short brown hair, North High School uniform, dynamic confident pose, full body, highly detailed",
    "Haruhi Suzumiya, full body, holding school bag, confident pose, expressive facial expression, vibrant colors",
    "Haruhi Suzumiya, winter uniform, snowy background, detailed lighting, anime style",
    "Haruhi Suzumiya, close-up, smiling expression, soft lighting, anime style",
    "Haruhi Suzumiya, cyberpunk outfit, neon city background, cinematic lighting, anime style",
]

# -----------------------------
# Image-to-Image refinement prompts
# -----------------------------
refine_prompts = [
    "Refine to Haruhi Suzumiya style, confident pose, North High School uniform, expressive facial expression, high detail, anime style",
    "Refine to Haruhi Suzumiya, dynamic pose, holding school bag, full body, vibrant lighting",
    "Refine to Haruhi Suzumiya, winter uniform, snowy background, detailed lighting",
    "Refine to Haruhi Suzumiya, close-up portrait, soft lighting, smiling expression",
    "Refine to Haruhi Suzumiya, cyberpunk outfit, neon city background, cinematic lighting",
]


# -----------------------------
# Generation function (Text-to-Image)
# -----------------------------
def generate_text2img(prompt, index):
    """Generate one image and save it as haruhi_<index + 1>.png."""
    payload = {
        "model": "nai-diffusion",
        "prompt": prompt,
        "negativePrompt": "",
        "steps": 28,
        "width": 512,
        "height": 512,
        "cfgScale": 7.5,
    }
    response = requests.post(API_URL, json=payload, headers=HEADERS, timeout=120)
    response.raise_for_status()
    data = response.json()

    img_b64 = data["images"][0].split(",")[-1]
    img_path = text2img_dir / f"haruhi_{index + 1}.png"
    img_path.write_bytes(base64.b64decode(img_b64))
    print(f"Saved Text-to-Image: {img_path}")
    return img_path


# -----------------------------
# Generation function (Image-to-Image refinement)
# -----------------------------
def refine_image(image_path, prompt, index):
    """Refine one image and save it as refined_haruhi_<index + 1>.png."""
    img_b64 = base64.b64encode(Path(image_path).read_bytes()).decode()

    payload = {
        "model": "nai-diffusion",
        "prompt": prompt,
        "init_image": f"data:image/png;base64,{img_b64}",
        "strength": 0.7,  # change strength: 0.0 = keep source, 1.0 = replace fully
        "steps": 28,
        "width": 512,
        "height": 512,
        "cfgScale": 7.5,
    }

    response = requests.post(API_URL, json=payload, headers=HEADERS, timeout=120)
    response.raise_for_status()
    data = response.json()

    img_out_b64 = data["images"][0].split(",")[-1]
    out_path = refine_dir / f"refined_haruhi_{index + 1}.png"
    out_path.write_bytes(base64.b64decode(img_out_b64))
    print(f"Saved refined image: {out_path}")
    return out_path


# -----------------------------
# Text-to-Image generation loop
# -----------------------------
text2img_paths = []
for i, prompt in enumerate(text2img_prompts):
    text2img_paths.append(generate_text2img(prompt, i))

# -----------------------------
# Image-to-Image refinement loop
# -----------------------------
for i, img_path in enumerate(text2img_paths):
    prompt = refine_prompts[i % len(refine_prompts)]
    refine_image(img_path, prompt, i)

print("All Haruhi images generated and refined!")

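✅ Script features

Character already swapped

Hatsune Miku is fully replaced by Haruhi Suzumiya in both Text-to-Image and Image-to-Image

Automatic variation generation

Five prompts go through Text-to-Image, and each result is then refined

Adjustable change strength

strength=0.7 applies the new style and background fairly strongly while still starting from the source image; lower it to keep more of the source

Organized output

novelai_haruhi/ holds the initial generated images

novelai_haruhi_refined/ holds the refined images

💡 Tips for extending

Add more background and outfit conditions to generate even more variations automatically

With longer prompt lists, the same loops can generate dozens to a hundred patterns in one batch

With a GUI (Gradio, for example) it can be operated in the browser

If you like, I can also turn this Haruhi generation flow into a GUI app where Text-to-Image + Image-to-Image can be operated in the browser.
Shall I make it?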

You:
Yes, please.
ChatGPT:

Understood. Here is an example that puts the earlier Haruhi Suzumiya generation flow (Text-to-Image + Image-to-Image) behind a browser GUI. It uses Gradio.

NovelAI Haruhi Suzumiya generation GUI (Text-to-Image + Image-to-Image)
import base64
from pathlib import Path

import gradio as gr
import requests

# -----------------------------
# Directories
# -----------------------------
text2img_dir = Path("novelai_haruhi_gui")
refine_dir = Path("novelai_haruhi_gui_refined")
text2img_dir.mkdir(exist_ok=True)
refine_dir.mkdir(exist_ok=True)

# -----------------------------
# API settings (unofficial usage; endpoint, payload fields,
# and response shape are assumptions)
# -----------------------------
API_URL = "https://api.novelai.net/ai/generate-image"
API_KEY = "YOUR_NOVELAI_API_KEY"
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}


# -----------------------------
# Text-to-Image generation
# -----------------------------
def generate_text2img(prompt):
    payload = {
        "model": "nai-diffusion",
        "prompt": prompt,
        "negativePrompt": "",
        "steps": 28,
        "width": 512,
        "height": 512,
        "cfgScale": 7.5,
    }
    response = requests.post(API_URL, json=payload, headers=HEADERS, timeout=120)
    response.raise_for_status()
    data = response.json()

    img_b64 = data["images"][0].split(",")[-1]
    # Note: a fixed file name means each run overwrites the previous image
    img_path = text2img_dir / "haruhi_t2i.png"
    img_path.write_bytes(base64.b64decode(img_b64))
    return img_path


# -----------------------------
# Image-to-Image refinement
# -----------------------------
def refine_image(init_image_path, prompt, strength=0.7):
    img_b64 = base64.b64encode(Path(init_image_path).read_bytes()).decode()

    payload = {
        "model": "nai-diffusion",
        "prompt": prompt,
        "init_image": f"data:image/png;base64,{img_b64}",
        "strength": strength,  # 0.0 = keep source, 1.0 = replace fully
        "steps": 28,
        "width": 512,
        "height": 512,
        "cfgScale": 7.5,
    }

    response = requests.post(API_URL, json=payload, headers=HEADERS, timeout=120)
    response.raise_for_status()
    data = response.json()

    img_out_b64 = data["images"][0].split(",")[-1]
    out_path = refine_dir / "haruhi_refined.png"
    out_path.write_bytes(base64.b64decode(img_out_b64))
    return out_path


# -----------------------------
# Function called by the GUI
# -----------------------------
def generate_and_refine(text2img_prompt, refine_prompt, strength=0.7):
    t2i_path = generate_text2img(text2img_prompt)
    refined_path = refine_image(t2i_path, refine_prompt, strength)
    # Gradio image outputs of type "filepath" take plain path strings
    return str(t2i_path), str(refined_path)


# -----------------------------
# Gradio GUI definition
# -----------------------------
iface = gr.Interface(
    fn=generate_and_refine,
    inputs=[
        gr.Textbox(lines=2, label="Text-to-Image Prompt", value="Haruhi Suzumiya, anime style, short brown hair, North High School uniform, dynamic confident pose"),
        gr.Textbox(lines=2, label="Image-to-Image Refine Prompt", value="Refine to Haruhi Suzumiya style, confident pose, high detail, anime style"),
        gr.Slider(0.0, 1.0, value=0.7, step=0.05, label="Refine Strength"),
    ],
    outputs=[
        gr.Image(type="filepath", label="Text-to-Image Output"),
        gr.Image(type="filepath", label="Image-to-Image Refined Output"),
    ],
    title="NovelAI Haruhi Generator",
    description="GUI app that generates Haruhi Suzumiya with Text-to-Image and refines the result with Image-to-Image",
)

iface.launch()

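✅ GUI features

Text-to-Image and Image-to-Image in a single operation

Enter the prompt for the character you want on the left

Adjust the refine prompt and strength, then check the result on the right

Refine Strength

0.0 → source image fully preserved

1.0 → fully replaced

Results are saved as files

novelai_haruhi_gui/ → Text-to-Image output

novelai_haruhi_gui_refined/ → refined output

Intuitive operation in the browser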
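
One caveat with the GUI script: it always writes to `haruhi_t2i.png` and `haruhi_refined.png`, so each run overwrites the previous result. Below is a minimal sketch of a naming helper that avoids this. `unique_path` is a hypothetical name, and the timestamp-plus-counter scheme is just one possible choice.

```python
from datetime import datetime
from pathlib import Path


def unique_path(directory, stem, suffix=".png"):
    """Return a path in `directory` that does not exist yet.

    The name combines `stem`, a timestamp, and a counter, e.g.
    haruhi_t2i_20250903-101500_001.png.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    counter = 1
    while True:
        candidate = directory / f"{stem}_{stamp}_{counter:03d}{suffix}"
        if not candidate.exists():
            return candidate
        counter += 1
```

In the GUI script this would replace the fixed names, for example `img_path = unique_path(text2img_dir, "haruhi_t2i")`, so that earlier generations stay available for comparison.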

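💡 Tips for extending

Saving several prompts and switching between them makes it easy to generate Haruhi images with various outfits, backgrounds, and poses

Adjusting the strength in the GUI lets you balance preserving the source image against style transfer

ControlNet could later be added to integrate pose preservation and background replacement

If needed, the next step could be an example "fully customizable Haruhi web app" that adds pose preservation and background replacement with ControlNet.
Shall I make it?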

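You:
Yes, please.
ChatGPT:

Understood. Here is a sample web app that combines NovelAI with ControlNet: it keeps the structure of an initial Haruhi Suzumiya illustration while changing pose or outfit, and lets you swap the background freely. NovelAI supplies the initial image; the ControlNet and inpainting stages run locally on Stable Diffusion through the diffusers library. The GUI uses Gradio.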

NovelAI + ControlNet Haruhi customization web app (example)
import base64
from pathlib import Path

import cv2
import gradio as gr
import numpy as np
import requests
import torch
from diffusers import (
    ControlNetModel,
    StableDiffusionControlNetPipeline,
    StableDiffusionInpaintPipeline,
)
from PIL import Image

# -----------------------------
# Directory
# -----------------------------
output_dir = Path("haruhi_custom")
output_dir.mkdir(exist_ok=True)

# -----------------------------
# NovelAI API settings (unofficial usage; endpoint, payload fields,
# and response shape are assumptions)
# -----------------------------
API_URL = "https://api.novelai.net/ai/generate-image"
API_KEY = "YOUR_NOVELAI_API_KEY"


def generate_novelai_image(prompt):
    """Generate the initial character image and save it to disk."""
    headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
    payload = {
        "model": "nai-diffusion",
        "prompt": prompt,
        "negativePrompt": "",
        "steps": 28,
        "width": 512,
        "height": 512,
        "cfgScale": 7.5,
    }
    response = requests.post(API_URL, json=payload, headers=headers, timeout=120)
    response.raise_for_status()
    data = response.json()
    img_b64 = data["images"][0].split(",")[-1]
    img_path = output_dir / "haruhi_novelai.png"
    img_path.write_bytes(base64.b64decode(img_b64))
    return img_path


# -----------------------------
# ControlNet setup: Canny + Depth + Segmentation
# These run on a local Stable Diffusion 1.5 pipeline (CUDA GPU required).
# Model IDs are kept from the original and may have moved on the Hugging Face Hub.
# -----------------------------
control_canny = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
control_depth = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
control_seg = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-seg", torch_dtype=torch.float16
)

pipe_control = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=[control_canny, control_depth, control_seg],
    torch_dtype=torch.float16,
).to("cuda")

pipe_inpaint = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")


# -----------------------------
# Generation function
# -----------------------------
def generate_haruhi(prompt_character, prompt_background, mask_image=None):
    # 1. Generate the initial character with NovelAI
    novelai_path = generate_novelai_image(prompt_character)
    init_image = Image.open(novelai_path).convert("RGB")

    # 2. Build the ControlNet conditioning images
    cv_img = np.array(init_image)  # RGB order, so use RGB2GRAY (not BGR2GRAY)
    gray = cv2.cvtColor(cv_img, cv2.COLOR_RGB2GRAY)
    canny_img = Image.fromarray(cv2.Canny(gray, 100, 200)).convert("RGB")
    # Placeholders: a grayscale copy is not a real depth map and a binary
    # threshold is not a real segmentation map. Swap in a depth estimator
    # and a segmentation model for meaningful conditioning.
    depth_img = Image.fromarray(gray).convert("RGB")
    _, seg = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY)
    seg_img = Image.fromarray(seg).convert("RGB")

    # 3. Regenerate with ControlNet, following the structure of the initial image
    control_output = pipe_control(
        prompt=prompt_character,
        image=[canny_img, depth_img, seg_img],
        guidance_scale=7.5,
    ).images[0]

    # 4. Optionally replace the background by inpainting
    #    (white areas of the mask are repainted)
    if mask_image is not None:
        mask = mask_image.convert("L").resize(control_output.size)
        final_output = pipe_inpaint(
            prompt=prompt_background,
            image=control_output,
            mask_image=mask,
            guidance_scale=7.5,
        ).images[0]
    else:
        final_output = control_output

    # 5. Save
    final_path = output_dir / "haruhi_final.png"
    final_output.save(final_path)
    return str(final_path)


# -----------------------------
# Gradio GUI
# -----------------------------
iface = gr.Interface(
    fn=generate_haruhi,
    inputs=[
        gr.Textbox(lines=2, label="Character Prompt", value="Haruhi Suzumiya, anime style, short brown hair, North High School uniform, confident pose"),
        gr.Textbox(lines=2, label="Background Prompt", value="Sunny schoolyard with blue sky and clouds"),
        gr.Image(type="pil", label="Mask Image (optional, for background inpainting)"),
    ],
    outputs=gr.Image(type="filepath", label="Generated Image"),
    title="Haruhi Customizer (NovelAI + ControlNet + Inpainting)",
    description="Web app that generates a Haruhi character with NovelAI, keeps pose and outfit structure with ControlNet, and can change the background with inpainting",
)

iface.launch()

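✅ Features

Initial Haruhi generation with NovelAI

Produces the basic look of the character's expression, hairstyle, and outfit

Pose and outfit preservation with ControlNet

Carries over the character structure of the source image

Background replacement with inpainting

With a mask image, the background can be changed freely

Browser operation

Prompt entry, mask upload, and generation are all handled in the GUI

💡 Tips for extending

Combine several ControlNets to handle more complex pose and outfit changes

Use a mask image to replace only the background

Adjust the guidance scale and the ControlNet conditioning to control how closely the original character is preserved

If you like, I can also build a version of this GUI with an added "automatic generation of multiple background and outfit variations" feature, so that dozens of images can be generated in one batch.
Shall I make it?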

You:
Yes, please.
ChatGPT:

Understood. Here is a version of the earlier NovelAI + ControlNet + Inpainting GUI with automatic generation of multiple background and outfit variations built in.

NovelAI + ControlNet Haruhi automatic variation web app
import base64
from pathlib import Path

import cv2
import gradio as gr
import numpy as np
import requests
import torch
from diffusers import (
    ControlNetModel,
    StableDiffusionControlNetPipeline,
    StableDiffusionInpaintPipeline,
)
from PIL import Image

# -----------------------------
# Directory
# -----------------------------
output_dir = Path("haruhi_variations")
output_dir.mkdir(exist_ok=True)

# -----------------------------
# NovelAI API settings (unofficial usage; endpoint, payload fields,
# and response shape are assumptions)
# -----------------------------
API_URL = "https://api.novelai.net/ai/generate-image"
API_KEY = "YOUR_NOVELAI_API_KEY"

# -----------------------------
# ControlNet setup (local Stable Diffusion 1.5, CUDA GPU required).
# Model IDs are kept from the original and may have moved on the Hugging Face Hub.
# -----------------------------
control_canny = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe_control = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=[control_canny],
    torch_dtype=torch.float16,
).to("cuda")

pipe_inpaint = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")


# -----------------------------
# NovelAI Text-to-Image generation
# -----------------------------
def generate_novelai_image(prompt, index):
    headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
    payload = {
        "model": "nai-diffusion",
        "prompt": prompt,
        "negativePrompt": "",
        "steps": 28,
        "width": 512,
        "height": 512,
        "cfgScale": 7.5,
    }
    response = requests.post(API_URL, json=payload, headers=headers, timeout=120)
    response.raise_for_status()
    data = response.json()
    img_b64 = data["images"][0].split(",")[-1]
    img_path = output_dir / f"haruhi_t2i_{index + 1}.png"
    img_path.write_bytes(base64.b64decode(img_b64))
    return img_path


# -----------------------------
# Automatic refinement with ControlNet + inpainting
# -----------------------------
def refine_with_variations(init_image_path, prompts_variations, mask_image=None):
    refined_paths = []
    init_image = Image.open(init_image_path).convert("RGB")
    gray = cv2.cvtColor(np.array(init_image), cv2.COLOR_RGB2GRAY)
    canny_img = Image.fromarray(cv2.Canny(gray, 100, 200)).convert("RGB")

    for i, prompt in enumerate(prompts_variations):
        # ControlNet keeps the edge structure (pose, outfit outline)
        control_output = pipe_control(
            prompt=prompt,
            image=[canny_img],
            guidance_scale=7.5,
        ).images[0]

        # Optionally replace the background by inpainting
        # (white areas of the mask are repainted)
        if mask_image is not None:
            mask = mask_image.convert("L").resize(control_output.size)
            final_output = pipe_inpaint(
                prompt=prompt,
                image=control_output,
                mask_image=mask,
                guidance_scale=7.5,
            ).images[0]
        else:
            final_output = control_output

        out_path = output_dir / f"haruhi_refined_var{i + 1}.png"
        final_output.save(out_path)
        refined_paths.append(out_path)
    return refined_paths


# -----------------------------
# Function called by the GUI
# -----------------------------
def generate_variations(character_prompt, variation_prompts, mask_image=None):
    # variation_prompts holds several prompts, one per line
    prompts_list = [p.strip() for p in variation_prompts.strip().split("\n") if p.strip()]
    if not prompts_list:
        return []  # an empty gallery instead of None

    # 1. Initial generation with NovelAI
    init_path = generate_novelai_image(character_prompt, 0)

    # 2. Automatic variations with ControlNet + inpainting
    refined_paths = refine_with_variations(init_path, prompts_list, mask_image)
    return [str(init_path)] + [str(p) for p in refined_paths]


# -----------------------------
# Gradio GUI
# -----------------------------
iface = gr.Interface(
    fn=generate_variations,
    inputs=[
        gr.Textbox(lines=2, label="Character Prompt", value="Haruhi Suzumiya, anime style, short brown hair, North High School uniform, confident pose"),
        gr.Textbox(lines=5, label="Variation Prompts (one per line)", value="Sunny schoolyard\nClassroom background\nNight city neon"),
        gr.Image(type="pil", label="Mask Image (optional, for background inpainting)"),
    ],
    # Gallery.style() was removed in Gradio 4; layout options are constructor arguments
    outputs=gr.Gallery(label="Generated Images", elem_id="generated_gallery", columns=3),
    title="Haruhi Multi-Variation Generator",
    description="Web app that takes a Haruhi character generated with NovelAI and automatically produces multiple background and outfit variations with ControlNet + inpainting",
)

iface.launch()

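✅ Features

Automatic generation of multiple background and outfit variations

Several prompts can be entered in Variation Prompts, one per line

Multiple images are generated automatically from the initial character

Pose and outfit preservation with ControlNet

Keeps the structure of the original character while allowing small changes to outfit and pose

Background replacement with inpainting

With a mask, only the background is changed

Browser operation

Initial character generation and variation generation run in a single operation

Results can be reviewed together in the Gallery

💡 Tips for extending

With a longer list of Variation Prompts, dozens of outfit and background variations can be generated in one batch

With a mask image, only the background is replaced and the character is kept

Adding more ControlNet types (Depth, Canny, Pose, and so on) allows freer changes to the character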
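
The "one prompt per line" input is what drives the batch size here, so it helps to normalize it before generating. The sketch below extends the parsing used in `generate_variations` with de-duplication and an upper bound; the function name and the default limit of 20 are assumptions for illustration.

```python
def parse_variation_prompts(text, limit=20):
    """Split multi-line text into unique, non-empty prompts (at most `limit`)."""
    prompts = []
    for line in text.splitlines():
        prompt = line.strip()
        if prompt and prompt not in prompts:
            prompts.append(prompt)
    return prompts[:limit]


raw = "Sunny schoolyard\n\n  Classroom background \nSunny schoolyard\nNight city neon"
print(parse_variation_prompts(raw))
```

Capping the list keeps an accidental paste of hundreds of lines from turning into hundreds of GPU runs.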

If needed, I can also create a sample web app that adds "pose image upload + ControlNet pose preservation" to the GUI, so that users can place Haruhi in any pose automatically.
Shall I make it?
