Solo run: C# is GOD Advent Calendar 2024

Trying AI Image Generation with Stable Diffusion in C#

Last updated at 2024-12-12, posted at 2024-12-12

Stable Diffusion in C#

This post is another entry in my AI series.
I implemented Stable Diffusion in C#, so here is a walkthrough.

Reference article

Stable Diffusion is also distributed in ONNX format, so it is easy to use from C#.
The article I referred to is here.

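Downloading the models

As usual, download the models from Hugging Face.

The ONNX format lives on its own branch, so be sure to download the ONNX version.

For this project I created a folder named SDModel under Assets and placed the following models in it.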

text_encoder
unet
vae_encoder
vae_decoder

In addition, I created a StableDiffusion folder and placed cliptokenizer.onnx and ortextensions.dll in it.
I obtained ortextensions.dll by running git clone on the Direct-ML-EP branch of the GitHub repository mentioned above.
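Because a missing model file only surfaces later as a load error, it can help to check the folder layout up front. The following is a minimal sketch, assuming each model folder contains a file named model.onnx, which is how the config later in this article reads them. The class and method names (ModelFolderCheck, FindMissingModelFiles) are hypothetical and are not part of the referenced repository. Only the three folders that the config actually reads are checked.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the referenced repository.
public static class ModelFolderCheck
{
    // Sub-folders from which the config in this article reads a model.onnx file.
    private static readonly string[] RequiredSubfolders =
    {
        "text_encoder", "unet", "vae_decoder"
    };

    // Returns the full paths of the expected model files that do not exist under modelPath.
    public static List<string> FindMissingModelFiles(string modelPath)
    {
        var missing = new List<string>();
        foreach (var subfolder in RequiredSubfolders)
        {
            var file = Path.Combine(modelPath, subfolder, "model.onnx");
            if (!File.Exists(file))
            {
                missing.Add(file);
            }
        }
        return missing;
    }
}
```

Calling FindMissingModelFiles with the SDModel folder path before generation returns an empty list when everything is in place, and otherwise lists the files that still need to be downloaded.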

Building the AI image generation service

Next, create a service that performs the image generation.

using System;

class StableDiffusion
{
    // The UI passes in the prompts, the generation settings, the path of the model to use, and so on
    public string OnGenerateT2I(string positivePrompt, string negativePrompt, int numSteps, double guidanceScale, int seed, int height, int width, string modelPath, string outputPath, IProgress<int> iprogress)
    {
        // Pack the information needed for image generation into config
        var config = new StableDiffusionConfig()
        {
            NumInferenceSteps = numSteps, // number of inference steps
            GuidanceScale = guidanceScale, // how strongly the prompt influences the result
            // DirectML is used this time, so select DirectML (setting up CUDA is a hassle)
            ExecutionProviderTarget = StableDiffusionConfig.ExecutionProvider.DirectML,
            // Image resolution (both values must be multiples of 8, otherwise an error occurs)
            Height = height,
            Width = width,
            // As far as I remember, device ID 0 is fine for DirectML
            DeviceId = 0,
            // Path to the `ortextensions.dll` obtained via git clone
            OrtExtensionsPath = Windows.ApplicationModel.Package.Current.InstalledPath + @"\Assets\StableDiffusion\ortextensions.dll",
            // Path to the `cliptokenizer.onnx` obtained from Hugging Face
            TokenizerOnnxPath = Windows.ApplicationModel.Package.Current.InstalledPath + @"\Assets\StableDiffusion\cliptokenizer.onnx",
            // Paths to the other ONNX models saved under Assets
            TextEncoderOnnxPath = modelPath + @"\text_encoder\model.onnx",
            UnetOnnxPath = modelPath + @"\unet\model.onnx",
            VaeDecoderOnnxPath = modelPath + @"\vae_decoder\model.onnx",
            // Output path
            ImageOutputPath = outputPath
        };
        // UNet.Inference generates the image; return its result
        var image = UNet.Inference(positivePrompt, negativePrompt, config, seed, iprogress);
        return image;
    }
}
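The multiple-of-8 requirement comes from the latent tensor, which the inference code sizes as Height / 8 by Width / 8. A minimal sketch of validating the resolution before building the config might look like the following. ImageSizeHelper and its methods are hypothetical names introduced for illustration. The shape { 1, 4, height / 8, width / 8 } is taken from the tensor shapes used in UNet.Inference.

```csharp
using System;

// Hypothetical helper for illustration; not part of the referenced repository.
public static class ImageSizeHelper
{
    // Throws if either dimension is not a positive multiple of 8.
    public static void EnsureMultipleOf8(int height, int width)
    {
        if (height <= 0 || height % 8 != 0)
            throw new ArgumentException("Height must be a positive multiple of 8.", nameof(height));
        if (width <= 0 || width % 8 != 0)
            throw new ArgumentException("Width must be a positive multiple of 8.", nameof(width));
    }

    // Shape of a single latent sample: the image is downsampled by 8 in each spatial dimension.
    public static int[] LatentShape(int height, int width)
    {
        EnsureMultipleOf8(height, width);
        return new[] { 1, 4, height / 8, width / 8 };
    }

    // Rounds a UI-supplied value down to the nearest multiple of 8 (minimum 8).
    public static int RoundDownToMultipleOf8(int value)
    {
        return Math.Max(8, value / 8 * 8);
    }
}
```

For a 512 x 512 image, LatentShape returns { 1, 4, 64, 64 }. Rounding the UI input down with RoundDownToMultipleOf8 is one way to avoid the error instead of rejecting the value.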

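The rest of the processing is almost unchanged from the GitHub repository and the article above.
The most important part is UNet.cs.
It does everything needed to create the image.
I will not explain all of it here.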

public static string Inference(string positivePrompt, string negativePrompt, StableDiffusionConfig config, int seed, IProgress<int> progress) 
{
    progress.Report(1);
    // Preprocess text
    var textEmbeddings = TextProcessing.PreprocessText(positivePrompt, negativePrompt, config);

    var scheduler = new LMSDiscreteScheduler();
    var timesteps = scheduler.SetTimesteps(config.NumInferenceSteps);

    // create latent tensor
    var latents = GenerateLatentSample(config, seed, scheduler.InitNoiseSigma);
    var sessionOptions = config.GetSessionOptionsForEp();
    // Create Inference Session
    var unetSession = new InferenceSession(config.UnetOnnxPath, sessionOptions);
    var input = new List<NamedOnnxValue>();
    for (int t = 0; t < timesteps.Length; t++)
    {
        progress.Report(t+1);
        var latentModelInput = TensorHelper.Duplicate(latents.ToArray(), new[] { 2, 4, config.Height / 8, config.Width / 8 });

        // latent_model_input = scheduler.scale_model_input(latent_model_input, timestep = t)
        latentModelInput = scheduler.ScaleInput(latentModelInput, timesteps[t]);

        input = CreateUnetModelInput(textEmbeddings, latentModelInput, timesteps[t]);

        // Run Inference
        var output = unetSession.Run(input);
        var outputTensor = (output.ToList().First().Value as DenseTensor<float>);

        // Split tensors from 2,4,64,64 to 1,4,64,64
        var splitTensors = TensorHelper.SplitTensor(outputTensor, new[] { 1, 4, config.Height / 8, config.Width / 8 });
        var noisePred = splitTensors.Item1;
        var noisePredText = splitTensors.Item2;

        // Perform guidance
        noisePred = performGuidance(noisePred, noisePredText, config.GuidanceScale);

        // LMS Scheduler Step
        latents = scheduler.Step(noisePred, timesteps[t], latents);
    }
    // Scale and decode the image latents with vae.
    latents = TensorHelper.MultipleTensorByFloat(latents.ToArray(), (1.0f / 0.18215f), latents.Dimensions.ToArray());
    var decoderInput = new List<NamedOnnxValue> { NamedOnnxValue.CreateFromTensor("latent_sample", latents) };

    // Decode image
    var imageResultTensor = VaeDecoder.Decoder(decoderInput, config.VaeDecoderOnnxPath);

    var image = VaeDecoder.ConvertToImage(imageResultTensor, config);
    progress.Report(config.NumInferenceSteps+1);
    return image;
}

With the processing above, images can be generated easily.
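The "Perform guidance" step in the loop is the part that makes the prompt steer the result. The body of performGuidance is not shown in this article, so the following is only a sketch of the usual classifier-free guidance formula, written against plain float arrays instead of DenseTensor<float>. It assumes the first half of the UNet output is the prediction without the positive prompt and the second half is the prediction with it. GuidanceSketch and Apply are hypothetical names.

```csharp
using System;

// Sketch of classifier-free guidance on flat arrays; names are hypothetical.
public static class GuidanceSketch
{
    // result[i] = noisePred[i] + guidanceScale * (noisePredText[i] - noisePred[i])
    public static float[] Apply(float[] noisePred, float[] noisePredText, double guidanceScale)
    {
        if (noisePred.Length != noisePredText.Length)
            throw new ArgumentException("Both predictions must have the same length.");

        var result = new float[noisePred.Length];
        for (int i = 0; i < noisePred.Length; i++)
        {
            // Move away from the unconditioned prediction toward the text-conditioned one.
            result[i] = noisePred[i] + (float)guidanceScale * (noisePredText[i] - noisePred[i]);
        }
        return result;
    }
}
```

With a guidance scale of 1 the result equals the text-conditioned prediction, and larger values push the output further in the direction the prompt suggests, which matches the description of GuidanceScale in the config.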

Result

This is also still unfinished, but as shown in the video below, I built an app that outputs images in the art style of whichever AI model is selected.

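About libraries

While working with Stable Diffusion in C#, I learned that a powerful library is available.
I have not used it myself, so I cannot comment on how it feels to use, but it appears to support ControlNet as well.
Please give it a try.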
