Stable Diffusion (diffusers + transformers packages)

Today I wanted to share an early preview of some work I’ve done to run Stable Diffusion models using Sentis:

com.doji.diffusers
com.doji.transformers

I ported the ClipTokenizer from huggingface/transformers, and the PNDMScheduler and the denoising loop from huggingface/diffusers.

There is still lots to do; the quality is not really there yet. I think the most important thing would be to get Classifier-Free Guidance working, which I haven’t managed to do yet.
Update: Classifier-Free Guidance works now

I know other people here have already shown success running diffusion models with Sentis, so maybe someone can give some advice or would like to contribute.

The GIF below was generated with the Stable Diffusion 1.5 model using 15 steps at 512x512 on an RTX 3060.

[GIF: Diffusers StableDiffusionPipelineSample running in Unity 2022.3.7f1 (DX11)]


Update: Got Classifier-Free Guidance working. Much better results now.
Also takes a bit longer to generate though…
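For context: in huggingface/diffusers, classifier-free guidance runs the UNet twice per denoising step, once with an empty (unconditional) prompt and once with the text prompt, usually as a single batch of two, and blends the two noise predictions as noise = uncond + guidanceScale * (text - uncond). That doubled UNet workload is the likely reason generation takes longer. The sketch below shows only the blending step on flat float arrays; ClassifierFreeGuidance.Combine is a hypothetical name and not part of com.doji.diffusers, which works on Sentis tensors.

```csharp
using System;

public static class ClassifierFreeGuidance
{
    // Blends the unconditional and text-conditioned noise predictions for one step:
    // guided = uncond + guidanceScale * (text - uncond)
    // A scale of 1 returns the text prediction unchanged; larger scales push the
    // result further away from the unconditional prediction.
    public static float[] Combine(float[] noiseUncond, float[] noiseText, float guidanceScale)
    {
        if (noiseUncond.Length != noiseText.Length)
            throw new ArgumentException("Both predictions must have the same number of elements.");

        var guided = new float[noiseUncond.Length];
        for (int i = 0; i < guided.Length; i++)
        {
            guided[i] = noiseUncond[i] + guidanceScale * (noiseText[i] - noiseUncond[i]);
        }
        return guided;
    }
}
```

For example, with a guidance scale of 7.5, an element that is 0 in the unconditional prediction and 1 in the text prediction becomes 7.5.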


This is very interesting, thanks for sharing!


Updated the package to include a new scheduler, which makes Stable Diffusion 2.1 work.


Do you think something like this could run on an iPhone? Or are the memory/processing requirements too great?

Not anytime soon, I think. Model sizes range from around 5 to 12 GB for the FP32 versions (quantization will help a bit once it lands in Sentis), but raw processing power will remain a bottleneck for mobile form factors for years to come.
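The size figures follow from simple arithmetic: weight storage is roughly the parameter count times the bytes per weight (4 for FP32, 2 for FP16, 1 for 8-bit quantization), so FP16 halves and 8-bit quantization quarters the FP32 size. A minimal sketch; ModelSize is a hypothetical helper, the one-billion parameter count in the usage line is a hypothetical round number rather than the size of any particular checkpoint, and real model files carry some extra overhead.

```csharp
public static class ModelSize
{
    // Approximate weight storage in gigabytes (1 GB = 1e9 bytes)
    // for a given number of parameters and bytes per weight.
    public static double EstimateGB(long parameterCount, int bytesPerWeight) =>
        parameterCount * (double)bytesPerWeight / 1e9;
}
```

For example, ModelSize.EstimateGB(1_000_000_000, 4) gives 4.0, while ModelSize.EstimateGB(1_000_000_000, 1) gives 1.0.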

Technically possible, of course: some people are even running these on a Raspberry Pi with OnnxStream (GitHub: vitoplantamura/OnnxStream, a lightweight inference library for ONNX files, written in C++, that can run SDXL on a Raspberry Pi Zero 2 and Mistral 7B on desktops and servers), with inference times of several hours per image.

Just released an update with support for Stable Diffusion XL, as well as turbo models for SD 1.5 and SDXL.
The images below were created with SDXL Turbo at 512x512 with 3 steps in ~6 seconds per image.


Also added img2img support.


(left is input, right is generated output)
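For reference, the img2img pipelines in huggingface/diffusers noise the encoded input image and then run only the tail of the schedule, controlled by a strength value in [0, 1]: with N inference steps they execute min(floor(N * strength), N) of them and skip the rest. Whether com.doji.diffusers uses exactly this rule is an assumption; the sketch below reproduces only that step arithmetic (from diffusers' get_timesteps, for a first-order scheduler), and Img2ImgSchedule is a hypothetical name.

```csharp
using System;

public static class Img2ImgSchedule
{
    // Index of the first scheduler timestep to run. Mirrors the get_timesteps logic
    // of the diffusers img2img pipelines for a first-order scheduler:
    // strength 1.0 starts at the noisiest timestep, strength 0.0 skips every step.
    public static int StartStep(int numInferenceSteps, float strength)
    {
        if (strength < 0f || strength > 1f)
            throw new ArgumentOutOfRangeException(nameof(strength), "strength must be in [0, 1]");

        int initTimestep = Math.Min((int)(numInferenceSteps * strength), numInferenceSteps);
        return Math.Max(numInferenceSteps - initTimestep, 0);
    }

    // Number of denoising steps that actually execute.
    public static int StepsRun(int numInferenceSteps, float strength) =>
        numInferenceSteps - StartStep(numInferenceSteps, strength);
}
```

With turbo models and very few steps, strength matters a lot: StepsRun(4, 0.5f) executes only 2 steps, and StepsRun(2, 0.5f) only 1.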

ArgumentOutOfRangeException: Specified argument was out of the range of valid values.
Parameter name: count
System.Linq.Enumerable.Range (System.Int32 start, System.Int32 count) (at <93223d662c2546d4b5d1784737504095>:0)
Doji.AI.Transformers.ClipTokenizer.GetRange (System.Int32 start, System.Int32 end) (at Library/PackageCache/com.doji.transformers@0.0.1/Runtime/Scripts/Clip/ClipTokenizer.cs:160)
Doji.AI.Transformers.ClipTokenizer.BytesToUnicode () (at Library/PackageCache/com.doji.transformers@0.0.1/Runtime/Scripts/Clip/ClipTokenizer.cs:135)
Doji.AI.Transformers.ClipTokenizer.Initialize (Doji.AI.Transformers.Vocab vocab, System.String merges, Doji.AI.Transformers.TokenizerConfig config, System.Collections.Generic.Dictionary`2[TKey,TValue] addedTokensDecoder, System.Int32 modelMaxLength, Doji.AI.Transformers.Side paddingSide, Doji.AI.Transformers.Side truncationSide, System.Collections.Generic.List`1[T] modelInputNames, System.Boolean cleanUpTokenizationSpaces, System.Boolean splitSpecialTokens) (at Library/PackageCache/com.doji.transformers@0.0.1/Runtime/Scripts/Clip/ClipTokenizer.cs:58)
Doji.AI.Transformers.ClipTokenizer..ctor (Doji.AI.Transformers.Vocab vocab, System.String merges, Doji.AI.Transformers.TokenizerConfig config) (at Library/PackageCache/com.doji.transformers@0.0.1/Runtime/Scripts/Clip/ClipTokenizer.cs:34)
Doji.AI.Diffusers.StableDiffusionXLPipeline.FromPretrained (Doji.AI.Diffusers.DiffusionModel model, Unity.Sentis.BackendType backend) (at Library/PackageCache/com.doji.diffusers@0.3.1/Runtime/Scripts/Pipelines/PipelineUtils.cs:369)
Doji.AI.Diffusers.DiffusionPipeline.FromPretrained (Doji.AI.Diffusers.DiffusionModel model, Unity.Sentis.BackendType backend) (at Library/PackageCache/com.doji.diffusers@0.3.1/Runtime/Scripts/Pipelines/PipelineUtils.cs:226)
Doji.AI.Diffusers.StableDiffusion.Initialize () (at Library/PackageCache/com.doji.diffusers@0.3.1/Runtime/Scripts/Core/StableDiffusion.cs:67)
Doji.AI.Diffusers.StableDiffusion..ctor (Doji.AI.Diffusers.DiffusionModel model) (at Library/PackageCache/com.doji.diffusers@0.3.1/Runtime/Scripts/Core/StableDiffusion.cs:52)
Doji.AI.Diffusers.Samples.StableDiffusionUI.Start () (at Assets/Samples/Diffusers/0.3.1/Basic Stable Diffusion Sample/StableDiffusionUI.cs:26)

I’m getting this error when trying to launch the demo scene with SDXL Turbo installed and selected. What could be the issue?


Hey, thanks for catching this. I’ve never had the CLIP tokenizer fail to initialize before.

How did you download/install the package? It could be a snafu with Git where the file is downloaded in an encoding that doesn’t support some of the special characters used here.
More specifically, these are the characters that seem to produce an unexpected negative value on your system:

List<int> bs = GetRange('!', '~' + 1)         // !:35, ~:126
              .Concat(GetRange('¡', '¬' + 1)) // ¡:161, ¬:172
              .Concat(GetRange('®', 'ÿ' + 1)) // ®:174, ÿ:255
              .ToList();

You might want to check whether that code file looks the same on your system and, if it does, which of these characters returns a negative value.

Alternatively, the code could be buggy under specific region/culture settings that I haven’t tested. You could try temporarily setting your region to US or similar, but I’ll investigate further; maybe I can reproduce the issue.

As a more useful short-term workaround, you could replace the start and end indices with the numeric values from the comments, which is likely what I’ll end up changing the code to.

EDIT: One more correction: the comments have an error too. The first index should be 33, not 35, so:

List<int> bs = GetRange(33, 127)
              .Concat(GetRange(161, 173))
              .Concat(GetRange(174, 256))
              .ToList();

Will fix as soon as possible.
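For reference, the statement being fixed is the first step of the byte-to-unicode table used by CLIP’s byte-level BPE, ported from bytes_to_unicode() in huggingface/transformers: the 188 printable byte values (94 + 12 + 82) map to themselves, and the remaining 68 bytes are shifted to code points from 256 upward so every byte has a visible character. The sketch below is a self-contained version that uses the numeric bounds from the fix, so the source file needs no non-ASCII characters. ByteLevelMapping and its GetRange (exclusive end) are hypothetical stand-ins, not the package’s actual implementation.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ByteLevelMapping
{
    // Hypothetical stand-in for ClipTokenizer.GetRange: start inclusive, end exclusive.
    static IEnumerable<int> GetRange(int start, int end) => Enumerable.Range(start, end - start);

    public static Dictionary<int, char> BytesToUnicode()
    {
        List<int> bs = GetRange(33, 127)     // '!' .. '~'
            .Concat(GetRange(161, 173))      // U+00A1 .. U+00AC
            .Concat(GetRange(174, 256))      // U+00AE .. U+00FF
            .ToList();
        List<int> cs = new List<int>(bs);    // printable bytes map to themselves
        var printable = new HashSet<int>(bs);

        // Every remaining byte gets the next free code point starting at 256.
        int n = 0;
        for (int b = 0; b < 256; b++)
        {
            if (!printable.Contains(b))
            {
                bs.Add(b);
                cs.Add(256 + n);
                n++;
            }
        }

        var map = new Dictionary<int, char>();
        for (int i = 0; i < bs.Count; i++) map[bs[i]] = (char)cs[i];
        return map;
    }
}
```

For example, the space byte 32 is not in the printable ranges, so it maps to U+0120.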


I’ve installed the package through OpenUPM. Indeed, the code snippet looks different for me:

            List<int> bs = GetRange('!', '~' + 1) //!:35, ~:126
                .Concat(GetRange('Ў', '¬' + 1)) // Ў:161, ¬:172
                .Concat(GetRange('®', 'я' + 1)) // ®:174, я:255
                .ToList();

Thank you for the help, the issue is gone now.
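The mis-decoded characters are consistent with the exception: 'Ў' is U+040E (1038), far above '¬' + 1 = 173, so if GetRange passes end - start as the count to Enumerable.Range, that count is negative and Enumerable.Range throws ArgumentOutOfRangeException for "count", as in the stack trace. Bytes 0xA1 and 0xFF (¡ and ÿ in Windows-1252) decode to Ў and я in the Windows-1251 code page, so a plausible but unconfirmed cause is a source file stored in a single-byte Western encoding being read as Cyrillic. The sketch assumes GetRange computes the count as end - start; TokenizerRangeDemo is a hypothetical name, and the \u escapes keep the demo itself safe from source-encoding problems.

```csharp
using System;
using System.Linq;

public static class TokenizerRangeDemo
{
    // Hypothetical stand-in for ClipTokenizer.GetRange: end is exclusive, count = end - start.
    public static int[] GetRange(int start, int end) =>
        Enumerable.Range(start, end - start).ToArray();

    // True if building the inclusive range [start, endInclusive] throws.
    public static bool RangeThrows(char start, char endInclusive)
    {
        try
        {
            GetRange(start, endInclusive + 1);
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }
}
```

RangeThrows('\u00A1', '\u00AC') returns false (12 values, 161 to 172), while RangeThrows('\u040E', '\u00AC') returns true because the count would be 173 - 1038 = -865.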


Hello, just found this topic and the package looks very promising.

Unfortunately, I get these errors right after image generation in the demo scene:

Exceeded safe compute dispatch group count limit per dimension [131072, 1, 1] for Transpose2D
UnityEngine.Debug:LogWarning (object)
Unity.Sentis.D:LogWarning (object) (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Internals/Debug.cs:72)
Unity.Sentis.ComputeHelper:Dispatch (Unity.Sentis.ComputeFunc,int,int,int) (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Backends/GPUCompute/ComputeHelper.cs:193)
Unity.Sentis.GPUComputeBackend:Transpose (Unity.Sentis.Tensor,Unity.Sentis.Tensor,int[]) (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Backends/GPUCompute/GPUCompute.cs:1914)
Unity.Sentis.Layers.Transpose:Execute (Unity.Sentis.Tensor[],Unity.Sentis.ExecutionContext) (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Layers/Layer.Transformation.cs:1365)
Unity.Sentis.GenericWorker/<StartManualSchedule>d__33:MoveNext () (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Backends/GenericWorker.cs:243)
Unity.Sentis.GenericWorker:Execute () (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Backends/GenericWorker.cs:176)
Unity.Sentis.GenericWorker:Execute (Unity.Sentis.Tensor) (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Backends/GenericWorker.cs:168)
Doji.AI.Diffusers.VaeDecoder:Execute (Unity.Sentis.TensorFloat) (at Assets/Plugins/com.doji.diffusers/Runtime/Scripts/Models/VaeDecoder.cs:49)
Doji.AI.Diffusers.StableDiffusionPipeline:Generate (Doji.AI.Diffusers.Parameters) (at Assets/Plugins/com.doji.diffusers/Runtime/Scripts/Pipelines/Stable Diffusion/StableDiffusionPipeline.cs:154)
Doji.AI.Diffusers.DiffusionPipeline:Generate (string,System.Nullable`1<int>,System.Nullable`1<int>,System.Nullable`1<int>,System.Nullable`1<single>,string,System.Nullable`1<uint>) (at Assets/Plugins/com.doji.diffusers/Runtime/Scripts/Pipelines/DiffusionPipeline.cs:158)
Doji.AI.Diffusers.StableDiffusion:Imagine (string,int,int,int,single,string) (at Assets/Plugins/com.doji.diffusers/Runtime/Scripts/Core/StableDiffusion.cs:81)
Doji.AI.Diffusers.Samples.StableDiffusionUI:Txt2Img (string) (at Assets/Scenes/01-StableDiffusionPipelineSample/StableDiffusionUI.cs:46)
Doji.AI.Diffusers.Samples.StableDiffusionTxt2ImgUI/<OnGenerateClicked>d__8:MoveNext () (at Assets/Scenes/01-StableDiffusionPipelineSample/StableDiffusionTxt2ImgUI.cs:22)
System.Runtime.CompilerServices.AsyncVoidMethodBuilder:Start<Doji.AI.Diffusers.Samples.StableDiffusionTxt2ImgUI/<OnGenerateClicked>d__8> (Doji.AI.Diffusers.Samples.StableDiffusionTxt2ImgUI/<OnGenerateClicked>d__8&)
Doji.AI.Diffusers.Samples.StableDiffusionTxt2ImgUI:OnGenerateClicked ()
UnityEngine.EventSystems.EventSystem:Update () (at ./Library/PackageCache/com.unity.ugui@2.0.0/Runtime/UGUI/EventSystem/EventSystem.cs:530)

Thread group count is above the maximum allowed limit. Maximum allowed thread group count is 65535.
UnityEngine.ComputeShader:Dispatch (int,int,int,int)
Unity.Sentis.ComputeHelper:Dispatch (Unity.Sentis.ComputeFunc,int,int,int) (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Backends/GPUCompute/ComputeHelper.cs:195)
Unity.Sentis.GPUComputeBackend:Transpose (Unity.Sentis.Tensor,Unity.Sentis.Tensor,int[]) (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Backends/GPUCompute/GPUCompute.cs:1914)
Unity.Sentis.Layers.Transpose:Execute (Unity.Sentis.Tensor[],Unity.Sentis.ExecutionContext) (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Layers/Layer.Transformation.cs:1365)
Unity.Sentis.GenericWorker/<StartManualSchedule>d__33:MoveNext () (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Backends/GenericWorker.cs:243)
Unity.Sentis.GenericWorker:Execute () (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Backends/GenericWorker.cs:176)
Unity.Sentis.GenericWorker:Execute (Unity.Sentis.Tensor) (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Backends/GenericWorker.cs:168)
Doji.AI.Diffusers.VaeDecoder:Execute (Unity.Sentis.TensorFloat) (at Assets/Plugins/com.doji.diffusers/Runtime/Scripts/Models/VaeDecoder.cs:49)
Doji.AI.Diffusers.StableDiffusionPipeline:Generate (Doji.AI.Diffusers.Parameters) (at Assets/Plugins/com.doji.diffusers/Runtime/Scripts/Pipelines/Stable Diffusion/StableDiffusionPipeline.cs:154)
Doji.AI.Diffusers.DiffusionPipeline:Generate (string,System.Nullable`1<int>,System.Nullable`1<int>,System.Nullable`1<int>,System.Nullable`1<single>,string,System.Nullable`1<uint>) (at Assets/Plugins/com.doji.diffusers/Runtime/Scripts/Pipelines/DiffusionPipeline.cs:158)
Doji.AI.Diffusers.StableDiffusion:Imagine (string,int,int,int,single,string) (at Assets/Plugins/com.doji.diffusers/Runtime/Scripts/Core/StableDiffusion.cs:81)
Doji.AI.Diffusers.Samples.StableDiffusionUI:Txt2Img (string) (at Assets/Scenes/01-StableDiffusionPipelineSample/StableDiffusionUI.cs:46)
Doji.AI.Diffusers.Samples.StableDiffusionTxt2ImgUI/<OnGenerateClicked>d__8:MoveNext () (at Assets/Scenes/01-StableDiffusionPipelineSample/StableDiffusionTxt2ImgUI.cs:22)
System.Runtime.CompilerServices.AsyncVoidMethodBuilder:Start<Doji.AI.Diffusers.Samples.StableDiffusionTxt2ImgUI/<OnGenerateClicked>d__8> (Doji.AI.Diffusers.Samples.StableDiffusionTxt2ImgUI/<OnGenerateClicked>d__8&)
Doji.AI.Diffusers.Samples.StableDiffusionTxt2ImgUI:OnGenerateClicked ()
UnityEngine.EventSystems.EventSystem:Update ()

I would be grateful for any help!


There is some weirdness with that “Thread group count” error. At least on my system (32GB RAM, RTX 3060), it doesn’t actually cause any problems with the generation, i.e. it produces the same output as the official diffusers pipelines. (“Error Pause” is disabled in the Unity Console.)

However, other people have also reported that it doesn’t generate a valid image for them at 512x512.

I think these issues will be fixed by updating to Sentis 1.4, but the package doesn’t support that version yet, so I’m afraid there isn’t a good workaround right now (other than decreasing the resolution, which yields subpar results depending on the model).
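Some background on the error: the message gives the limit as 65535 thread groups per dimension for a single compute dispatch (the Direct3D 11 limit), and the warning shows a [131072, 1, 1] dispatch for Transpose2D during VAE decoding. Assuming that group count grows linearly with the pixel count, halving the resolution to 256x256 would cut it by four (131072 / 4 = 32768), which fits the observation that lower resolutions avoid the problem. A compute backend can also stay under the limit by folding a long 1D dispatch into a 2D grid; the sketch below shows that arithmetic with hypothetical names and is not how Sentis itself is implemented.

```csharp
using System;

public static class DispatchMath
{
    // Per-dimension thread group limit quoted in the error message.
    public const long MaxGroupsPerDimension = 65535;

    // Thread groups needed for a 1D dispatch over `elements` work items (ceiling division).
    public static long GroupCount(long elements, long threadsPerGroup) =>
        (elements + threadsPerGroup - 1) / threadsPerGroup;

    // Folds a 1D group count into an (x, y) grid with both dimensions within the limit.
    // A kernel would rebuild its linear group index as groupY * x + groupX and return early
    // for indices >= groups, because x * y can slightly exceed the original count.
    public static (long x, long y) FoldTo2D(long groups)
    {
        long y = (groups + MaxGroupsPerDimension - 1) / MaxGroupsPerDimension;
        long x = (groups + y - 1) / y;
        return (x, y);
    }
}
```

For the dispatch in the log, FoldTo2D(131072) yields a 43691 x 3 grid, i.e. 131073 groups with one idle group.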


I see, thank you.


Thanks for the packages, Julien, great work! I tried to install the diffusers and midas packages yesterday to re-create something like x.com, but unfortunately I get the following compile errors after installing them:

Any idea what’s going on here or how to resolve it?

Thank you


Thanks for reporting!

It looks like the project is using a newer version of Sentis than what is currently supported by the diffusers package.
diffusers@0.3.1 (latest version) only supports Sentis 1.3.0-pre.3 at the moment.

May I ask how you imported the package?

You could either try:

  • manually downgrading Sentis to version 1.3.0-pre.3 via the Package Manager, or
  • importing diffusers as an OpenUPM package, which should import all dependencies with the correct versions
Import via OpenUPM

In Edit -> Project Settings -> Package Manager, add a new scoped registry:

Name: Doji
URL: https://package.openupm.com
Scope(s): com.doji

In the Package Manager, install com.doji.diffusers either by name or by selecting it in the list under Package Manager -> My Registries.

I’m currently working on an update to support the newest Sentis version 2.0.0, but there’s no ETA yet.

Oh I see, thanks Julien! I can give #1 a shot.

I installed the packages following these instructions from the openupm.com page for the diffusers package (Manual installation section), which seem similar to what you’re describing in #2.


Oh… I think I see the problem. The midas package already supports a newer Sentis version (1.6.0-pre.1), which causes the Package Manager to resolve Sentis to that version.
I think the easiest fix would be to open Packages/manifest.json and downgrade midas to:
"com.doji.midas": "1.0.1"

Alternatively, clone the test project I uploaded, which already has the dependencies set up:

I see, makes sense!

Thanks for uploading the test project, very useful! I’m able to run the depth estimation with it, but I run into the following error when instantiating the _sd object in WorldGeneration.cs:

_sd = DiffusionPipeline.FromPretrained(DiffusionModel.SD_XL_TURBO);

Here is the error:

Maybe the SDXL Turbo model wasn’t downloaded properly?

Thanks for your help and patience Julien!


It’s already failing when initializing the tokenizer.
Although it’s a different error, this looks suspiciously similar to the bug we encountered in that method before. I’m not 100% sure it’s the same, though.

I think it would be easiest to copy the com.doji.transformers package from Library/PackageCache to the Packages/ folder to make it an editable embedded package.

Then, in Runtime\Scripts\Clip\ClipTokenizer.cs, change the initial statement of the BytesToUnicode() method to this: