Stable Diffusion (diffusers + transformers package)

Today I wanted to share an early preview of some work I’ve done to run Stable Diffusion models using Sentis:

com.doji.diffusers
com.doji.transformers

I ported the ClipTokenizer from huggingface/transformers and the PNDMScheduler and the denoising loop from huggingface/diffusers.

There is still a lot to do, and the quality is not really there yet. I think the most important step is getting Classifier-Free Guidance to work, which I haven’t managed to do yet.
Update: Classifier-Free Guidance works now

I know other people have already shown their success with running diffusion models with Sentis on here, so maybe someone can give some advice or would like to contribute.

The GIF below was generated with the Stable Diffusion 1.5 model using 15 steps at 512x512 on an RTX 3060.

[GIF: Diffusers StableDiffusionPipeline sample running in Unity 2022.3.7f1 (DX11)]

Update: Got Classifier-Free Guidance working. Much better results now.
Also takes a bit longer to generate though…
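
For readers unfamiliar with the technique: in the reference diffusers pipelines, classifier-free guidance evaluates the UNet at every step for both an empty (or negative) prompt and the actual prompt, then combines the two noise predictions. That extra evaluation is presumably why generation takes longer. Below is a minimal sketch of the combination step on plain float arrays. The class and method names are hypothetical and not the package's API.

```csharp
using System;

public static class GuidanceSketch
{
    // Combines the unconditional and the text-conditioned noise predictions.
    // guidanceScale = 1 returns the text-conditioned prediction unchanged;
    // larger values push the result further away from the unconditional one.
    public static float[] ApplyGuidance(float[] noiseUncond, float[] noiseText, float guidanceScale)
    {
        if (noiseUncond.Length != noiseText.Length)
            throw new ArgumentException("Both predictions must have the same length.");

        var result = new float[noiseUncond.Length];
        for (int i = 0; i < result.Length; i++)
        {
            result[i] = noiseUncond[i] + guidanceScale * (noiseText[i] - noiseUncond[i]);
        }
        return result;
    }

    public static void Main()
    {
        // Toy values standing in for two UNet outputs.
        float[] guided = ApplyGuidance(new float[] { 0f, 1f }, new float[] { 1f, 3f }, 7.5f);
        Console.WriteLine(string.Join(", ", guided));
    }
}
```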

This is very interesting, thanks for sharing!

Updated the package to include a new scheduler, which makes Stable Diffusion 2.1 work.

Do you think something like this could run on an iPhone? Or are the memory/processing requirements too great?

Not anytime soon I think. Model sizes range from around 5 to 12 GB for the FP32 versions (quantization will help a bit once it lands in Sentis), but then raw processing power will be a bottleneck for mobile form factors for years to come.

Technically possible of course; some people are even running these on a Raspberry Pi: GitHub - vitoplantamura/OnnxStream: Lightweight inference library for ONNX files, written in C++. It can run SDXL on a RPI Zero 2 but also Mistral 7B on desktops and servers. (with inference times of several hours per image)

Just released an update with support for Stable Diffusion XL, as well as turbo models for SD 1.5 and SDXL.
The images below were created with SDXL Turbo at 512x512 with 3 steps in ~6 seconds per image.


Also added img2img support.


(left is input, right is generated output)

ArgumentOutOfRangeException: Specified argument was out of the range of valid values.
Parameter name: count
System.Linq.Enumerable.Range (System.Int32 start, System.Int32 count) (at <93223d662c2546d4b5d1784737504095>:0)
Doji.AI.Transformers.ClipTokenizer.GetRange (System.Int32 start, System.Int32 end) (at Library/PackageCache/com.doji.transformers@0.0.1/Runtime/Scripts/Clip/ClipTokenizer.cs:160)
Doji.AI.Transformers.ClipTokenizer.BytesToUnicode () (at Library/PackageCache/com.doji.transformers@0.0.1/Runtime/Scripts/Clip/ClipTokenizer.cs:135)
Doji.AI.Transformers.ClipTokenizer.Initialize (Doji.AI.Transformers.Vocab vocab, System.String merges, Doji.AI.Transformers.TokenizerConfig config, System.Collections.Generic.Dictionary`2[TKey,TValue] addedTokensDecoder, System.Int32 modelMaxLength, Doji.AI.Transformers.Side paddingSide, Doji.AI.Transformers.Side truncationSide, System.Collections.Generic.List`1[T] modelInputNames, System.Boolean cleanUpTokenizationSpaces, System.Boolean splitSpecialTokens) (at Library/PackageCache/com.doji.transformers@0.0.1/Runtime/Scripts/Clip/ClipTokenizer.cs:58)
Doji.AI.Transformers.ClipTokenizer..ctor (Doji.AI.Transformers.Vocab vocab, System.String merges, Doji.AI.Transformers.TokenizerConfig config) (at Library/PackageCache/com.doji.transformers@0.0.1/Runtime/Scripts/Clip/ClipTokenizer.cs:34)
Doji.AI.Diffusers.StableDiffusionXLPipeline.FromPretrained (Doji.AI.Diffusers.DiffusionModel model, Unity.Sentis.BackendType backend) (at Library/PackageCache/com.doji.diffusers@0.3.1/Runtime/Scripts/Pipelines/PipelineUtils.cs:369)
Doji.AI.Diffusers.DiffusionPipeline.FromPretrained (Doji.AI.Diffusers.DiffusionModel model, Unity.Sentis.BackendType backend) (at Library/PackageCache/com.doji.diffusers@0.3.1/Runtime/Scripts/Pipelines/PipelineUtils.cs:226)
Doji.AI.Diffusers.StableDiffusion.Initialize () (at Library/PackageCache/com.doji.diffusers@0.3.1/Runtime/Scripts/Core/StableDiffusion.cs:67)
Doji.AI.Diffusers.StableDiffusion..ctor (Doji.AI.Diffusers.DiffusionModel model) (at Library/PackageCache/com.doji.diffusers@0.3.1/Runtime/Scripts/Core/StableDiffusion.cs:52)
Doji.AI.Diffusers.Samples.StableDiffusionUI.Start () (at Assets/Samples/Diffusers/0.3.1/Basic Stable Diffusion Sample/StableDiffusionUI.cs:26)

I’m getting this error when trying to launch the demo scene with SDXL Turbo installed and selected. What could be the issue?

Hey, thanks for catching this. Never had the clip tokenizer fail to initialize before.

How did you download/install the package? It could be a snafu with Git where the file is downloaded in an encoding that doesn’t support some of the special characters used here.
More specifically, these are the characters that seem to produce an unexpected negative value on your system:

List<int> bs = GetRange('!', '~' + 1)         // !:35, ~:126
              .Concat(GetRange('¡', '¬' + 1)) // ¡:161, ¬:172
              .Concat(GetRange('®', 'ÿ' + 1)) // ®:174, ÿ:255
              .ToList();

You might want to check if that code file looks the same on your system and if it does: which one of these characters returns a negative value?

Alternatively: It’s also possible that the code could be buggy on specific region/culture settings that I haven’t tested. You could try to set your region to US or something temporarily, but I’ll investigate more, maybe I can reproduce the issue.

As a more useful short-term workaround: You could just replace the start and end indices with the values in the comment, which is likely what I’ll end up changing the code to.

EDIT: One more correction: the comments have an error too. The first index should be 33, not 35, so:

List<int> bs = GetRange(33, 127)
              .Concat(GetRange(161, 173))
              .Concat(GetRange(174, 256))
              .ToList();

Will fix as soon as possible.

I’ve installed the package through OpenUPM. Indeed, the code snippet looks different for me:

            List<int> bs = GetRange('!', '~' + 1) //!:35, ~:126
                .Concat(GetRange('Ў', '¬' + 1)) // Ў:161, ¬:172
                .Concat(GetRange('®', 'я' + 1)) // ®:174, я:255
                .ToList();

Thank you for the help, the issue is gone now.
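
For context on why the mangled characters trigger the exception: 'Ў' is U+040E (1038) and 'я' is U+044F (1103), which are the characters Windows-1251 assigns to bytes 0xA1 and 0xFF, so the file was probably decoded with a Cyrillic code page somewhere along the way. The sketch below assumes that GetRange(start, end) simply forwards to Enumerable.Range(start, end - start). This matches the stack trace but is not checked against the package source, and the class and method names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ByteRangeSketch
{
    // Hypothetical stand-in for ClipTokenizer.GetRange: the values in [start, end).
    public static IEnumerable<int> GetRange(int start, int end)
    {
        return Enumerable.Range(start, end - start);
    }

    // Builds the byte list from numeric code points, so the result
    // does not depend on how the source file is encoded or decoded.
    public static List<int> PrintableBytes()
    {
        return GetRange(33, 127)
            .Concat(GetRange(161, 173))
            .Concat(GetRange(174, 256))
            .ToList();
    }

    public static void Main()
    {
        // Intended literals: '¡' is 161 and '¬' is 172, so the count is 173 - 161 = 12.
        Console.WriteLine(GetRange('\u00A1', '\u00AC' + 1).Count());

        // Mangled literal: 'Ў' is U+040E (1038), so the count becomes 173 - 1038 = -865.
        try
        {
            GetRange('\u040E', '\u00AC' + 1).Count();
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("Negative count: same exception type as in the stack trace.");
        }

        Console.WriteLine(PrintableBytes().Count);
    }
}
```

Writing the bounds as numbers (or as \u escapes) keeps the ranges correct regardless of the encoding the file passes through, which is the reason the numeric workaround above is robust.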

Hello, just found this topic and the package looks very promising.

Unfortunately, I get these errors right after the image generation in the demo scene:

Exceeded safe compute dispatch group count limit per dimension [131072, 1, 1] for Transpose2D
UnityEngine.Debug:LogWarning (object)
Unity.Sentis.D:LogWarning (object) (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Internals/Debug.cs:72)
Unity.Sentis.ComputeHelper:Dispatch (Unity.Sentis.ComputeFunc,int,int,int) (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Backends/GPUCompute/ComputeHelper.cs:193)
Unity.Sentis.GPUComputeBackend:Transpose (Unity.Sentis.Tensor,Unity.Sentis.Tensor,int[]) (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Backends/GPUCompute/GPUCompute.cs:1914)
Unity.Sentis.Layers.Transpose:Execute (Unity.Sentis.Tensor[],Unity.Sentis.ExecutionContext) (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Layers/Layer.Transformation.cs:1365)
Unity.Sentis.GenericWorker/<StartManualSchedule>d__33:MoveNext () (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Backends/GenericWorker.cs:243)
Unity.Sentis.GenericWorker:Execute () (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Backends/GenericWorker.cs:176)
Unity.Sentis.GenericWorker:Execute (Unity.Sentis.Tensor) (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Backends/GenericWorker.cs:168)
Doji.AI.Diffusers.VaeDecoder:Execute (Unity.Sentis.TensorFloat) (at Assets/Plugins/com.doji.diffusers/Runtime/Scripts/Models/VaeDecoder.cs:49)
Doji.AI.Diffusers.StableDiffusionPipeline:Generate (Doji.AI.Diffusers.Parameters) (at Assets/Plugins/com.doji.diffusers/Runtime/Scripts/Pipelines/Stable Diffusion/StableDiffusionPipeline.cs:154)
Doji.AI.Diffusers.DiffusionPipeline:Generate (string,System.Nullable`1<int>,System.Nullable`1<int>,System.Nullable`1<int>,System.Nullable`1<single>,string,System.Nullable`1<uint>) (at Assets/Plugins/com.doji.diffusers/Runtime/Scripts/Pipelines/DiffusionPipeline.cs:158)
Doji.AI.Diffusers.StableDiffusion:Imagine (string,int,int,int,single,string) (at Assets/Plugins/com.doji.diffusers/Runtime/Scripts/Core/StableDiffusion.cs:81)
Doji.AI.Diffusers.Samples.StableDiffusionUI:Txt2Img (string) (at Assets/Scenes/01-StableDiffusionPipelineSample/StableDiffusionUI.cs:46)
Doji.AI.Diffusers.Samples.StableDiffusionTxt2ImgUI/<OnGenerateClicked>d__8:MoveNext () (at Assets/Scenes/01-StableDiffusionPipelineSample/StableDiffusionTxt2ImgUI.cs:22)
System.Runtime.CompilerServices.AsyncVoidMethodBuilder:Start<Doji.AI.Diffusers.Samples.StableDiffusionTxt2ImgUI/<OnGenerateClicked>d__8> (Doji.AI.Diffusers.Samples.StableDiffusionTxt2ImgUI/<OnGenerateClicked>d__8&)
Doji.AI.Diffusers.Samples.StableDiffusionTxt2ImgUI:OnGenerateClicked ()
UnityEngine.EventSystems.EventSystem:Update () (at ./Library/PackageCache/com.unity.ugui@2.0.0/Runtime/UGUI/EventSystem/EventSystem.cs:530)

Thread group count is above the maximum allowed limit. Maximum allowed thread group count is 65535.
UnityEngine.ComputeShader:Dispatch (int,int,int,int)
Unity.Sentis.ComputeHelper:Dispatch (Unity.Sentis.ComputeFunc,int,int,int) (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Backends/GPUCompute/ComputeHelper.cs:195)
Unity.Sentis.GPUComputeBackend:Transpose (Unity.Sentis.Tensor,Unity.Sentis.Tensor,int[]) (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Backends/GPUCompute/GPUCompute.cs:1914)
Unity.Sentis.Layers.Transpose:Execute (Unity.Sentis.Tensor[],Unity.Sentis.ExecutionContext) (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Layers/Layer.Transformation.cs:1365)
Unity.Sentis.GenericWorker/<StartManualSchedule>d__33:MoveNext () (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Backends/GenericWorker.cs:243)
Unity.Sentis.GenericWorker:Execute () (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Backends/GenericWorker.cs:176)
Unity.Sentis.GenericWorker:Execute (Unity.Sentis.Tensor) (at ./Library/PackageCache/com.unity.sentis@1.3.0-pre.3/Runtime/Core/Backends/GenericWorker.cs:168)
Doji.AI.Diffusers.VaeDecoder:Execute (Unity.Sentis.TensorFloat) (at Assets/Plugins/com.doji.diffusers/Runtime/Scripts/Models/VaeDecoder.cs:49)
Doji.AI.Diffusers.StableDiffusionPipeline:Generate (Doji.AI.Diffusers.Parameters) (at Assets/Plugins/com.doji.diffusers/Runtime/Scripts/Pipelines/Stable Diffusion/StableDiffusionPipeline.cs:154)
Doji.AI.Diffusers.DiffusionPipeline:Generate (string,System.Nullable`1<int>,System.Nullable`1<int>,System.Nullable`1<int>,System.Nullable`1<single>,string,System.Nullable`1<uint>) (at Assets/Plugins/com.doji.diffusers/Runtime/Scripts/Pipelines/DiffusionPipeline.cs:158)
Doji.AI.Diffusers.StableDiffusion:Imagine (string,int,int,int,single,string) (at Assets/Plugins/com.doji.diffusers/Runtime/Scripts/Core/StableDiffusion.cs:81)
Doji.AI.Diffusers.Samples.StableDiffusionUI:Txt2Img (string) (at Assets/Scenes/01-StableDiffusionPipelineSample/StableDiffusionUI.cs:46)
Doji.AI.Diffusers.Samples.StableDiffusionTxt2ImgUI/<OnGenerateClicked>d__8:MoveNext () (at Assets/Scenes/01-StableDiffusionPipelineSample/StableDiffusionTxt2ImgUI.cs:22)
System.Runtime.CompilerServices.AsyncVoidMethodBuilder:Start<Doji.AI.Diffusers.Samples.StableDiffusionTxt2ImgUI/<OnGenerateClicked>d__8> (Doji.AI.Diffusers.Samples.StableDiffusionTxt2ImgUI/<OnGenerateClicked>d__8&)
Doji.AI.Diffusers.Samples.StableDiffusionTxt2ImgUI:OnGenerateClicked ()
UnityEngine.EventSystems.EventSystem:Update ()

I would be grateful for any help!

There is some weirdness with that “Thread group count” error. At least on my system (32GB RAM, RTX 3060), it doesn’t actually cause any problems with the generation, i.e. it produces the same output as the official diffusers pipelines. (“Error Pause” is disabled in the Unity Console.)

However, other people have also reported that it doesn’t generate a valid image for them at 512x512.

I think these issues will be fixed with an update to Sentis 1.4, but that version is not yet supported by the package, so I’m afraid there isn’t any good workaround right now (other than decreasing the resolution, which yields subpar results depending on the model).

I see, thank you.
