Stable Diffusion: Fundamental technological shift

Hi!

I don’t post much, but I’m a regular reader, and I believe there’s a reasonable chance that the mods and/or actual Unity developers will read this, so here goes:

Seriously, stop whatever you are working on and pay attention to what is happening at GitHub - CompVis/stable-diffusion, a latent text-to-image diffusion model: https://github.com/CompVis/stable-diffusion

There are rare times when you see generational shifts in technology taking place, and if you miss the boat on this one, you will regret it… literally, forever.

There is a very brief window here to integrate stable diffusion and related technology into Unity, even in an experimental manner, and it will either make Unity a leader or a left-behind.

I know it’s a complex issue, but mm… well, I think the internet is pretty clear that stable diffusion is a fundamental transition in art and asset generation.

If you are not paying attention, for whatever reason, or not applying yourselves to actively integrating and experimenting with this technology then DO SO NOW.

Here are some things that have happened in the last 10 days, since this model was released:

I can’t say this any more clearly than this:

This is an absolutely transformative technology.

Sometimes things pass and it seems like you folks (i.e. Unity, the company) are drifting along doing this and that with all kinds of priorities, and I totally get that.

If the next blog post I read is about how Unreal now has in-engine img2img using SD, you will know that you’ve missed the boat and are playing catch-up. So… have a look. Have a play. Reach out, work with them. Do something amazing.

…but, please: PAY ATTENTION. Because, I think, looking back, you might find that this is more important than anything else you’re doing right now.

That is all.

2 Likes

You’re a bit late to the party.

Here’s the thread where it was already discussed:
https://discussions.unity.com/t/890648

Here are posts in it explaining how to use it.
https://discussions.unity.com/t/890648/page-4#post-8406549
https://discussions.unity.com/t/890648/page-4#post-8409678
https://discussions.unity.com/t/890648/page-4#post-8410107
https://discussions.unity.com/t/890648/page-4#post-8411547

It is one of those situations where Unity actually doesn’t need to do anything, because there’s no boat to miss.

It is already available to everybody and the internet is full of sites describing how to build prompts for it.

It already works as a standalone tool, and trying to integrate it into an engine is kinda pointless.

7 Likes

You forgot to explain WHY. :wink:

1 Like

For people who want a really easy entry point, someone made a neat little GUI with an integrated installer: https://www.reddit.com/r/StableDiffusion/comments/x1hp4u/my_easytoinstall_windows_gui_for_stable_diffusion/

For art it’s cool. And quite funny: I’m active in an art community and posted some pieces… and got a higher fave/view ratio than some commissioned art I paid money for ._.’

However, it’s hard to see the practical use for game assets aside from concept art yet, because what you cannot do very well is tell it “draw me the great character you have just drawn, but as a frontal view instead of a side view”. It excels at “one-off” things, or things it has thousands of samples of in its dataset. Backgrounds and props do work, though, with some manual adjustment afterwards.

That said, this is just the beginning, i.e. the first system that is really usable for said one-off use cases. Indeed, see the thread neginfinity linked. There are aims at pixel art etc. too. Maybe the reproducibility issue will be solved as well at some point.

Yeah, good point. Let’s not turn this into mere hype like NFTs xP

While it’s of no practical use to me yet, it has certainly brought me a lot of joy, and that’s something :slight_smile:

It’s already been effectively integrated into Photoshop and Krita, proving you incorrect.

There is considerable value in having your tooling tightly integrated into your workflow.

Nevertheless, you’re welcome to your opinion; I shall, however, bookmark this post to bring up later, when it is integrated into other engines and people moan and complain that Unity doesn’t have those features, because ‘the community thought it was kinda pointless’.

At any rate, I’m not interested in what other developers think; the tools are out there, and people are already using them to generate game assets.

What I care about is that Unity, the company, is paying attention… because, bluntly, anyone who is not paying attention right now should be struck off for being asleep at the wheel.

This took me 15 seconds to generate. You can definitely generate more than simply ‘concept art’.

You decide if you see value in it… but, I think it’s fair to say that some people will find it has some value.

(the latter being the result of taking the output of the first image and looping it back in as input for a second round)

(Two attached images: the first-round output, and the result of the second round.)
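For anyone who wants to try that feedback loop themselves, here is a minimal sketch using the Hugging Face diffusers img2img pipeline. To be clear, this is my own illustration, not what was used for the images above: the prompt, file names, and strength values are made up, and argument names can differ between diffusers versions.

```python
# Sketch: run img2img once, then loop the output back in as the input
# for a second round, as described in the post above.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "rugged vehicle in a canyon, concept art"   # illustrative prompt
init = Image.open("input.png").convert("RGB").resize((512, 512))

# First round: transform the input image toward the prompt.
first = pipe(prompt=prompt, image=init, strength=0.75).images[0]

# Second round: the first output becomes the new input image.
second = pipe(prompt=prompt, image=first, strength=0.5).images[0]
second.save("second_round.png")
```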

Is it a fundamental technological shift, or something that saves production time in certain niches?

There’s a thing called textual inversion, where you give 3 to 5 images to yet ANOTHER training process, and it produces a gibberish “vector” which describes the concept within your neural network.

Then you can use the gibberish vector to refer to the thing.

I’ve not used it yet, however. The requirements to run inversion are significantly higher.
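Conceptually, though, it’s simple to sketch what inversion optimizes: the trained model is frozen, and only a single new embedding vector is tuned until the model’s output matches the example images. Here’s a toy stand-in in PyTorch; the tiny linear “model” and random “examples” are placeholders, not the real diffusion model or dataset:

```python
# Toy sketch of the textual-inversion idea: freeze the model, learn only
# one new embedding vector that makes the output match the examples.
import torch

torch.manual_seed(0)

toy_model = torch.nn.Linear(8, 16)        # stand-in for the frozen generator
for p in toy_model.parameters():
    p.requires_grad_(False)

examples = torch.randn(4, 16)             # stand-in for the 3-5 example images

embedding = torch.zeros(8, requires_grad=True)   # the "gibberish vector"
opt = torch.optim.Adam([embedding], lr=0.1)

for step in range(200):
    out = toy_model(embedding)                   # generate from the pseudo-word
    loss = ((out - examples) ** 2).mean()        # compare against the examples
    opt.zero_grad()
    loss.backward()
    opt.step()

# The learned vector can now stand in for the concept in prompts.
print(f"final loss: {loss.item():.4f}")
```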

Take a look at the posts I linked and you’ll be able to greatly improve it.

Like I said, you’re late to the party.

Nobody who says something is a fundamentally transformative technology ever seems to bother to explain why, probably because they’ve only ever bought into the hype without thinking about the practical way these technologies end up disseminating into the ecosystem. AI art and its various implementations do have a place in the industry, but fundamentally it’s going to be things like texture generation (we already have this in some capacity; its inroads are proven), blocking out concept art in more detail (management loves this shit), and filling in background elements and creating other non-hero assets.

Of course, this all comes with a lot of problems. In the case of concept art and non-hero assets, I get the feeling we’re going to see something very similar to what’s happened with scratch music in film scores: something will be generated relatively close to what’s going into the final product, and what ends up in the final project will be little more than a paintover of it. Management loves this shit because it means a locked-in process that doesn’t have to be adapted too much; lots of artists hate it because it fundamentally reduces their job to something that’s easily replaced by entry-level workers who just need to know their way around image editing.

More than that, however, there are still the upcoming potential legal issues. Despite what people will tell you, there is a fundamental difference between a human being taking inspiration from art they’ve engaged with as a person, and throwing all the art ever made into a training data set and then drawing from that to create something new. The real issue here is that this means the training data, and everything derived from it, is built entirely on unlicensed work. On top of that, the AI output suffers from copyright issues of its own, because copyright generally requires human authorship. This means that companies using it in any significant capacity would have to do additional work to establish provenance.

Stop buying into the hype train and shiny demos and think about how things actually work when they’ve gone through widespread adoption.

2 Likes

Like this one? :smile:

(Attached image.)

I particularly like the interpretation of “offroad”. :sunglasses:

Which it is. The AI was programmed by humans. The person generating the image defined the parameters.

Cool, enjoy being wrong about how this works I guess.

https://www.smithsonianmag.com/smart-news/us-copyright-office-rules-ai-art-cant-be-copyrighted-180979808/

2 Likes

=> Other countries put less emphasis on the necessity of human authorship for protection.

I’m in an Other country. And given the appeal, and future legal conflicts expanding the gray areas, I’m sure that eventually even the USCO will change its point of view. I’d say as soon as a tech giant makes such an appeal and there’s REAL money on the line…

That’s the inverse, though. Yes, you cannot copyright a piece generated via AI (in this particular ruling), but you can still use it.
And if your game pops off with a specific character generated by the AI, you can acquire a trademark for it (for money, of course; Nintendo and co. pay to have Mario etc. protected). For a trademark, it does not matter so much how it was created.

Rulings. Plural. Repeated attempts have been made. Also that’s really not how trademark works.

You’re not using it correctly.

And that’s not how it works. The AI was not programmed; it was trained.

1 Like

https://www.deutscheranwaltspiegel.de/intellectualproperty/copyright/do-ai-generated-works-qualify-for-copyright/

I wish more people would understand this. And even more, I wish they’d have more compassion for artists, but I guess that is a lost cause…
I’ve played around a bit with stable diffusion and I’m convinced it isn’t nearly as “intelligent” as the marketing hype and apologists want us to believe. I think it’s glorified copyright infringement on a massive scale, and I think over time (perhaps with the help of other trained models) we’ll find out the true scale of how aggressively it rips parts from training-set images to construct its output images. This is copyright ignorance on an atomic-war-escalation, global scale, and I bet it will come back to bite many of the early adopters.

That can be dismissed on the basis of the data.
Dall-E 2 used a few hundred million images of training data, yet its model contains only 3.5 billion parameters. That works out to just a few dozen values per image at most. Good luck reproducing a training image, or even identifying one, with so little data (it would be an amazing compression feat if it were possible).
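The back-of-envelope arithmetic, using the rough figures above (approximations, not exact dataset sizes):

```python
parameters = 3.5e9        # Dall-E 2 model size, per the post above
train_images = 2e8        # "a few hundred million" training images
print(parameters / train_images)   # ~17.5 values per image

# For comparison, one uncompressed 512x512 RGB image:
print(512 * 512 * 3)               # 786,432 bytes
```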

The AIs do not copy images. They utilize the “knowledge” behind how images are produced. And you’d better not want to copyright knowledge, like people did in the Middle Ages (where only “guilds” were allowed to possess the means to practice certain crafts).

The one thing I could imagine becoming copyrightable is specific art styles, if they are recognizable enough.

Since you’re an artist, you should probably take a look at this video to see how people use it to make their own works from a template.
https://www.reddit.com/r/StableDiffusion/comments/x0b0rq/img2img_fantasy_art_walkthrough_video/

Regarding copyright…

Diffusion networks work in the following way: they start with noise, and then, in turns, apply filters to it that make the noise look more and more like a desired target.

https://towardsdatascience.com/stable-diffusion-best-open-source-version-of-dall-e-2-ebcdf1cb64bc

In the case of stable diffusion, each keyword is a function like that.
The thing is, a “dog()” function, for example, does not describe individual dogs; it describes all possible dogs. So there is not really a specific dog in the dog function, and no training image either.
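A toy illustration of that loop (this is not the actual Stable Diffusion code: the real model is a trained U-Net denoising in latent space, conditioned on a text embedding, whereas the “denoise step” here just nudges noise toward a stand-in target):

```python
# Sketch of iterative denoising: start from pure noise, repeatedly apply
# a step that makes the image look more like the desired target.
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(x, conditioning, t):
    """Placeholder for the learned model: nudge x a little toward what
    the conditioning (e.g. the 'dog()' function) says images look like."""
    return x + 0.1 * (conditioning - x)

conditioning = rng.normal(size=(64, 64))   # stand-in for "all possible dogs"
x = rng.normal(size=(64, 64))              # start with pure noise

for t in reversed(range(50)):              # refine it step by step
    x = denoise_step(x, conditioning, t)

print(np.abs(x - conditioning).mean())     # now close to the target
```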

Another issue: a long time ago, when discussing something with specific artists, I asked (I think) about copyright issues with the use of reference images.

The answer I got was along the lines that if you “steal” from a sufficiently large number of sources, the result will no longer resemble any of them, and will thus fail the substantial similarity test.

And the neural net has pretty much used the original images only to derive references from them.

Except it can’t be so easily dismissed. Dall-E and Midjourney both scraped data from every online resource they could, regardless of whether or not they had the rights to it. Midjourney was especially revealing because it would often recreate Shutterstock watermarks. GPT-2 does the same, with some prompts producing lines sampled whole-cloth from writing found online, including fanfiction authors’ “if you liked this, please pledge to my Patreon” notes at the ends of stories, because the language model saw that as simply how stories conclude.

What you are effectively arguing here, and poorly at that, is that because they do this with so much data, it would be impossible to police. That’s not just a bad argument; it ignores the fact that these large-dataset algorithms cannot exist without the use of labour they are not entitled to. These are not works in the public domain, and your argument is basically the same as that of people who (incorrectly) say “well, if I found it on the internet, that means it’s free!”

But that’s not true, and there’s no legal basis to believe it. Jumping whole-hog into the “AI revolution” is already leaving a lot of people burned, and that’s going to lead to a lot of legal problems, especially if it becomes clear that, say, the training data contains footage from Disney movies.

That’s not true at all, and you’re ascribing way too much to how AI works. There is no deep understanding; they simply analyse patterns. That’s not “how the images are produced”, that’s tracing. Your entire argument about “copyrighting knowledge” is also pretty embarrassing because, again, an algorithm is not a person. There is no creative interpretation here; it is code that replicates things based on fuzzed data.

1 Like