Stable Diffusion XL Turbo can generate AI images as fast as you can type (arstechnica.com)

submitted 11 months ago by thehatfox@lemmy.world to c/technology@lemmy.world

[-] Contort3860@links.hackliberty.org 46 points 11 months ago* (last edited 11 months ago)

XL Turbotastic Mega Ginormous, etc. Hate naming schemes like this. Why not just make it v2.0 or the Pro version instead? Why use multiple words that make it sound bigger and better? Marketing BS that just sounds dumb.

[-] simple@lemm.ee 48 points 11 months ago* (last edited 11 months ago)

Not sure why you have a problem with it, the naming here makes a lot of sense if you know the context.

Stable Diffusion --> The original SD with versions like 1.5, 2.0, 2.1 etc

Stable Diffusion XL --> A version of SD with much bigger training data and support for much larger resolutions (hence, XL)

Stable Diffusion XL Turbo --> A version of SDXL that is much faster (hence, Turbo)

They have different names because they're actually different things, it's not exactly a v1.0 --> v2.0 scenario

[-] Contort3860@links.hackliberty.org 8 points 11 months ago

Thanks for the context. That does make it much less redundant.

[-] at_an_angle@lemmy.one 3 points 11 months ago

Naming schemes that aren't clear are absolute garbage.

What if you're new to it, and there are 6 different recent versions of something all named with a description instead of version number? Is Jumbo newer than Mega?

Fuck it, I'm ranting about this because it still upsets me.

I wanted to buy a 3DS to play Shovel Knight and Binding of Isaac. Reading up on them, BoI would only play on a New 3DS XL. Cool.

Went to the store and bought a new 3DS XL only to find out I got the wrong one. What I wanted was a NEW 3DS XL, and what I got was a 3DS XL that was new. There is a difference, and it took me 4 days to notice, and I was working out of town for the next month. So I can't return it. FUN!

So screw naming new versions of things with names instead of numbers. But somehow, Microsoft screwed that one up.

KISS: Keep it simple, stupid.

[-] simple@lemm.ee 5 points 11 months ago* (last edited 11 months ago)

Sure, 3DS names are dumb, but this is definitely not the case here. Using version numbers instead of different names for different things causes insane confusion and having to over-explain what it is.

See: DLSS

DLSS 2 is just DLSS 1 but better. DLSS 3 is frame generation that isn't compatible with most hardware. DLSS 3.5 is similar to DLSS 2 but includes enhanced raytracing denoising.

It's a nightmare. Making a version 2, 3, 4 etc of something also makes it sound like there's no reason to use the old version, whereas a lot of people are still using the regular stable diffusion over stable diffusion XL.

Imagine if the discussion was "Hey don't use Stable Diffusion 3 since you need a lot of VRAM, you should be using Stable Diffusion 1.5 or Stable Diffusion 2.1, but also it's worth getting a new GPU for Stable Diffusion 4 cuz it's very fast but has lower quality than version 3"

[-] phoenixz@lemmy.ca 1 points 11 months ago

Yeah, but the next version has an even bigger training set, so what then? XXL? And what about the one after that? Turbo was already used, so now we call it Nitro? This is not the "New Kids" movies, you know...

[-] grue@lemmy.world 33 points 11 months ago

Why not just make it v2.0 or the Pro version instead?

"Pro version" is equally cringe.

[-] Contort3860@links.hackliberty.org 2 points 11 months ago

Yeah I get that. Would just have made more sense given that it's widely used. Though I've been told why the name is so weird and it makes some sense now

[-] tsonfeir@lemm.ee 0 points 11 months ago

Here are my suggestions:

Stable Diffusion Free

Stable Diffusion Paid with Limitations

Stable Diffusion Paid Unlimited

[-] DABDA@lemmy.world 10 points 11 months ago

I agree with you in general, but for Stable Diffusion, "2.0/2.1" was not an incremental direct improvement on "1.5" but was trained and behaves differently. XL is not a simple upgrade from 2.0, and since they say this Turbo model doesn't produce as detailed images it would be more confusing to have SDXL 2.0 that is worse but faster than base SDXL, and then presumably when there's a more direct improvement to SDXL have that be called SDXL 3.0 (but really it's version 2) etc.

It's less like Windows 95->Windows 98 and more like DOS->Windows NT.

That's not to say it all couldn't have been better named. Personally, instead of 'XL' I'd rather they start including the base resolution and something to reference whether it uses a refiner model etc.

(Note: I use Stable Diffusion but am not involved with the AI/ML community and don't fully understand the tech. I'm not trying to claim expert knowledge; this is just my interpretation.)

[-] barsoap@lemm.ee 3 points 11 months ago

AFAIU SDXL is actually an, erm, genetic descendant of SD1.5, with its architecture expanded, weights transferred from 1.5, and then trained on bigger inputs (512x512 in the end is awfully small). SD2.0 is a completely new model, trained from scratch, and as far as I'm aware no one's actually using it. Also, no one is using the SDXL refiner: if you go to civitai it's all models with detailer capabilities baked in. What you do see is workflows that generate an image, add some noise at the very end, and repeat the last couple of steps. Using the base SDXL refiner on the output of other SDXL models is sometimes downright comical, because it sometimes has no idea what it's looking at and then produces exquisite surface texture details of the wrong material. Say, a silk keyboard, because it doesn't realise that it's supposed to be ABS and, well, black silk exists.

[-] Contort3860@links.hackliberty.org 2 points 11 months ago

Yeah I got some good replies to my comment explaining it. Makes more sense now.

[-] foggy@lemmy.world 2 points 11 months ago

I'm just glad we're moving away from purposely misspelled product SEO hacks.

[-] DoucheBagMcSwag@lemmy.dbzer0.com 20 points 11 months ago

This isn't free BTW folks

[-] Sixner@lemmy.world 3 points 11 months ago

I haven't messed with any AI imaging stuff yet. Any free recommendations to just have some fun?

[-] lloram239@feddit.de 3 points 11 months ago

Bing Image Creator if you just want to create some images quick (free, Microsoft account required). It's using DALLE3 behind the scenes, so it's pretty much state-of-the-art, but rather limited in terms of features otherwise and rather heavy on the censorship.

If you wanna generate something local on your PC with more flexibility, try Automatic1111 along with one of the models from CivitAI. It needs a reasonably modern graphics card and enough VRAM (8GB+) to be enjoyable, and installation can be a bit fiddly (check Youtube & Co. for tutorials). But once past that you can create some pretty wild stuff.

[-] Tire@lemmy.ml 1 points 11 months ago

Bing and OpenAI still have free stuff. Bing’s is actually really good.

[-] mriormro@lemmy.world 8 points 11 months ago

Great, even more online noise that I can look forward to.

[-] LifeInOregon@lemmy.world 6 points 11 months ago

And the resulting faces still all have lazy eyes, asymmetric features, and significantly uncanny issues.

[-] MostlyHarmless@sh.itjust.works 15 points 11 months ago

Humans have asymmetric features. No one is symmetrical

[-] LifeInOregon@lemmy.world 3 points 11 months ago

These features are abnormally asymmetric to the point of being off-putting. General symmetry of features is a significant part of what attracts people to one another, and why facial droops from things like Bell's palsy or strokes can often be psychologically difficult for the patient who experiences them.

General symmetry, not exact symmetry.

[-] Apothecary@lemmy.world 2 points 11 months ago

Anecdote: I think Denzel Washington is supposed to have one of the most symmetrical faces.

[-] Deceptichum@kbin.social 2 points 11 months ago

You can easily get incredibly canny stuff.

[-] Zoboomafoo@lemmy.world 5 points 11 months ago

That's impressive

[-] ecnkmaxo@futurology.today 5 points 11 months ago

[removed by mod]

[-] lurch@sh.itjust.works -1 points 11 months ago

There's a fair chance we'll see (or actually not see) a lot more offline use. AI apps are coming to desktop PCs and phones, and it means in the long run people don't have to get some entertaining stuff from the web any more. Like if you want a cool pic of a dragon for a wallpaper, you can just ask the AI app on your PC and it will make a bunch to choose from.

[-] atocci@kbin.social 4 points 11 months ago

What's out there that actually works offline? Stable Diffusion is the only one I've heard about, everyone else is more interested in exclusively selling AI as a service.

[-] barsoap@lemm.ee 0 points 11 months ago* (last edited 11 months ago)

Llama etc, reduced ChatGPT models. Never tried them but they're out there. There's also plenty of support stuff that may or may not be interesting, e.g. turning images into depth maps in case you don't have enough angles for actual photogrammetry. controlnet-aux for comfyui has a good selection of that analysis stuff.

[-] Stalinwolf@lemmy.ca 4 points 11 months ago

I've tried to install this multiple times but always manage to fuck it up somehow. I think the guides I'm following are outdated or pointing me to one or more incompatible files.

[-] barsoap@lemm.ee 5 points 11 months ago* (last edited 11 months ago)

Tough luck running any code published by people who put out models, it's research-grade software in every sense of the word. "Works on my machine" and "the source is the configuration file" kind of thing.

Get yourself comfyui, they're always very fast when it comes to supporting new stuff and the thing is generally faster and easier on VRAM than A1111. Prerequisite is a torch (the python package) enabled with CUDA (nvidia) or rocm (AMD) or whatever Intel uses. Fair warning: Getting rocm to run on not officially supported cards is an adventure in itself. I'm still on torch-1.13.1+rocm5.2; newer builds just won't work, as the GPU I'm telling rocm I have (so that it runs in the first place) supports instructions that my actual GPU doesn't, and they started using them.
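
For anyone stuck at the prerequisite step, a quick way to see which backend an installed torch build was compiled for might look like the sketch below. It assumes a recent PyTorch, where ROCm builds are reached through the same `torch.cuda` API and report a HIP version in `torch.version.hip`. The helper name `describe_torch_backend` is made up for illustration, and Intel backends are not covered.

```python
def describe_torch_backend() -> str:
    """Return a one-line description of the GPU backend torch can use."""
    try:
        import torch  # imported lazily so the script still runs without torch
    except ImportError:
        return "torch is not installed"

    # ROCm builds of PyTorch also answer through the torch.cuda namespace.
    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: no GPU backend available, CPU only"

    # torch.version.hip is a version string on ROCm builds and None on CUDA builds.
    hip_version = getattr(torch.version, "hip", None)
    if hip_version:
        return f"torch {torch.__version__}: ROCm/HIP {hip_version}"
    return f"torch {torch.__version__}: CUDA {torch.version.cuda}"


if __name__ == "__main__":
    print(describe_torch_backend())
```

If this reports CPU only on a machine with a supported GPU, the installed wheel is probably the wrong build for that card, which is worth ruling out before debugging comfyui itself.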

[-] L_Acacia@lemmy.one 1 points 11 months ago

Do you use comfyui?

[-] You999@sh.itjust.works 3 points 11 months ago

This is great news for people who make animations with deforum as the speed increase should make Rakile's deforumation GUI much more usable for live composition and framing.

[-] autotldr@lemmings.world 3 points 11 months ago

This is the best summary I could come up with:

Stability detailed the model's inner workings in a research paper released Tuesday that focuses on the ADD technique.

One of the claimed advantages of SDXL Turbo is its similarity to Generative Adversarial Networks (GANs), especially in producing single-step image outputs.

Stability AI says that on an Nvidia A100 (a powerful AI-tuned GPU), the model can generate a 512×512 image in 207 ms, including encoding, a single de-noising step, and decoding.

This move has already been met with some criticism in the Stable Diffusion community, but Stability AI has expressed openness to commercial applications and invites interested parties to get in touch for more information.

Meanwhile, Stability AI itself has faced internal management issues, with an investor recently urging CEO Emad Mostaque to resign.

Stability AI offers a beta demonstration of SDXL Turbo's capabilities on its image-editing platform, Clipdrop.

The original article contains 553 words, the summary contains 138 words. Saved 75%. I'm a bot and I'm open source!

[-] Gabu@lemmy.world -1 points 11 months ago* (last edited 11 months ago)

Does it actually run any faster though? For instance, if I manually spun a model with the diffusers library and ran it locally on dml, would there be any difference?

Edit: Assuming we're normalizing the output to something reasonable, e.g. a recognizable picture of a dog.
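
One way to answer that on your own hardware is to time both models with the same harness. The sketch below is only a generic timer, not anything from Stability AI or the diffusers library: `mean_latency_ms`, `images_per_second`, and the `fake_generate` stand-in are hypothetical names, and you would swap `fake_generate` for a function that calls whatever pipeline you built. The throughput helper also puts the 207 ms figure quoted in the summary into images per second.

```python
import time
from statistics import mean
from typing import Callable


def mean_latency_ms(generate: Callable[[], object], runs: int = 5, warmup: int = 1) -> float:
    """Call `generate` repeatedly and return the mean wall-clock time in ms."""
    # Warm-up calls are not timed; first calls often include one-off setup cost.
    for _ in range(warmup):
        generate()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


def images_per_second(latency_ms: float) -> float:
    """Convert a per-image latency in milliseconds into a throughput figure."""
    return 1000.0 / latency_ms


def fake_generate() -> None:
    """Stand-in for a real image generation call (assumption: replace this)."""
    time.sleep(0.01)


if __name__ == "__main__":
    measured = mean_latency_ms(fake_generate)
    print(f"stand-in latency: {measured:.1f} ms per call")
    # The 207 ms per 512x512 image quoted above works out to just under 5 images/s.
    print(f"207 ms per image = {images_per_second(207):.2f} images/s")
```

Running the same prompt, resolution, and hardware through both models keeps the comparison fair; the step count is the thing that differs.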

this post was submitted on 30 Nov 2023

265 points (91.0% liked)
