Recent AI failures are cracks in the magic
(www.theintrinsicperspective.com)
I don't know much about LLMs, but latent diffusion models already have "meaning" encoded into the model. The whole concept of the U-Net is that as it reduces the spatial resolution of the image, it increases the semantic resolution by adding extra dimensions (channels) of information. It came from medical image analysis, where being able to label something as a tumor would be really useful.
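To make that trade-off concrete, here is a minimal sketch of the shape bookkeeping in a U-Net-style encoder. It is not any particular model's configuration: the function name `encoder_shapes`, the 64 base channels, and the rule of doubling channels at every halving of resolution are illustrative assumptions (real diffusion U-Nets pick their own channel widths).

```python
def encoder_shapes(height, width, base_channels=64, levels=4):
    """Return (channels, height, width) for each encoder level.

    Assumption for illustration: every level halves the spatial
    resolution and doubles the number of feature channels.
    """
    shapes = []
    for level in range(levels):
        scale = 2 ** level
        shapes.append((base_channels * scale, height // scale, width // scale))
    return shapes


for channels, h, w in encoder_shapes(256, 256):
    # Fewer pixels per level, but more features describing each one.
    print(f"{h}x{w} pixels, {channels} channels per pixel")
```

Each step down has a quarter as many spatial positions but twice as many numbers describing each position, which is the "less where, more what" idea described above.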
This is why you get body dysmorphic results on earlier (and even current) models. The model has identified something as a human limb, but isn't quite sure where the hand is, so it adds one onto what we know is a leg.
That's perhaps why image generators are comparatively better than text generators. But there's still something off: by your example, it seems that the model cannot reliably use clues like position to understand "this is a «leg»". And I don't know much about image generators, but I think that they're still statistics- and probability-based.
There was an interesting paper published just recently titled "Generative Models: What do they know? Do they know things? Let's find out!" (a lot of fun names and titles in the AI field these days :) ). It does a lot of work in actually analyzing what an AI image generator "knows" about what it's depicting. These models seem to have an awareness of three-dimensional space, of light and shadow and reflectivity, lots of things you wouldn't necessarily expect from something trained just on 2-D images tagged with a few short descriptive sentences. This article from a few months ago also delved into this. It showed that when you ask a generative AI to create a picture of a physical object, the first thing the AI does is come up with the three-dimensional shape of the scene before it starts figuring out what it looks like. Quite interesting stuff.