Large Language Models generate human-like text. They operate on words broken into tokens and predict the next one in a sequence. Image diffusion models start from an image of pure noise (static) and iteratively denoise it into a coherent image.
The confusion comes from services like OpenAI that take your prompt, dress it up all fancy, and then feed it to a diffusion model.
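To make the difference concrete, here's a toy sketch in plain numpy. Nothing in it comes from a real model; the token probabilities and the "clean image" are made up. It only shows the shape of the two loops: next-token prediction versus iterative denoising.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- LLM-style generation: predict the next token, one at a time ---
vocab = ["the", "cat", "sat", "on", "mat", "."]
# Stand-in "model": a fixed table of next-token probabilities.
# (A real LLM learns these from text; these rows are just random.)
next_token_probs = rng.dirichlet(np.ones(len(vocab)), size=len(vocab))

tokens = [0]  # start from the token "the"
for _ in range(5):
    prev = tokens[-1]
    tokens.append(rng.choice(len(vocab), p=next_token_probs[prev]))
print("LLM-style output:", " ".join(vocab[t] for t in tokens))

# --- Diffusion-style generation: start from noise, denoise step by step ---
clean = np.zeros((8, 8))          # stand-in for the "real image"
image = rng.normal(size=(8, 8))   # start from pure noise (static)
for step in range(50):
    # A real diffusion model *predicts* the noise to subtract at each
    # step; here we just nudge toward `clean` to show the loop's shape.
    image += 0.1 * (clean - image)
print("Diffusion-style output, mean distance from clean image:",
      round(float(np.abs(image - clean).mean()), 4))
```

One loop appends discrete tokens to a sequence; the other repeatedly refines a whole array of pixels. That structural difference is the whole point.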
No. LLMs are still what generates images.
You can't use LLMs to generate images.
That is a completely different beast with its own training set.
Just because both are built with machine learning doesn't mean they are the same.