Two that I noticed are:
For drawings in the Ghibli style, you can see noise in areas that should all be the same colour. That's because of how the diffusion model works: it's very hard for it to reproduce a total lack of variation in colour. In fact, that noise will always exist; it's just more noticeable in simple styles.
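If you want to check this yourself, here's a minimal sketch (the file name and crop box are placeholders) that measures the pixel spread in a region that should be one solid colour:

```python
import numpy as np
from PIL import Image

# Crop a region that should be a single flat colour (e.g. a flat sky in a
# Ghibli-style output) and check its per-channel standard deviation.
# "ghibli_style_output.png" and the crop coordinates are placeholders.
img = np.asarray(Image.open("ghibli_style_output.png").convert("RGB"), float)
patch = img[100:160, 200:260]  # a region that should be uniform

print("per-channel std:", patch.std(axis=(0, 1)))
# A hand-drawn flat fill is ~0; diffusion outputs typically show a small
# but nonzero spread here.
```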
For music, specifically with Suno, it tends to use similar-sounding instruments across different tracks of the same specified genre, and those sounds might change during the track and never come back to their original timbre. Because it generates the track section by section from start to end, the transformer model feeds the last sections back in as input to generate the new ones, amplifying any biases in the model.
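Here's a toy illustration of that feedback loop, not Suno's actual pipeline: each new "section" is conditioned on the previous output plus sampling noise, so small deviations accumulate like a random walk and the sound never snaps back to how it started.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_section(context: np.ndarray) -> float:
    # Hypothetical stand-in for a transformer sampling step: the new
    # section's "instrument timbre" is conditioned on the tail of what
    # was already generated, plus sampling noise.
    return context[-1] + rng.normal(0.0, 0.05)

timbre = [0.0]  # the original instrument sound at the start of the track
for _ in range(40):  # 40 sections, generated strictly left to right
    timbre.append(generate_section(np.array(timbre)))

# Each step feeds the previous output back in, so small deviations
# accumulate: the timbre drifts and never returns to its start value.
print(f"start: {timbre[0]:.3f}, end: {timbre[-1]:.3f}")
```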
I wonder if the noise would still be apparent if the model were trained only on Ghibli-style anime drawings.
Yes, I don't think it's a matter of training.
The diffusion model generates pictures by starting from a canvas of random pixels, then repeatedly editing those pixel colours, carving the picture out of that chaos.
To achieve an area that's all one colour, it would need to output very exact values on the last generation step.
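A toy simulation of that argument (not a real diffusion model; the "denoiser" here is an oracle with a small prediction error, standing in for an imperfect network):

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.full((8, 8), 0.5)  # the flat colour the model is aiming for
x = rng.normal(size=(8, 8))    # start from pure noise, as diffusion does

steps = 50
for t in range(steps):
    alpha = (t + 1) / steps                              # schedule position
    pred = target + rng.normal(0.0, 0.01, size=x.shape)  # imperfect denoiser
    x = (1 - alpha) * x + alpha * pred                   # blend toward it

# The patch looks flat from a distance, but the pixel values never become
# exactly equal, which is the faint noise visible in simple-style images.
print(f"per-pixel std in the 'flat' area: {x.std():.5f}")
```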
It can be fixed easily with a very subtle low-pass filter, but that would be human intervention. The model itself will always have a hard time replicating it.
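For example, a small-sigma Gaussian blur as the low-pass (the noisy patch here is synthetic, just for illustration):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Simulate a "should be flat" area with the faint grain diffusion leaves.
rng = np.random.default_rng(0)
flat_but_noisy = 0.5 + rng.normal(0.0, 0.01, size=(64, 64))

# A subtle low-pass: small-sigma Gaussian blur smooths the residual noise
# while barely touching real edges elsewhere in an image.
smoothed = gaussian_filter(flat_but_noisy, sigma=1.0)

print(f"std before: {flat_but_noisy.std():.5f}")  # visible grain
print(f"std after:  {smoothed.std():.5f}")        # much closer to flat
```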