I can't take anyone seriously that says it's "trained on stolen images."
Stolen, you say? Well, I guess we're going to have to force those AI companies to put those images back! Otherwise, nobody will be able to see them!
...because that's what "stolen" means. And no, I'm not being pendantic. It's a really fucking important distinction.
The correct term is, "copied" but that doesn't sound quite as severe. Also, if we want to get really specific, the images are presently on the Internet. Right now. Because that's what ImageNET (and similar) is: A database of URLs that point to images that people are offering up for free to anyone that wants on the Internet.
Did you ever upload an image anywhere publicly, for anyone to see? Chances are someone could've annotated it and included it in some AI training database. If it's on the Internet, it will be copied and used without your consent or knowledge. That's the lesson we learned back in the 90s and if you think that's not OK then go try to get hired by the MPAA/RIAA and you can try to bring the world back to the time where you had to pay $10 for a ringtone and pay again if you got a new phone (because—to the big media companies—copying is stealing!).
Now that's clear, let's talk about the ethics of training an AI on such data: There's none. It's an N/A situation! Why? Because until the AI models are actually used for any given purpose they're just data on a computer somewhere.
What about legally? Judges have already ruled in multiple countries that training AI in this way is considered fair use. There's no copyright violation going on... Because copyright only covers distribution of copyrighted works, not what you actually do with them (internally; like training an AI model).
So let's talk about the real problems with AI generators so people can take you seriously:
- Humans using AI models to generate fake nudes of people without their consent.
- Humans using AI models to copy works that are still under copyright.
- Humans using AI models to generate shit-quality stuff for the most minimal effort possible, saying it's good enough, then not hiring an artist to do the same thing.
The first one seems impossible to solve (to me). If someone generates a fake nude and never distributes it... Do we really care? It's like a tree falling in the forest with no one around. If they (or someone else) distribute it though, that's a form of abuse. The act of generating the image was a decision made by a human—not AI. The AI model is just doing what it was told to do.
The second is—again—something a human has to willingly do. If you try hard enough, you can make an AI image model get pretty close to a copyrighted image... But it's not something that is likely to occur by accident. Meaning, the human writing the prompt is the one actively seeking to violate someone's copyright. Then again, it's not really a copyright violation unless they distribute the image.
The third one seems likely to solve itself over time as more and more idiots are exposed for making very poor decisions to just "throw it at the AI" then publish that thing without checking/fixing it. Like Coca Cola's idiotic mistake last Christmas.