Revealed: The Authors Whose Pirated Books Are Powering Generative AI
(www.theatlantic.com)
A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.
Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.
Seems like a clearly transformative work that would be covered under fair use. As an aside, I've been using AI as a writing assistant/solitary roleplaying GM for several years now; the quality of the prose can be quite good, but the authorship of stories is terrible, and I can't say these models even do a good job of emulating a particular author's style.
"Clearly transformative" only applies to the work a human has put into the process. It isn't at all clear that an LLM would pass muster under a fair use defense, but there are court cases in progress that may answer that question. Ultimately, I think it's going to come down to whether the training process itself, and the human effort involved in training the model on copyrighted data, is considered transformative enough to be fair use, or doesn't constitute copying at all. As far as I know, none of the big cases are trying the "not a copy" defense, so we'll have to see how this all plays out.
In any event, copyright laws are horrifically behind the times and it's going to take new legislation sooner or later.
My bet: it's going to be decided on a case-by-case basis.
A large enough neural network can be used to store, and then recover, a 1:1 copy of a work... but a large enough corpus can contain more data than could ever be stored in a neural network of a given size, even if some fragments of the input works remain recoverable... so it will depend on how big a recoverable fragment has to be to count as copyright infringement... but then again, reproducing even a whole work is considered fair use for some purposes... though not in every country.
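The storage claim can be made concrete with a toy sketch: a single linear layer whose weight matrix stores a text verbatim, so a one-hot "position" input recovers the exact bytes. Everything here (the sample text, sizes, function names) is invented for illustration, and memorization in real trained models is far messier than this, but it shows that network weights can in principle hold a 1:1 copy of a work.

```python
import numpy as np

# Toy illustration (all names and sizes are made up): a linear layer
# whose weights hold a text verbatim, recoverable character by character.
text = "To be, or not to be, that is the question."
n = len(text)

# Weight matrix: row i is a one-hot encoding of the byte value of text[i].
W = np.zeros((n, 256))
for i, ch in enumerate(text):
    W[i, ord(ch)] = 1.0

def recover(position: int) -> str:
    """Forward pass: one-hot position vector -> byte scores -> character."""
    x = np.zeros(n)
    x[position] = 1.0
    scores = x @ W  # shape (256,): nonzero only at the stored byte value
    return chr(int(np.argmax(scores)))

# Reading out every position reconstructs the work exactly.
recovered = "".join(recover(i) for i in range(n))
assert recovered == text
```

Real generative models don't store works this directly, of course; the legal question is precisely how much of this lookup-table behavior survives inside a model trained on far more data than its weights can losslessly hold.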
Copyright laws are not necessarily wrong; just remove the "until author's death plus 70 years" coverage, go back to a more reasonable "4 years since publication", and they make much more sense.
Almost certainly. Getty Images has several exhibits in its suit against Stable Diffusion showing the Getty watermark popping up in its output, as well as several images that are substantially the same as their sources. Other generative models don't produce anything all that similar to the source material, so we're probably going to wind up with lots of completely different and likely contradictory rulings on the matter before this gets anywhere near being sorted out legally.
The trouble with that line of thinking is that the laws are under no obligation to make sense. And the people who write and litigate those laws benefit from making them as complicated and irrational as they can get away with.
In this case the Mickey Mouse curve makes sense, just bad sense. At least the EU didn't make it 95 years, and compromised on 70 as well... 🙄