Revealed: The Authors Whose Pirated Books Are Powering Generative AI
(www.theatlantic.com)
Seems like a clearly transformative work that would be covered under fair use. As an aside, I've been using AI as a writing assistant/solitary roleplaying GM for several years now, and while the quality of the prose can be quite good, the authorship of stories is terrible, and I can't say they even do a good job of emulating a particular author's style.
People keep repeating this at me, but the thing is, I've seen what these things produce, and since the humans who created them can't even seem to articulate what is going on inside the black box to produce output, it's hard for me to say "oh yeah, that human who can't describe what is even going on to produce this totally transformed the work." No, they used a tool to rip it up and shart it out, and they don't even seem to functionally know what goes on inside the tool. If you can't actually describe the process of how it happens, the human is not the one doing anything transformative; the program is, and the program isn't a human acting alone, it is a program made by humans with intent to make money off of what the program can do. The program doesn't understand what it is transforming, it's just shitting out results. How is that "transformative"?
I mean, it's like fucking Superman 3 over here. "I didn't steal a ton from everyone, just fractions of pennies from every transaction! No one would notice, it's such a small amount." When the entire document produced is made from slivers of hundreds of thousands of copyrighted works, it doesn't strike me that any of it is original, nor justified in being called "fair use."
I can explain it quite well in layman's terms, but a rigorous scientific/mathematical explanation is indeed beyond our current understanding.
Not a single sentence of the original work is retained in the model. It's essentially a massive matrix (a math problem) that takes the input as a seed value to produce a weighted list of likely next tokens, rolls a random number to pick one, and then does it again over and over. The more text that goes into the model, the less likely it is that any given work would be infringed. Probably every previous case of fair use is less transformative, which would have implications far beyond AI.
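To make the loop I'm describing concrete, here's a minimal Python sketch of weighted next-token sampling. The `next_token_weights` function and its tiny vocabulary are made up for illustration; a real LLM derives those weights from billions of learned parameters, not hard-coded rules.

```python
import random

def next_token_weights(context: list[str]) -> dict[str, float]:
    # Hypothetical stand-in for the model: returns relative likelihoods
    # for the next token given the context so far.
    if context and context[-1] == "the":
        return {"cat": 0.5, "dog": 0.3, "<end>": 0.2}
    return {"the": 0.7, "a": 0.2, "<end>": 0.1}

def generate(prompt: list[str], max_tokens: int = 10) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_tokens):
        weights = next_token_weights(tokens)
        # Roll a weighted random number to pick one token, then repeat.
        choice = random.choices(list(weights), weights=list(weights.values()))[0]
        if choice == "<end>":
            break
        tokens.append(choice)
    return tokens

print(" ".join(generate(["the"])))
```

The point is that the model only stores weights for picking the next token, not the source text itself.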
Which is why I find it interesting that none of the court cases (as far as I'm aware) are challenging whether an LLM is copying anything in the first place. Granted, that's the plaintiff's job to prove, but there's no need to raise a fair use defense at all if no copying occurred.