Revealed: The Authors Whose Pirated Books Are Powering Generative AI
(www.theatlantic.com)
Seems like a clearly transformative work that would be covered under fair use. As an aside, I've been using AI as a writing assistant/solitary roleplaying GM for several years now, and while the quality of the prose can be quite good, the authorship of stories is terrible, and I can't say they even do a good job of emulating a particular author's style.
People keep repeating this at me, but the thing is, I've seen what these things produce, and since the humans who created them can't even seem to articulate what is going on inside the black box to produce output, it's hard for me to say "oh yeah, that human who can't describe what is even going on to produce this totally transformed the work." No, they used a tool to rip it up and shart it out, and they don't even seem to functionally know what goes on inside the tool. If you can't actually describe the process of how it happens, the human is not the one doing anything transformative; the program is, and the program isn't a human acting alone, it is a program made by humans with intent to make money off of what the program can do. The program doesn't understand what it is transforming, it's just shitting out results. How is that "transformative"?
I mean, it's like fucking Superman 3 over here. "I didn't steal a ton from everyone, just fractions of pennies from every transaction! No one would notice, it's such a small amount." When the entire document produced is made from slivers of hundreds of thousands of copyrighted works, it doesn't strike me that any of it is original, nor justified in being called "fair use."
I can explain it quite well in layman's terms, but a rigorous scientific/mathematical explanation is indeed beyond our current understanding.
Not a single sentence of the original work is retained in the model. It's essentially a massive matrix (a math problem) that takes the input as a seed value to produce a weighted list of likely next tokens, rolls a random number to pick one, and then does it again over and over. The more text that goes into the model, the less likely it is that any given work would be infringed. Probably every previous case of fair use is less transformative, which would have implications far beyond AI.
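To make the loop I'm describing concrete, here's a minimal Python sketch of weighted next-token sampling. The `next_token_weights` function and its tiny vocabulary are made up for illustration; a real LLM derives those weights from billions of learned parameters, not hard-coded rules.

```python
import random

def next_token_weights(context: list[str]) -> dict[str, float]:
    # Hypothetical stand-in for the model: returns relative likelihoods
    # for the next token given the context so far.
    if context and context[-1] == "the":
        return {"cat": 0.5, "dog": 0.3, "<end>": 0.2}
    return {"the": 0.7, "a": 0.2, "<end>": 0.1}

def generate(prompt: list[str], max_tokens: int = 10) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_tokens):
        weights = next_token_weights(tokens)
        # Roll a weighted random number to pick one token, then repeat.
        choice = random.choices(list(weights), weights=list(weights.values()))[0]
        if choice == "<end>":
            break
        tokens.append(choice)
    return tokens

print(" ".join(generate(["the"])))
```

The point is that the model only stores weights for picking the next token, not the source text itself.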
Which is why I find it interesting that none of the court cases (as far as I'm aware) are challenging whether an LLM is copying anything in the first place. Granted, that's the plaintiff's job to prove, but there's no need to raise a fair use defense at all if no copying occurred.