Authors Are Furious After Finding Their Works on List of Books Used To Train AI (www.themarysue.com)

submitted 2 years ago by stopthatgirl7@kbin.social to c/technology@lemmy.world

146 comments

Authors using a new tool to search a list of 183,000 books used to train AI are furious to find their works on the list.

[-] RalphWolf@lemmy.ca 3 points 2 years ago

Does this fall under the fair-use part of copyright law?

[-] FaceDeer@kbin.social 8 points 2 years ago

It hasn't been tested in court yet but I don't see why it shouldn't.

[-] admin@lemmy.my-box.dev 3 points 2 years ago

Fair use is any copying of copyrighted material done for a limited and "transformative" purpose, such as to comment upon, criticize, or parody a copyrighted work.

I don't see why it should.

[-] FaceDeer@kbin.social 7 points 2 years ago

The creation of the AI model is transformative. The AI's model does not contain a literal copy of the copyrighted work.

[-] admin@lemmy.my-box.dev 1 points 2 years ago

No, but the training data does contain a copy. And making a model is not criticising, commenting upon, or creating a parody of it.

[-] FaceDeer@kbin.social 5 points 2 years ago

That list is not exhaustive; it's just a list of examples of fair use.

The training data is not distributed with the AI model.

[-] admin@lemmy.my-box.dev 4 points 2 years ago* (last edited 2 years ago)

"it's just a list of examples of fair use."

Yes, it's a list of quite similar ways of commenting upon a work. Please explain how training an LLM is like any of those things, and thus how fair use would apply.

[-] FaceDeer@kbin.social 1 points 2 years ago

I'm not saying that training an LLM is like any of those things. I'm saying it doesn't have to be like those things in order for it to still be fair use.

[-] FontMasterFlex@lemmy.world 3 points 2 years ago

Pay for every bit of information you've read and regurgitated on exams.

[-] BURN@lemmy.world 0 points 2 years ago

AI is not human and should not be treated like a human.

[-] FontMasterFlex@lemmy.world 2 points 2 years ago

It's not. The humans that trained it (presumably) purchased the material used to train it. What's the problem?

[-] BURN@lemmy.world 2 points 2 years ago

The problem is the use of the material to create a commercial product, as well as the reality that the humans training it never buy the data on an individual level.

[-] kromem@lemmy.world 4 points 2 years ago

The training argument is probably going to come up dry by the time the court works its way through expert testimony, as the underlying argument for training as infringement is insane.

But where OpenAI is probably in hot water is that torrenting 100k books in the first place runs afoul of existing copyright legislation.

Everyone is debating the training in these suits, but the real meat and potatoes is going to be the initial infringement of obtaining the books, not how they were subsequently used.

[-] lloram239@feddit.de 4 points 2 years ago* (last edited 2 years ago)

Authors Guild, Inc. v. Google, Inc. decided that it is fair use to scan books and make large parts of them available verbatim on the net. What AI does is far more transformative than that, as very little of a book can be reproduced verbatim with AI (e.g. popular quotes); you really just get "knowledge" from the books. Unlike with Google, however, the sources are lost in the process, which in itself makes it difficult to argue for copyright violation, since you can't point at what was actually copied.

this post was submitted on 29 Sep 2023

438 points (93.6% liked)
