Two authors are suing OpenAI for training ChatGPT with their books. Could they win? (theconversation.com)

submitted 2 years ago by sabbah@lemmy.world to c/world@lemmy.world


[-] brad@toad.work 11 points 2 years ago

Yeah this is a weird one. I don't really know how the line gets drawn between training an AI and plagiarism. My gut feeling is that this feels like suing somebody for being inspired by your work or learning a new word from it.

[-] Flibbertigibbet@lemmy.world 13 points 2 years ago

Yeah, I'm not sure how I feel about it... But I somehow instinctively feel that a human being "inspired" by other works is different to a neural network being trained on a novel. I don't know that I can articulate specifically why one feels okay and the other doesn't... But that's how it feels to me.

[-] eldrichhydralisk@lemmy.sdf.org 10 points 2 years ago

Part of the problem is that AI research likes to use terminology that sounds like what people do, when that's not what the AI actually does.

Large language models are not intelligent in any sense. They are autocomplete on steroids. This is a computer program that was fed a book someone wrote, then mathematically tweaked to be able to guess the next word in a sentence in a way that resembles that book. That's all it does. It does not think or learn in any sense we'd apply to a human.
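To make the "guess the next word" idea concrete, here is a toy sketch in Python. It is not how a real LLM is built: real models are neural networks trained on tokens with gradient descent, while this example just counts which word follows which (a bigram table). The function names and the sample sentence are made up for illustration.

```python
import random
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        following[current_word][next_word] += 1
    return following


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it has none."""
    candidates = model.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


def generate(model, start, length=8, seed=0):
    """Extend `start` by sampling followers in proportion to their counts."""
    rng = random.Random(seed)
    output = [start.lower()]
    for _ in range(length):
        candidates = model.get(output[-1])
        if not candidates:
            break  # the last word never had a follower in the training text
        words, counts = zip(*candidates.items())
        output.append(rng.choices(words, weights=counts, k=1)[0])
    return " ".join(output)


# Hypothetical training text standing in for "a book someone wrote".
corpus = "the cat sat on the mat and the cat slept on the sofa"
model = train_bigram_model(corpus)

print(predict_next(model, "the"))  # "cat" follows "the" most often here
print(generate(model, "the"))      # a short chain of sampled next words
```

Everything this toy can produce is a recombination of word pairs it saw in the training text. Whether that limitation carries over to models with billions of learned parameters is the question being argued in this thread.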

To me, LLMs sound like a massive plagiarism engine, and I think they should need to get a license from the authors whose works they used to make the LLM under whatever terms that author wants to give, just like a publisher needs to get permission to print a copy of the work. But copyright law has no easy "bright line" for what counts and what doesn't. So the courts will have to decide whether what the AI "creates" is similar enough to the original works to count as a violation, or if the AI and its results are transformative enough to count as something new.

[-] velvetinetouch@lemmy.world 4 points 2 years ago

I am sick of this trope of trying to argue that system X is or isn't intelligent because it was built to do something that can be done non-intelligently. LLMs are autocomplete; that's just literally what they do. The autocomplete on your phone isn't very intelligent, if at all. Humans are DNA replicators, but so are bacteria, which aren't very intelligent, if at all. You can't argue from the type and/or character of the task whether something that was built to do that task is intelligent or not. LLMs at least appear to be intelligent because they do just about everything the AI skeptics were demanding machines must do in order to prove intelligence just 5 years ago. If you want to argue they're not intelligent, you need to do much more work than just calling them names like fuzzy JPEG, stochastic parrot, and autocomplete on steroids.

[-] eldrichhydralisk@lemmy.sdf.org 1 points 2 years ago

I use the term "autocomplete on steroids" because it gets across a vaguely accurate idea of what an LLM is and how it works to people who are thinking of it like sci-fi movie AI. Sorry if it came across as though that was my whole reason for considering them not intelligent.

LLMs do seem to pass a lot of the intelligence tests we've come up with. Talking with one for the first time is a really uncanny experience; it's a totally different thing from the old voice assistants. But they also consistently fail at tasks that would indicate an understanding of a topic. They produce good-looking equations, but the math underneath doesn't make sense. They hallucinate facts that don't fit with the rest of what they themselves are saying, but look similar to the way right answers are written and defended. They produce really convincing responses, but when they fail they betray some really basic failures to understand what they're saying.

I feel that LLMs are brute-forcing the tests people designed to measure intelligence. They can pass the bar exam, but they also contain thousands of successful bar exams to consult and millions of bits of text to glue those answers together with. But if you ask the LLM to actually do the job of a lawyer, it starts producing all kinds of garbage that sounds good but doesn't stand up to scrutiny when someone looks up the hallucinated case references.

[-] brad@toad.work 2 points 2 years ago

I agree with you, but since I can't come up with a reasonable explanation for it, my brain wants to err on the side of them being largely the same for whatever reason.

[-] kromem@lemmy.world 1 points 2 years ago

In part it feels that way because you, along with pretty much every other human being online today, have been propagandized for decades with sci-fi inspired by dystopian futurist predictions about AI, which are almost universally obsolete and misinformed by now but still persist due to anchoring bias.

AI trained to predict collective human thought ends up replicating quite a lot more than most people thought would be possible in our lifetimes.

And yet when it exhibits emotional intelligence it's called creepy, when it exhibits above average reasoning capabilities it's called scary, and when it displays a potential for automating large swaths of busywork for most humans it's called a threat.

Next to no one I see discussing the topic is considering the opportunity costs here, as the media influence on perceiving AI as 'other' is so pervasive that most humans fall into treating it like a monkey from another forest competing for bananas rather than treating it like a much better stick.

[-] kromem@lemmy.world 4 points 2 years ago

There are already laws regarding producing works too similar to copyrighted material.

Production is infringement, not training.

If I feed all of Stephen King into an LLM such that it learns what well-written horror narratives look like, and it produces a story with original and different plot elements distinct from copyrighted works, that's fine.

If it starts writing about killer clowns thwarted by child orgies in the sewers then you might have an infringement problem.

And ironically, the best tool for protecting copyrighted material from infringement is going to be...LLMs (acting in a discriminator role comparing indexed copy to protected works).
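As a rough illustration of what "comparing output to protected works" could mean at its simplest, here is a sketch that scores two texts by shared word sequences. This is plain n-gram overlap (Jaccard similarity), not the LLM-based discriminator described above, and it only catches near-verbatim copying, not borrowed plots or characters. The function names, sample sentences, and any threshold you might pick are assumptions for illustration, not a legal test of infringement.

```python
def word_ngrams(text, n=3):
    """Return the set of n-word sequences that appear in `text`."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_score(candidate, protected, n=3):
    """Jaccard similarity of the two texts' word n-gram sets (0.0 to 1.0)."""
    a = word_ngrams(candidate, n)
    b = word_ngrams(protected, n)
    if not a or not b:
        return 0.0  # one of the texts is shorter than n words
    return len(a & b) / len(a | b)


# Made-up sample texts; a real check would index the full protected works.
protected = "it was a dark and stormy night in the old town"
near_copy = "it was a dark and stormy night in the new town"
unrelated = "sunlight filled the quiet kitchen as breakfast cooked"

print(overlap_score(near_copy, protected))  # high: most 3-word runs match
print(overlap_score(unrelated, protected))  # 0.0: no shared 3-word runs
```

A score like this can flag lifted passages, but deciding whether a story is "too similar" in the copyright sense would still need something that understands plot and character, which is the role suggested for LLMs here.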

If 'training' ends up successfully labeled as infringement we're going to end up with much worse long term outcomes in jurisdictions that honor that ruling than we otherwise would.

This is the long-tail masses adopting MPAA math to tally potential losses, and in their efforts to protect the status quo they are shooting themselves in the foot when it comes to laying claim to the future of the industry, which will inevitably leave them out of the next round of growth.

Also, from an 'infringement' standpoint it just means we'll see fewer open models and more closed ones, which will end up using models from other jurisdictions to launder copyrighted materials into synthetic training data.

This is beyond dumb.

this post was submitted on 07 Jul 2023

117 points (98.3% liked)
