252

OpenAI says it’s “impossible” to create useful AI models without copyrighted material (arstechnica.com)

submitted 9 months ago by sculd@beehaw.org to c/technology@beehaw.org

243 comments fedilink hide all child comments

Apparently, stealing other people's work to create product for money is now "fair use" as according to OpenAI because they are "innovating" (stealing). Yeah. Move fast and break things, huh?

"Because copyright today covers virtually every sort of human expression—including blogposts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials," wrote OpenAI in the House of Lords submission.

OpenAI claimed that the authors in that lawsuit "misconceive[d] the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence."

you are viewing a single comment's thread
view the rest of the comments

[-] Powderhorn@beehaw.org 27 points 9 months ago

Any reasonable person can reach the conclusion that something is wrong here.

What I'm not seeing a lot of acknowledgement of is who really gets hurt by copyright infringement under the current U.S. scheme. (The quote is obviously directed toward the UK, but I'm reasonably certain a similar situation exists there.)

Hint: It's rarely the creators, who usually get paid once while their work continues to make money for others.

Let's say the New York Times wins its lawsuit. Do you really think the reporters who wrote the infringed-upon material will be getting royalty checks to be made whole?

This is not OpenAI vs creatives. OK, on a basic level it is, but expecting no one to scrape blogs and forum posts rather goes against the idea of the open internet in the first place. We've all learned by now that what goes on the internet stays there, with attribution totally optional unless you have a legal department. What's novel here is the scale of scraping, but I see some merit to the "transformational" fair-use defense given that the ingested content is not being reposted verbatim.

This is corporations vs corporations. Framing it as millions of people missing out on what they'd have otherwise rightfully gotten is disingenuous.

[-] pupbiru@aussie.zone 1 points 9 months ago

it’s so baffling to me that some people think this is a clear cut problem of “you stole the work just the same as if you sold a copy without paying me!”

it ain’t the same folks… that’s not how models work… the outcome is unfortunate, for sure, but to just straight out argue that it’s the same is ludicrous… it’s a new problem and ML isn’t going away, so we’re going to have to deal with it as a new problem

load more comments (6 replies)

this post was submitted on 11 Jan 2024

252 points (100.0% liked)

Technology

37573 readers

204 users here now

A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.

Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.

Subcommunities on Beehaw:

This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.

founded 2 years ago

MODERATORS

alyaza@beehaw.org

TheRtRevKaiser@beehaw.org

gyrfalcon@beehaw.org

rs5th@beehaw.org

coldredlight@beehaw.org

Los@beehaw.org

SemioticStandard@beehaw.org

TheRtRevKaiser@kbin.social

remington@beehaw.org