OpenAI says it’s “impossible” to create useful AI models without copyrighted material (arstechnica.com)

submitted 2 years ago by sculd@beehaw.org to c/technology@beehaw.org

243 comments

Apparently, stealing other people's work to create a product for money is now "fair use", according to OpenAI, because they are "innovating" (stealing). Yeah. Move fast and break things, huh?

"Because copyright today covers virtually every sort of human expression—including blogposts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials," wrote OpenAI in the House of Lords submission.

OpenAI claimed that the authors in that lawsuit "misconceive[d] the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence."

[-] teawrecks@sopuli.xyz 2 points 2 years ago

Sure, if they want to compete with modern artists, they would need to look at modern artists

Which is the literal goal of Dall-E, SD, etc.

But a human could learn to draw, paint, sculpt, etc purely by only looking at public domain and creative commons works

They could definitely learn some amount of skill, I agree, and I'd be very interested to see the best that an AI could achieve using only PD and CC content. But you'd agree that it would look very different from modern art, just as an alien who has only been consuming earth media from 100+ years ago would be unable to relate to us.

the sky above them and the tree across the street aren't copyrighted.

Yeah, I'd consider that PD/CC content that such an AI would easily have access to. But obviously the real sky is something entirely different from what is depicted in Starry Night, Star Wars, or H.P. Lovecraft's description of the cosmos.

OpenAI's argument is literally that their AI cannot learn without using copyrighted materials in vast quantities

Yeah, I'd consider that a strong claim on their part; what they really mean is, it's the easiest way to make progress in AI, and we wouldn't be anywhere close to where we are without it.

And you could argue "convenient that it both saves them money, and generates money for them to do it this way", but I'd also point out that the alternative is they keep the trained models closed source, never using them publicly until they advance the tech far enough that they've literally figured out how to build/simulate a human brain that is able to learn as quickly and human-like as you're describing. And then we find ourselves in a world where one or two corporations have this incredible proprietary ability that no one else has.

Personally, I'd rather live in the world where the information about how to do all of this isn't kept for one or two corporations to profit from. I would rather live in the version where they publish their work publicly, early, and often, show that it works, and people are able to reproduce it, open source it, train their own models, and advance the technology in a space where anyone can use it.

You could hypothesize a middle ground where they do the research, but aren't allowed to profit from it without licensing every bit of data they train on. But the reality of AI research is that it only happens to the extent that it generates revenue. It's been that way for the entire history of AI. Douglas Hofstadter has been asking deep, important questions about AI as it relates to consciousness for like 60 years (e.g. GEB, I Am a Strange Loop), but there's a reason he didn't discover LLMs and tech companies did. That's not to say his writings are meaningless, in fact I think they're more important than ever before, but he just wasn't ever going to get to this point with a small team of grad students, a research grant, and some public domain datasets.

So, it's hard to disagree with OpenAI there; AI definitely wouldn't be where it is without them doing what they've done. And I'm a firm believer that unless we figure our shit out with energy generation soon, the earth will be an uninhabitable wasteland. We're playing a game of climb the Kardashev scale, we opted for the "burn all the fossil fuels as fast as possible" strategy, and now we're at the point where either we spend enough energy fast enough to figure out the tech needed to survive this, or we suffocate on the fumes. The clock is ticking, and AI may be our best bet at saving the human race that doesn't involve an inordinate number of people dying.

[-] frog@beehaw.org 4 points 2 years ago

OpenAI are not going to make the source code for their model accessible to all to learn from. This is 100% about profiting from it themselves. And using copyrighted data to create open source models would seem to violate the very principles the open source community stands for - namely that everybody contributes what they agree to, and everything is published under a licence. If the basis of an open source model is a vast quantity of training data from a vast quantity of extremely pissed off artists, at least some of the people working on that model are going to have an "are we the baddies?" moment.

The AI models are also never going to produce a solution to climate change that humans will accept. We already know what the solution is, but nobody wants to hear it, and expecting anyone to listen to ChatGPT and suddenly change their minds about using fossil fuels is ludicrous. And an AI that is trained specifically on knowledge about the climate and technologies that can improve it, with the purpose of innovating some hypothetical technology that will fix everything without humans changing any of their behaviour, categorically does not need the entire contents of ArtStation in its training data. AIs that are trained to do specific tasks, like the ones trained to identify new antibiotics, are trained on a very limited set of data, most of which is not protected by copyright and any that is can be easily licenced because the quantity is so small - and you don't see anybody complaining about those models!

[-] teawrecks@sopuli.xyz 2 points 2 years ago

OpenAI are not going to make the source code for their model accessible to all to learn from

OpenAI isn't the only company doing this, nor is their specific model the knowledge that I'm referring to.

The AI models are also never going to produce a solution to climate change that humans will accept.

It is already being used to further fusion research beyond anything we've been able to do with standard algorithms.

We already know what the solution is, but nobody wants to hear it

Then it's not a solution. That's like telling your therapist, "I know how to fix my relationship, my partner just won't do it!"

expecting anyone to listen to ChatGPT and suddenly change their minds about using fossil fuels is ludicrous

Lol. Yeah, I agree, that's never going to work.

categorically does not need the entire contents of ArtStation in its training data.

That's a strong claim to make. Regardless of the ethics involved, or the problems the AI can solve today, the fact is we are seeing rapid advances in AI research as a direct result of these ethically dubious models.

In general, I'm all for the capitalist method of artists being paid their fair share for the work they do, but on the flip side, I see a very possible mass extinction event on the horizon, which could cause suffering the likes of which humanity has never seen. If we assume that is the case, and we assume AI has a chance of preventing it, then I would prioritize that over people's profits today. And I think it's perfectly reasonable to say I'm wrong.

And then there's the problem of actually enforcing any sort of regulation, which would be so much more difficult than people here are willing to admit. There's basically nothing you can do even if you wanted to. Your Carlin example is exactly the defense a company would use: "I guess our AI just happened to create a movie that sounds just like Paul Blart, but we swear it's never seen the film. Great minds think alike, I guess, and we sell only the greatest of minds".

[-] frog@beehaw.org 1 points 2 years ago

Personally I think the claim that the entire contents of ArtStation will lead to working technology that fixes climate change is the bolder claim - and if there was any merit to it, there would be some evidence for it that the corporations who want copyright to be disapplied to artists would be able to produce. And if we're saying that getting rid of copyright protections will save the planet, then perhaps Disney should give up theirs as well. Because that's the reality here: we're expecting humans to be obliterated by AI but are not expecting the rich and powerful to make any sacrifices at all. And art is part of who we are as a species, and has been for hundreds of thousands of years. Replacing artists with AI because somehow that will fix climate change is not only a massive stretch, but what would we even be saving humanity for at that point? So that everybody can slave away in insecure, meaningless work so the few can hoard everything for themselves? Because the Star Trek utopia where AI does all the work and humans can pursue self-enrichment is not an option on the table. The tech bros just want you to think it is.

this post was submitted on 11 Jan 2024

252 points (100.0% liked)
