The New York Times sues OpenAI and Microsoft for copyright infringement
(edition.cnn.com)
There is something wrong when search and AI companies extract all of the value produced by journalism for themselves. Sites like Reddit and Lemmy also have this issue. I’m not sure what the solution is. I don’t like the idea of a web full of paywalls, but I also don’t like the idea of all the profit going to the ones who didn’t create the product.
What's the value of old journalism?
It's a product where the value curve is heavily weighted towards recency.
In theory, the greatest value theft is when the AP writes a piece and two dozen other 'journalists' copy it, changing the text just enough not to get sued. That's completely legal, but it's what effectively killed investigative journalism.
An LLM training on years-old articles, predicting their text until it effectively learns the relationships between language itself and the events those articles describe, isn't some inherent value theft.
It's not the training that's the problem, it's the application of the models that needs policing.
Like if someone took an LLM, fed it recently published news stories in the prompts via RAG, and had it rewrite them just differently enough that no one needed to visit the original publisher.
Even if it's legal for humans to do that (which we might want to revisit, or at least address with an industry-specific restriction), maybe we should have different rules for the models.
But trying to claim that an LLM that lets coma patients communicate, or problem-solves self-driving algorithms, or diagnoses medical issues is stealing the value of old NYT articles in doing so is not an argument I see much value in.
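The "rewrite with RAG" scenario above can be sketched in a few lines. This is a toy illustration only: the keyword-overlap retriever is deliberately naive, and `build_rewrite_prompt` is a hypothetical stand-in for however a real pipeline would hand retrieved articles to a chat-completion API.

```python
def retrieve(query, articles, top_k=1):
    """Naive retrieval: rank recently published articles by word overlap
    with the query (real systems would use embeddings instead)."""
    query_words = set(query.lower().split())

    def score(article):
        return len(query_words & set(article.lower().split()))

    return sorted(articles, key=score, reverse=True)[:top_k]

def build_rewrite_prompt(query, retrieved):
    """Assemble the prompt that asks the model to rewrite the retrieved
    coverage 'just differently enough' -- the step at issue in the thread."""
    context = "\n\n".join(retrieved)
    return (
        "Rewrite the following coverage in your own words so the reader "
        "does not need to visit the original publisher:\n\n"
        f"{context}\n\nReader question: {query}"
    )

articles = [
    "City council approves new transit budget after a lengthy debate.",
    "Local team wins championship in overtime thriller.",
]
prompt = build_rewrite_prompt(
    "transit budget news", retrieve("transit budget news", articles)
)
```

Note that the model itself never needs the publisher's archive here: the freshly published text arrives in the prompt, which is why the comment argues the application, not the training, is where the value extraction happens.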
Except no one is claiming that LLMs are the problem; they're claiming GPT, or more specifically GPT's training data, is the problem. Transformer models still have a lot of potential, but the question the NYT is asking is "can you just take anyone else's work to train them?"
There's a similar suit against Meta for Llama.
And yes, as the dust settles we will see whether training an LLM is fair use in case law.
Really gave me a whole new perspective. Thanks for that.
The solution is imposing on these companies the responsibility of tracking their profit per media source, taxing them, and redistributing that money based on the tracking info. They're able to track all the pages you visit; it's complete bullshit when they say they don't know how much they make from each place their ads are displayed.
Should... should we tell him?
Tell them instead of mocking them.
Yes, "that's how the world works". But doesn't mean we should stop trying to change it.
AI isn't creating the product. It consumed it.
AI training is piracy by another name.
Elaborate. Consumption of copyrighted materials is normal use whether by a human or a machine.
Taking someone else’s work and using it without crediting or compensating them is theft. If OpenAI made a deal with The NY Times to train its product using the paper’s content, which it would turn around and sell to its own customer base, that would be ethical. What OpenAI and other companies like it are doing is stealing, ahead of actual law that defines what they’re doing as such.
So listening to Billie Jean without thanking Michael Jackson is theft? That is use.
How about Billie Jean's bassline, which is borrowed from Hall & Oates' I Can't Go For That? Was that theft? Michael felt guilty about it, but John felt it was routine for creatives to borrow from each other all the time.
How about money- and lobbyist-inspired extensions of copyright so extreme that both songs (heck, the whole oeuvres of both artists) have been kept out of the public domain? Is that theft too? Or does it only count when companies and rich estates are denied profits?
From your blanket assertion that copyright infringement is theft, and your inability or refusal to parse out fair use of copyrighted materials, I infer you don't actually understand what copyright is or what purpose it's meant to serve for the public. You're just regurgitating the maximalist rhetoric you've been spoonfed. It's really kinda sad.
Feel free to exercise more nuance. Or if you like you can double down and remove all doubt.
Using a tool to copy someone else’s work and then profiting off that work without compensating or even attributing the source is stealing.
Your argument poses an interesting thought. Do machines have a right to fair use?
Humans can consume for the sake of enjoyment. Humans can consume without a specific purpose of compiling and delivering that information. Humans can do all this without having a specific goal of monetary gain. Software created by a for-profit privately held company is inherently created to consume data with the explicit purpose of generating monetary value. If that is the specific intent and design then all contributors should be compensated.
Then again, we can look no further than Google (the search engine, not the company) for an example that's closely related to the current situation. Google can host excerpts of data from billions of websites and serve that data up upon request without compensating those site owners in any way. I would argue that Google is different, though, because it literally cites every single source. A search result isn't useful if we don't know what site the result came from.
And my final thought: are the works that AI generates truly transformative? I can see arguments that go either way.
Machines do not have rights or obligations. They cannot be held liable for damages or be sentenced for crimes. They cannot commit copyright infringement. Still, I don't think we'll see "the machine did it" as a defense in court.
Usually they are original and not transformative.
Transformative implies that there is some infringement going on. Say, you make a cartoon with the recent Mickey Mouse. But instead of making the same kind of cartoon as Disney would, you use MM to criticize the policies of the Disney corporation (like South Park did). That transforms the work.
Sometimes AI spits out verbatim copies of training data. That is not transformative; a couple of pages of Harry Potter just turn into a "technical malfunction."
I hope you'll answer a question in return:
Why? What's the ethical/moral justification for this?
I know how anarcho-capitalists, so-called libertarians, and other such ideologies see it, but perhaps you have a different take. These groups are also not necessarily on board with the whole intellectual property concept. So that's what I am curious about. Full disclosure: I am absolutely not on board with that kind of thinking and am unlikely to be convinced. But I am genuinely interested in learning more.
Just getting back around to this.
My main reasoning is simply that authors and artists should be fairly credited and compensated for their work. If I create something and share it on the internet, I don't necessarily want a company to make money on that thing, especially if they're making money to my exclusion.
So while I believe that IP as we know it today is probably not the best way to handle things, I still think creators should have some say over how their works are used and should receive some reasonable share when their works are used for profit. Without creators, those works wouldn't exist in the first place.
Are there other jobs where it would be okay to take a person's services without paying them? What would motivate people to continue providing those services?
Hmm. I think you are missing some important information here.
I'm sure you know how it goes for most people who create property: e.g., factory workers make some product, are paid for it, but do not own the product. The same is true for people who create intellectual property. They get paid for their work, but the employer owns the property. You only own what you make in your own time, unless or until you sell it.
You're talking about paying property owners for providing no services at all.
Not the original comment but I think the difference you're looking for is in the copying and distribution. The OC makes the false assumption that the data set is full copies of every object fed into it rather than sets of common characteristics.
For example, my own mind has a concept tree. Tree is not a copy of every tree I've ever known but more like lists of common characteristics that define treeness based on information I've gathered about treeness (my data set).
Piracy is piracy not because of how the material is consumed, but because of how it's distributed and stored: as full copies of the object. Datasets, in other words, are not copies, and thus copyright doesn't apply.
Reading an article to get an idea of what articleness is, is fair use. Reading an article to reproduce it verbatim is not. And as of now, I don't believe LLMs are doing the latter.
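The "common characteristics, not copies" point above can be made concrete with a toy example. This is an illustration of the principle, not of how LLM training actually works: the "model" below stores only aggregate bigram counts from its training text, so it captures statistical regularities of the language without retaining any document.

```python
from collections import Counter

def train(corpus):
    """Store only aggregate bigram counts -- shared characteristics of the
    corpus, analogous to the 'treeness' concept, not copies of any text."""
    counts = Counter()
    for doc in corpus:
        words = doc.lower().split()
        counts.update(zip(words, words[1:]))
    return counts

corpus = [
    "the old tree fell in the storm",
    "the tree in the park is old",
]
model = train(corpus)
# The counts reveal which word pairs are common across the corpus,
# but the original sentences cannot, in general, be read back out.
```

Of course, the Harry Potter example elsewhere in the thread shows the caveat: when a model is large enough and a passage appears often enough in training, memorization can happen, which is exactly where the "not a copy" argument gets contested.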