Revealed: The Authors Whose Pirated Books Are Powering Generative AI (www.theatlantic.com)

submitted 2 years ago by Powderhorn@beehaw.org to c/technology@beehaw.org

35 comments

[-] Moonrise2473@feddit.it 31 points 2 years ago

I can't believe they nonchalantly resorted to piracy on such a massive scale for profit.

If a normal person did that, he would be locked in a cell for decades.

[-] Overzeetop@beehaw.org 14 points 2 years ago

Well, lots of normal people do this not for profit, which is just as damning in the eyes of copyright.

But what if they had done this in a legitimate fashion - say they got a library account and just ordered the books one by one, read them in, and then returned the books. As I understand it (which is not very well, tbh), LLMs don't keep a copy of the original reference. They use the works to determine paths and branches in what I assume is a quasi-statistical approach (i.e., Stable Diffusion associates characteristics with words, but once the characteristics are stored in the model the original is effectively discarded and can't actually be recreated, except in the way a child might reproduce a picture from memory).

If the dataset is not, in fact, stored, would the authors still have a case?

[-] bedrooms@kbin.social 4 points 2 years ago

I believe this should be allowed, honestly, because it's dangerous to disallow it. I mean, there are dictatorships training their AIs, and they won't care about copyright. That's gonna be an advantage for them, and the West should be able to feed its AIs the same information.

We don't need to allow Stephen King, but scientific and engineering articles, sure.

[-] knokelmaat@beehaw.org 2 points 2 years ago

I agree, but even further: those articles should be open to begin with :)

[-] bedrooms@kbin.social 1 points 2 years ago

Yes, but the problem is that the authors of closed articles did sign a copyright transfer agreement (because they basically had no other option). The government cannot and should not override it against the will of the companies that hold the rights. And this extends to the public.

For these closed articles it's the authors' burden to release the draft. That act is almost always permitted by the signed agreement.

[-] bedrooms@kbin.social 5 points 2 years ago

OpenAI's gonna redo the training.

That said, it's concerning that dictatorships can feed more data to their AIs because they don't care about ethics. At some point their AIs might outperform western ones.

Here comes an unpopular opinion, but for the greater good we might eventually be forced to allow those companies to feed in everything.

[-] AAA@feddit.de 8 points 2 years ago

Dictatorships (or any other ideology-driven entities) will have their very own problems training AI. You can't feed the AI material that goes against your own ideology, or it might not act in your best interest.

[-] bedrooms@kbin.social 3 points 2 years ago

You know, ChatGPT actually succeeded in controlling its ideological expression to a significant degree. That's one advantage of this model.

[-] HumbertTetere@feddit.de 1 points 2 years ago

There are approaches for deleting topics from a trained model, so I'm not sure this will keep them busy for that long.

this post was submitted on 21 Aug 2023

99 points (100.0% liked)
