Sarah Silverman and other authors are suing OpenAI and Meta for copyright infringement, alleging that they're training their LLMs on books via Library Genesis and Z-Library (www.thedailybeast.com)

submitted 2 years ago* (last edited 2 years ago) by cypherpunks@lemmy.ml to c/technology@beehaw.org

[-] ag_roberston_author@beehaw.org 5 points 2 years ago

> It’s more like reading a book and then charging people to ask you questions about it.

No, it's really nothing like reading at all. Your example requires a human element. This is just the consumption of data, not reading.

[-] Even_Adder@lemmy.dbzer0.com 3 points 2 years ago

Humans are the ones making these models. It's not entirely the same thing, but you should read this article by the EFF.

[-] ag_roberston_author@beehaw.org 3 points 2 years ago

I don't think that it is even remotely close to being the same thing. I'm sorry, but we shouldn't be affording companies the ability to profit off other people's creations without their consent, regardless of how current copyright law works.

Acting as though a human writing a summary is the same thing as a vast network of computers processing data hundreds if not thousands of times faster than a human is foolish. Perhaps it is also foolish to try to apply our current copyright laws (which already favour large corporations and not individual creators) to this slew of new technology, but just ignoring the fundamental difference between the two is no way of going about it. We need copyright reform, we need protections for creators, and we need to stop acting as though machine learning algorithms are remotely comparable to humans in their capabilities, responsibilities, or rights.

There is a perfectly reasonable way of doing this ethically, and that is using content that people have provided to the model of their own volition, with their consent either volunteered or paid for, not content scraped from an epub, regardless of whether you bought it or downloaded it from libgen.

There are already companies training machine learning models ethically in this manner, and if creators do not want their content used as training data, it should not be.

this post was submitted on 10 Jul 2023

139 points (100.0% liked)
