[-] knokelmaat@beehaw.org 33 points 1 day ago* (last edited 1 day ago)

These books were purchased by them before being destroyed in the scanning process. I fail to see the issue with this specific case. Lots of artists buy stuff and irreversibly modify it. Are we going to be angry now at people who glue their puzzles or use parts of books for scrapbooking? If these were unique works there would be an issue, but I don't think that truly unique pieces would be in their target group, as the destructive scanning is all about cost cutting and unique works cost a lot of money that they wouldn't just destroy.

The fact that they use it for model training and later sell access to that model's work is the shady part that has a severe whiff of plagiarism to it.

[-] Vodulas@beehaw.org 1 points 16 hours ago

Paper is a natural resource, and this literally just wasted a fuck ton of it. There are non-destructive scanning methods.

[-] B0rax@feddit.org 0 points 12 hours ago

They could have just bought the ebooks…

[-] blindsight@beehaw.org 3 points 4 hours ago

Nope. Ebooks are licensed rather than sold, so the First Sale Doctrine does not apply. Buying ebooks is nearly useless, legally.

[-] Vodulas@beehaw.org 1 points 8 hours ago

I would hazard a guess that ebooks did not exist for the physical books they bought. Still, that doesn't excuse their actions, nor the bigger issues with training LLMs.

I think it’s a waste tbh. Like it’s one of those capitalist things of “well, it’s not profitable to sell, so let’s destroy them”, when anything made for the good of the people would’ve seen a massive opportunity to distribute books to people for free!

[-] yetAnotherUser@discuss.tchncs.de 4 points 19 hours ago

Copyright law doesn't allow them to sell the books. It's almost certainly a violation to scan books for their content and then sell them.

Copyright law also doesn’t allow them to download the entirety of a piracy database of books. But here we are, they clearly don’t care about copyright law.

[-] yetAnotherUser@discuss.tchncs.de 4 points 10 hours ago

They didn't care at first. The only reason they began destructively scanning books is that they started to care about copyright law:

Anthropic first chose to amass digitized versions of pirated books to avoid what CEO Dario Amodei called "legal/practice/business slog"—the complex licensing negotiations with publishers. But by 2024, Anthropic had become "not so gung ho about" using pirated ebooks "for legal reasons" and needed a safer source.

this post was submitted on 26 Jun 2025
72 points (95.0% liked)

Technology
