[-] B0rax@feddit.org 3 points 4 hours ago

I don’t get why they didn’t just buy ebooks? Why go through the trouble of scanning physical books?

[-] Michal@programming.dev 1 points 8 minutes ago

The answer lies within the article

Publishers legally control content that AI companies desperately want, but AI companies don't always want to negotiate a license. The first-sale doctrine offered a workaround: Once you buy a physical book, you can do what you want with that copy—including destroy it. That meant buying physical books offered a legal workaround.

And yet buying things is expensive, even if it is legal. So like many AI companies before it, Anthropic initially chose the quick and easy path. In the quest for high-quality training data, the court filing states, Anthropic first chose to amass digitized versions of pirated books to avoid what CEO Dario Amodei called "legal/practice/business slog"—the complex licensing negotiations with publishers. But by 2024, Anthropic had become "not so gung ho about" using pirated ebooks "for legal reasons" and needed a safer source.

[-] Zaleramancer@beehaw.org 9 points 12 hours ago

This reminds me of when I shadowed a librarian in high school and they talked to me about how people got really upset with them throwing away books that had multiple reprintings and were in awful condition.

Because people as a whole lack the capacity for nuance, I guess.

Bad focus on the news article.

[-] Vodulas@beehaw.org 3 points 8 hours ago

people got really upset with them throwing away books that had multiple reprintings and were in awful condition.

That is not what is going on here, though. They bought millions of dollars of new books in order to train AI and used destructive scanning instead of non-destructive methods. It is a huge waste of resources. They could have used a non-destructive method and then donated the books. But like everything involved in current AI, they chose the most wasteful method.

[-] Zaleramancer@beehaw.org 3 points 6 hours ago

Yeah, see, I am on your side but the focus on "destroying books is bad," is kind of irrelevant to the actual harm being done.

It's that they're devouring the contents of people's brains for the ability to replace them that's concerning. If they chose to do this in a completely different way that preserved the books, I would not say it changes the moral valence of their actions.

By centering the argument on the destruction of the books, it shifts it away from the actual concern.

[-] Vodulas@beehaw.org 1 points 16 minutes ago

Totally fair.

[-] TehPers@beehaw.org 12 points 19 hours ago

The books were purchased and destroyed to digitize them. There is nothing wrong with digitizing a work. The books were destroyed because duplicating a work without permission is illegal, but destroying the original means that there is only one copy in the end still.

The LLM training is the problem. This is not.

[-] Vodulas@beehaw.org 4 points 8 hours ago

The books were destroyed because duplicating a work without permission is illegal

It is not illegal if you don't distribute, which the judge ruled meant this was fair use. They destroyed the books as part of the digitizing project because it is likely faster and cheaper than non-destructive methods.

but destroying the original means that there is only one copy in the end still.

That is not how this works at all. As long as you aren't distributing, you are well within your rights to make copies of a book you purchase.

[-] TehPers@beehaw.org 1 points 5 hours ago* (last edited 5 hours ago)

Quoting the analysis in the ruling:

Authors also complain that the print-to-digital format change was itself an infringement not abridged as a fair use (Opp. 15, 25).

In other words, part of what is being ruled is whether digitizing the books was fair use. Reinforcing that:

Recall that Anthropic purchased millions of print books for its central library... [further down past stuff about pirated copies] Anthropic purchased millions of print copies to "build a research library" (Opp. Exh. 22 at 145, 148). It destroyed each print copy while replacing it with a digital copy for use in its library (not for sharing nor sale outside the company). As to these copies, Authors do not complain that Anthropic failed to pay to acquire a library copy. Authors only complain that Anthropic changed each copy's format from print to digital (see Opp. 15, 25 & n.15).

Bold text is me. Italics are the ruling.

Further down:

Was scanning the print copies to create digital replacements transformative? [skipping each party's arguments]

Here, for reasons narrower than Anthropic offers, the mere format change was fair use.

The judge ruled that the digitization is fair use.

Notably, the question about fair use is important because of what the work is being used for. These are being used in a commercial setting to make money, not in a private setting. Additionally, as the works were inputs into the LLM, it is related to the judge's decision on whether using them to train the LLM is fair use.

Naturally the pirated works are another story, but this article is about the destruction of the physical copies, which only happened for works they purchased. Pirating for LLMs is unacceptable, but that isn't the question here.

The ruling does go on to indicate that Anthropic might have been able to get away with not destroying the originals, but destroying them meant that the format change was "more clearly transformative" as a result, and questions around fair use are largely up to the judge's opinion on four factors (purpose of use, nature of the work, amount of work used, and effect of use on the market).

The print original was destroyed. One replaced the other. And, there is no evidence that the new, digital copy was shown, shared, or sold outside the company. [The question about LLM use is earlier in the ruling] This use was even more clearly transformative than those in Texaco, Google, and Sony Betamax (where the number of copies went up by at least one), and, of course, more transformative than those uses rejected in Napster (where the number went up by "millions" of copies shared for free with others).

... Anthropic already had purchased permanent library copies (print ones). It did not create new copies to share or sell outside.

TL;DR: Destroying the original had an effect on the judge's decision and increased the transformativeness of digitizing the books. They might have been fine without doing it, but the judge admitted that it was relevant to the question of fair use.

[-] Vodulas@beehaw.org 1 points 9 minutes ago

That is true, and they may have been doing it to cover their asses, but I would bet they used the destructive method because it was faster or cheaper (or both). We will probably never know the minutiae of that decision, though.

[-] knokelmaat@beehaw.org 29 points 1 day ago* (last edited 1 day ago)

These books were purchased by them before being destroyed in the scanning process. I fail to see the issue with this specific case. Lots of artists buy stuff and irreversibly modify it. Are we going to be angry now at people who glue their puzzles or use parts of books for scrapbooking? If these were unique works there would be an issue, but I don't think that truly unique pieces would be in their target group, as the destructive scanning is all about cost cutting and unique works cost a lot of money that they wouldn't just destroy.

The fact that they use it for model training and later sell access to that model's work is the shady part that has a severe whiff of plagiarism to it.

[-] Vodulas@beehaw.org 1 points 8 hours ago

Paper is a natural resource, and this literally just wasted a fuck ton. There are non-destructive scanning methods.

[-] B0rax@feddit.org 0 points 4 hours ago

They could have just bought the ebooks…

[-] Vodulas@beehaw.org 1 points 13 minutes ago

I would hazard a guess that no eBook existed for the physical books they bought. Still, that doesn't excuse their actions, nor the bigger issues with training LLMs.

I think it’s a waste tbh. Like it’s one of those capitalist things of “well, it’s not profitable to sell, so let’s destroy them”, when anything made for the good of the people would’ve seen a massive opportunity to distribute books to people for free!

[-] yetAnotherUser@discuss.tchncs.de 3 points 11 hours ago

Copyright law doesn't allow them to sell the books. It's almost certainly a violation to scan books for their content and then sell them.

Copyright law also doesn’t allow them to download the entirety of a piracy database of books. But here we are, they clearly don’t care about copyright law.

[-] yetAnotherUser@discuss.tchncs.de 2 points 2 hours ago

They didn't care at first. The only reason they began destructively scanning books is because they started to care about copyright law:

Anthropic first chose to amass digitized versions of pirated books to avoid what CEO Dario Amodei called "legal/practice/business slog"—the complex licensing negotiations with publishers. But by 2024, Anthropic had become "not so gung ho about" using pirated ebooks "for legal reasons" and needed a safer source.

[-] OpenStars@piefed.social 11 points 1 day ago

There is something horribly symbolic in all of that 🤮👿📚

It seems (a little) akin to burning books: sure maybe you can get away with doing whatever you wish to a printed copy that you purchase (legally speaking), but that doesn't mean that we (the bystanders) should rush to enjoy using the final product of the endeavor.

[-] windowsphoneguy@feddit.org 14 points 1 day ago

At least they paid for it. Now regarding destroying them, it highly depends on the books in question. One less Harry Potter book won't hurt anyone

[-] ranandtoldthat@beehaw.org 5 points 1 day ago

I gotta reread Vinge's Rainbows End

this post was submitted on 26 Jun 2025
63 points (94.4% liked)
