Anthropic destroyed millions of print books to build its AI models
(arstechnica.com)
Quoting the analysis in the ruling:
In other words, part of what is being ruled is whether digitizing the books was fair use. Reinforcing that:
Bold text is me. Italics are the ruling.
Further down:
The judge ruled that the digitization is fair use.
Notably, the question about fair use is important because of what the work is being used for. These are being used in a commercial setting to make money, not in a private setting. Additionally, as the works were inputs into the LLM, it is related to the judge's decision on whether using them to train the LLM is fair use.
Naturally the pirated works are another story, but this article is about the destruction of the physical copies, which only happened for works they purchased. Pirating for LLMs is unacceptable, but that isn't the question here.
The ruling does go on to indicate that Anthropic might have been able to get away with not destroying the originals, but destroying them made the format change "more clearly transformative". Questions around fair use are largely up to the judge's opinion on four factors (purpose of use, nature of the work, amount of work used, and effect of use on the market).
TL;DR: Destroying the original had an effect on the judge's decision and increased the transformativeness of digitizing the books. They might have been fine without doing it, but the judge admitted that it was relevant to the question of fair use.
That is true, and they may have been doing it to cover their asses, but I would bet they used the destructive method because it was faster or cheaper (or both). We will probably never know the minutiae of that decision, though.