Judge Finds AI Training on Complete Books ‘Reasonably Necessary’



This content originally appeared on HackerNoon and was authored by Legal PDF: Tech Court Cases

:::tip ANDREA BARTZ, CHARLES GRAEBER, and KIRK WALLACE JOHNSON v. ANTHROPIC PBC, retrieved on June 25, 2025, is part of HackerNoon’s Legal PDF Series. This is part 8 of 10.

:::

3. THE AMOUNT AND SUBSTANTIALITY OF THE PORTION USED

The third fair use factor is “the amount and substantiality of the portion” of the copyrighted work used by the accused. 17 U.S.C. § 107(3). The crux of this factor is whether the amount was “reasonable in relation to the purpose of the copying.” Campbell, 510 U.S. at 586. Thus, the amount of copying is considered first against the work itself, then more importantly against the proposed transformative purpose. See Warhol, 598 U.S. at 543 & n.18.

A. THE COPIES USED TO TRAIN SPECIFIC LLMS

Copies selected for inclusion in training sets were selected because they were complete and because they contained rich protectible expression, or so this order accepts the record shows for Authors. Was all this copying reasonably necessary to the transformative use?

Yes.

“What matters [ ] is not so much ‘the amount and substantiality of the portion used’ in making a copy, but rather the amount and substantiality of what is thereby made accessible to a public [in the purported secondary use] for which it may serve as a competing substitute [for the primary use].” Google, 804 F.3d at 222. Here, once again, there is no allegation of any traceable connection between the Claude service’s outputs and Authors’ works. The copying used to train the LLMs underlying Claude was thus especially reasonable.

In response, Authors object primarily that the copying used in training was both extremely extensive and not strictly necessary.

As to extensive copying, it is true that entire works were copied. And, “copying [ ] entire work[s] ‘militate[s] against a finding of fair use.’” Worldwide Church of God v. Philadelphia Church of God, Inc., 227 F.3d 1110, 1118 (9th Cir. 2000) (quoting Hustler Mag. Inc. v. Moral Majority Inc., 796 F.2d 1148, 1155 (9th Cir. 1986)); see Campbell, 510 U.S. at 587. But we just addressed why Authors’ argument is misdirected. The copies that count for this factor are those that would merely serve the same use as the work’s ordinary one. Authors do not allege such copying. The accused use here of the incremental copies is as orthogonal as can be imagined to the ordinary use of a book.

As to strict necessity, Authors make a stronger point. When a productive use is made possible only by borrowing from a specific work, fair use climbs towards its zenith. When a productive use is possible without that borrowing, fair use falls to its nadir — and the borrowing deserves a particularly compelling justification. See Warhol, 598 U.S. at 543 & n.18, 547. Here, it is true that Anthropic could have used some other books or no books at all for training its LLMs — or so this order accepts the record shows for Authors. But Anthropic has presented a compelling explanation for why it was reasonably necessary to use them anyway.

For one thing, all agree Anthropic needed billions of words to train any given LLM. If using only books, Anthropic would have needed millions of books per model. If using a set comprising only a small fraction of books and a larger fraction of other texts, Anthropic still would have needed hundreds of thousands of books. Authors contend that because Anthropic showed it could use such smaller sets of books, it surely could have used no books at all — or at least not their books (Opp. 23). But Authors forget that “reasonably necessary” does not mean “strictly necessary.” Authors do not contest that the volume of text required to train an LLM is monumental. Because using so many works was reasonably necessary, using any one work for actually training LLMs was about as reasonable as the next.

For another thing, no output to the public was even alleged to be infringing. So, yes, Authors’ works were chosen as the strongest examples of writing. But the compelling benefits of training the LLMs on strong examples were not offset by revelations to the public of any portion of the works themselves. What was copied was therefore especially reasonable and compelling.

The third factor thus favors fair use for the training copies.

B. THE COPIES USED TO BUILD A CENTRAL LIBRARY

But again, there was a separate use — a distinction that makes some difference as to whether the amount and substantiality of the copying was “reasonable in relation to the purpose of the copying” for the library copies. Campbell, 510 U.S. at 586.

(i) The Purchased Library Copies Converted from Print to Digital.

For the print library copies that Anthropic purchased and then converted into digital library copies, Anthropic already enjoyed entitlement to keep the copies in its library. The purpose of the copying was to keep them in its library but with more favorable storage and searchability properties. Copying the entire work was exactly what this purpose required. There was no surplus copying. The source copy was destroyed.

The third fair use factor favors fair use for the purchased library copies converted from print to digital.

(ii) The Pirated Library Copies.

For the pirated library copies, however, Anthropic lacked any entitlement to hold copies of the books at all. Its purpose, it says, was to train LLMs. But its objective conduct was to seek “all the books in the world” and then retain them even after deciding it would not make further copies from them for training — indicating there were other further uses. Against the purpose of acquiring all the books one could on the chance some might prove useful for training LLMs and maybe other stuff too, almost any unauthorized copying would have been too much. Anthropic copied millions of books in toto, Authors’ works among them.

The third factor points against fair use for the pirated library copies.




:::info About HackerNoon Legal PDF Series: We bring you the most important technical and insightful public domain court case filings.

This court case, retrieved on June 25, 2025, from storage.courtlistener.com, is part of the public domain. The court-created documents are works of the federal government, and under copyright law, are automatically placed in the public domain and may be shared without legal restriction.

:::
