01.07.2025 10:47

Intriguing Study Reveals Meta's Llama 3.1 70B Can Reproduce Up to 42% of the First Harry Potter Book Verbatim


A fascinating new study by researchers from Stanford, Cornell, and West Virginia University offers striking insights into how much of their training text AI language models memorize.

The investigation focused on five open-weight models, analyzing their ability to recall text from Books3, a dataset widely used for training language models. The researchers counted a 50-token passage as memorized when a model, given the preceding text, would reproduce it verbatim with greater than 50% probability. Among the findings, Meta's Llama 3.1 70B stood out: it can reproduce up to 42% of the first Harry Potter book verbatim, a figure that significantly outpaces its peers. For comparison, the earlier Llama 1 65B managed only 4.4% of the same text.
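
For readers who want to see what such a probe looks like in practice, below is a minimal sketch using the Hugging Face transformers library: it scores how likely a model is to reproduce a 50-token passage verbatim given the 50 tokens that precede it. The model name, the book.txt input file, and the span_probability helper are illustrative assumptions; the paper's exact extraction protocol may differ in its details.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# The 70B model needs multiple GPUs; swap in a smaller causal LM
# (e.g. "gpt2") to run the probe on modest hardware.
MODEL = "meta-llama/Llama-3.1-70B"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

def span_probability(prefix_ids: torch.Tensor, target_ids: torch.Tensor) -> float:
    """Probability the model assigns to target_ids as the exact
    continuation of prefix_ids (product of per-token probabilities)."""
    input_ids = torch.cat([prefix_ids, target_ids]).unsqueeze(0).to(model.device)
    with torch.no_grad():
        logits = model(input_ids).logits[0]  # [seq_len, vocab]
    log_probs = torch.log_softmax(logits, dim=-1)
    # Logits at position i predict the token at position i + 1,
    # so scoring the targets starts one step before the first target token.
    start = prefix_ids.numel() - 1
    rows = log_probs[start : start + target_ids.numel()]
    picked = rows.gather(1, target_ids.unsqueeze(1).to(model.device)).squeeze(1)
    return picked.sum().exp().item()

# "book.txt" is a stand-in for a plain-text copy of the work being probed.
book_ids = tokenizer(open("book.txt").read(), return_tensors="pt").input_ids[0]
prefix, target = book_ids[:50], book_ids[50:100]
p = span_probability(prefix, target)
print(f"P(exact 50-token continuation) = {p:.4f} -> memorized: {p > 0.5}")
```

Sliding this 50-token window across an entire book and counting the spans that clear the 50% threshold yields a book-level memorization rate along the lines of the 42% figure reported for Llama 3.1 70B.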

The study revealed a clear pattern: the models excel at memorizing popular works like Harry Potter, The Hobbit, and Orwell's 1984 while struggling with lesser-known titles. The disparity suggests that widely read books, which likely appear many times in the training data, leave a deeper imprint on the models.

The results challenge the assertions of AI companies that their models merely “learn patterns” rather than copying content. The research indicates that for certain works, memorization is not an anomaly but a systemic feature, undermining arguments that such practices fall under fair use. This raises significant ethical and legal questions about the training data used to develop these models.

The paradox lies in the exposure of open-weight models: because their parameters are publicly available, researchers can precisely measure how much text models like Llama retain, making them more susceptible to scrutiny and potential lawsuits. Closed models from companies like OpenAI, Anthropic, and Google may face similar issues, but their opacity makes copyright-infringement claims far harder to substantiate. As the debate intensifies, the study highlights the need for clearer guidelines on AI training practices and intellectual property rights.

