Adobe Faces Lawsuit Over Alleged Use of Pirated Books to Train AI Models



By admin | Dec 18, 2025 | 2 min read


Following the broader industry trend, Adobe has invested heavily in artificial intelligence in recent years. Since 2023, the software company has introduced various AI services, including Firefly, its suite for AI-generated media. That enthusiastic adoption may now have created a legal problem: a new lawsuit alleges the company trained one of its AI models on pirated books.

A proposed class-action suit, filed on behalf of Oregon-based author Elizabeth Lyon, contends that Adobe utilized unauthorized copies of many books—including Lyon's own works—to train its SlimLM program. Adobe characterizes SlimLM as a series of compact language models designed to be "optimized for document assistance tasks on mobile devices." The company notes that SlimLM was pre-trained on SlimPajama-627B, described as a "deduplicated, multi-corpora, open-source dataset" released by Cerebras in June 2023.

Lyon, who has authored several non-fiction writing guides, states that some of her books were part of a pretraining dataset used by Adobe. Her lawsuit, first reported by Reuters, asserts that her writing entered Adobe's system through SlimPajama's derivation from the RedPajama dataset. The filing explains: "The SlimPajama dataset was created by copying and manipulating the RedPajama dataset (including copying Books3). Thus, because it is a derivative copy of the RedPajama dataset, SlimPajama contains the Books3 dataset, including the copyrighted works of Plaintiff and the Class members."

"Books3," an extensive collection of 191,000 books frequently used to train generative AI systems, has been a persistent source of legal challenges for the technology sector. The RedPajama dataset has also been referenced in multiple lawsuits. For instance, a September case against Apple alleged the company used copyrighted material to train its Apple Intelligence model, citing the dataset and accusing the firm of copying protected works "without consent and without credit or compensation." A similar lawsuit filed against Salesforce in October made comparable claims regarding RedPajama's use for training.

Such legal actions have become increasingly common across the tech industry. AI models require vast datasets for training, and some of those collections have been accused of containing pirated material. In a notable September settlement, Anthropic agreed to pay $1.5 billion to a group of authors who alleged the company used pirated versions of their work to train its Claude chatbot. That case was viewed as a potential milestone in the many ongoing legal disputes over copyrighted material in AI training data.
