Major Publishers Sue OpenAI Over "Massive Copyright Infringement" of Encyclopedia Content

By admin | Mar 16, 2026 | 2 min read

Encyclopedia Britannica and Merriam-Webster have sued OpenAI, alleging in their complaint that the artificial intelligence company is responsible for "massive copyright infringement."

Britannica, which owns Merriam-Webster, holds copyrights to approximately 100,000 online articles. The lawsuit contends that OpenAI scraped this content and used it to train its large language models without permission. Britannica further alleges that OpenAI infringes its copyrights when its systems produce outputs containing "full or partial verbatim reproductions" of Britannica's material, and when the company uses these articles in ChatGPT's retrieval-augmented generation (RAG) pipeline. RAG is the process by which the model searches the web or other databases for current information to answer user queries.

Britannica also asserts that OpenAI violates the Lanham Act, the federal trademark statute, by generating fabricated "hallucinations" and falsely attributing them to the publisher. The complaint states, "ChatGPT starves web publishers like [Britannica] of revenue by generating responses to users’ queries that substitute, and directly compete with, the content from publishers like [Britannica]." It further argues that ChatGPT's inaccuracies endanger "the public’s continued access to high-quality and trustworthy online information."

Britannica joins a growing list of publishers and authors suing OpenAI over copyright. Other plaintiffs include The New York Times; Ziff Davis, owner of outlets such as Mashable, CNET, IGN, and PC Magazine; and more than a dozen news organizations across the United States and Canada, including the Chicago Tribune, the Denver Post, the Sun-Sentinel, the Toronto Star, and the Canadian Broadcasting Corporation. A related lawsuit Britannica filed against Perplexity is still pending.

There is currently no definitive legal precedent on whether using copyrighted material to train a large language model constitutes infringement. In one notable case, Anthropic persuaded U.S. District Judge William Alsup that using copyrighted content as training data was sufficiently transformative to qualify as fair use. However, Alsup found that Anthropic had acted unlawfully by downloading millions of pirated books instead of purchasing them, which led to a $1.5 billion class-action settlement with the affected authors.