ArXiv Tightens Rules on AI-Generated Research to Curb Careless Use of LLMs in Scientific Papers

By admin | May 16, 2026 | 2 min read

ArXiv, the widely used open repository for preprint research, is tightening its rules against the careless use of large language models in scientific papers. While papers are posted to the site before undergoing peer review, arXiv (pronounced “archive”) has become a key platform for circulating research in fields like computer science and mathematics, and the site itself serves as a data source for tracking trends in scientific research. The repository has already taken measures to address the rising tide of low-quality, AI-generated papers, such as requiring first-time contributors to obtain an endorsement from an established author. After being hosted by Cornell for over two decades, the organization is now transitioning into an independent nonprofit, a move expected to help it raise more funds to tackle issues like AI-generated content.

In the repository’s latest action, Thomas Dietterich, chair of arXiv’s computer science section, announced on Thursday that “if a submission contains incontrovertible evidence that the authors did not check the results of LLM generation, this means we can’t trust anything in the paper.” Such evidence may include “hallucinated references” or comments directed to or from the LLM, Dietterich explained. If such evidence is found, the paper’s authors will face “a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted by a reputable peer-reviewed venue.”

This rule does not ban the use of LLMs outright; rather, it emphasizes, as Dietterich put it, that authors must take “full responsibility” for the content, “irrespective of how the contents are generated.” So if researchers copy and paste “inappropriate language, plagiarized content, biased content, errors, mistakes, incorrect references, or misleading content” directly from an LLM, they remain accountable for it. Dietterich told 404 Media that this will be a “one-strike” rule, though moderators must first flag the issue and section chairs must confirm the evidence before the penalty is imposed. Authors will also have the opportunity to appeal the decision. Recent peer-reviewed research has shown that fabricated citations are on the rise in biomedical research, likely due to LLMs, though scientists are not the only ones caught using citations invented by AI.