Creative Commons Backs "Pay-to-Crawl" System to Compensate Websites for AI Training Data
By admin | Dec 16, 2025 | 3 min read
Earlier this year, the nonprofit Creative Commons introduced a framework for an open AI ecosystem. Now, the organization has thrown cautious support behind "pay-to-crawl" technology: a system designed to automate compensation for website content when it is accessed by machines such as AI web crawlers. Creative Commons is widely recognized for pioneering the licensing movement that lets creators share their work while retaining copyright. In July, it unveiled a plan to establish a legal and technical framework for sharing datasets between the companies that control data and the AI providers seeking to train on it. The nonprofit describes its position on pay-to-crawl as "cautiously supportive."
A blog post from Creative Commons stated, "Implemented responsibly, pay-to-crawl could represent a way for websites to sustain the creation and sharing of their content, and manage substitutive uses, keeping content publicly accessible where it might otherwise not be shared or would disappear behind even more restrictive paywalls." Pay-to-crawl, championed by companies like Cloudflare, involves charging AI bots each time they scrape a site to gather content for model training and updates. Historically, websites freely permitted web crawlers to index their content for search engines like Google, an arrangement that benefited sites because appearing in search results drove visitor traffic and clicks.
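Mechanically, the exchange can ride on ordinary HTTP. Cloudflare's implementation, for instance, answers unpaid crawler requests with the rarely used 402 Payment Required status code. The Python sketch below shows one way an origin server might apply that pattern; the header names (crawler-max-price, crawler-price, crawler-charged) follow Cloudflare's published examples but should be treated as illustrative here, and real deployments verify crawler identity cryptographically rather than trusting the User-Agent string.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Illustrative only: production systems identify crawlers via verified
# signatures, not the User-Agent string alone.
AI_CRAWLERS = {"GPTBot", "ClaudeBot", "PerplexityBot"}
PRICE_USD = "0.01"  # hypothetical per-request price

class PayToCrawlHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        if any(bot in agent for bot in AI_CRAWLERS):
            # The crawler may declare up front what it is willing to pay.
            offer = self.headers.get("crawler-max-price")  # assumed header name
            if offer is None or float(offer) < float(PRICE_USD):
                # 402 Payment Required: quote a price instead of blocking.
                self.send_response(402)
                self.send_header("crawler-price", PRICE_USD)  # assumed header name
                self.end_headers()
                return
            # Offer meets the price: serve the page and confirm the charge.
            self.send_response(200)
            self.send_header("crawler-charged", PRICE_USD)  # assumed header name
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(b"<html><body>Licensed content</body></html>")
            return
        # Ordinary visitors and search indexers pass through freely.
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(b"<html><body>Public content</body></html>")

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), PayToCrawlHandler).serve_forever()
```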
However, the rise of AI technology has altered this dynamic. When a consumer obtains an answer through an AI chatbot, they are less likely to click through to the original source. This shift has already dealt a devastating blow to publishers by reducing search traffic, with no signs of abating. A pay-to-crawl system, by contrast, could help publishers recoup some of that lost revenue. It could prove especially advantageous for smaller web publishers that lack the leverage to negotiate individual content deals with AI providers. Major agreements have already been reached: OpenAI with Condé Nast, Axel Springer, and others; Perplexity with Gannett; Amazon with The New York Times; and Meta with various media publishers.
Creative Commons included several caveats in its support for pay-to-crawl, noting that such systems could consolidate power on the web. They could also restrict access to content for "researchers, nonprofits, cultural heritage institutions, educators, and other actors working in the public interest." The organization proposed a set of principles for responsible implementation: avoid making pay-to-crawl the default for all websites, steer clear of blanket rules for the web, allow throttling rather than just blocking, preserve public-interest access, and build systems that are open, interoperable, and based on standardized components.
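To make the throttle-rather-than-block principle concrete, here is a minimal, hypothetical policy sketch. The crawler names, the 30-requests-per-minute threshold, and the price header are all assumptions for illustration; the load-bearing idea is that responses escalate from free access, to a 429 rate limit, to a 402 price quote, rather than a flat block.

```python
import time
from dataclasses import dataclass, field

# Hypothetical policy reflecting CC's stated principles: exempt
# public-interest crawlers, throttle (HTTP 429) before blocking,
# and quote a price (HTTP 402) only to commercial AI training bots.
PUBLIC_INTEREST = {"ia_archiver"}  # e.g., the Internet Archive's crawler
COMMERCIAL_AI = {"GPTBot", "ClaudeBot"}
REQUESTS_PER_MINUTE = 30  # assumed throttle threshold

@dataclass
class CrawlPolicy:
    hits: dict = field(default_factory=dict)  # user agent -> request timestamps

    def decide(self, user_agent: str) -> tuple[int, dict]:
        """Return an (HTTP status, headers) pair for one crawl request."""
        if user_agent in PUBLIC_INTEREST:
            return 200, {}  # preserve public-interest access
        now = time.monotonic()
        # Keep only the requests seen in the last 60 seconds.
        window = [t for t in self.hits.get(user_agent, []) if now - t < 60]
        window.append(now)
        self.hits[user_agent] = window
        if len(window) > REQUESTS_PER_MINUTE:
            # Throttle rather than block outright.
            return 429, {"Retry-After": "60"}
        if user_agent in COMMERCIAL_AI:
            return 402, {"crawler-price": "0.01"}  # illustrative price quote
        return 200, {}

policy = CrawlPolicy()
print(policy.decide("GPTBot"))       # (402, {'crawler-price': '0.01'})
print(policy.decide("ia_archiver"))  # (200, {})
```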
Cloudflare is not the only company investing in the pay-to-crawl space. Microsoft is developing an AI marketplace for publishers, and smaller startups like ProRata.ai and TollBit have also entered the field. Another group, the RSL Collective, has announced a new standard called Really Simple Licensing (RSL), which defines which parts of a website crawlers may access, and on what licensing terms, without actually blocking them. Cloudflare, Akamai, and Fastly have since adopted RSL, which is backed by Yahoo, Ziff Davis, O’Reilly Media, and others. Creative Commons also declared its support for RSL, alongside its broader CC signals project aimed at developing technology and tools for the AI era.
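RSL builds on the familiar robots.txt mechanism: in the spec's published examples, a site adds a License directive pointing to an XML file that spells out its terms. The sketch below shows how a crawler might discover that file; the robots.txt body and URL are made up, and parsing of the XML terms themselves is left out.

```python
# A loose sketch of RSL discovery, assuming the "License:" robots.txt
# directive shown in the spec's examples at rslstandard.org.

def find_rsl_license(robots_txt: str) -> str | None:
    """Return the RSL license URL declared in a robots.txt body, if any."""
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "license":
            return value.strip()
    return None

robots = """\
User-agent: *
Allow: /
License: https://example.com/license.xml
"""
print(find_rsl_license(robots))  # https://example.com/license.xml
# A compliant crawler would then fetch and parse the XML at this URL to
# learn which paths it may use and under what licensing terms, rather
# than simply being allowed in or blocked.
```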