Creative Commons Supports “Pay-to-Crawl” Technology for AI Access
The nonprofit Creative Commons has expressed support for “pay-to-crawl” technology, following its announcement earlier this year of a framework for an open AI ecosystem.
Understanding Creative Commons
Creative Commons (CC) is widely recognized for its role in establishing the open licensing movement, which empowers creators to share their work while retaining their copyright.
In July, CC unveiled a plan to create both legal and technical infrastructure for data sharing, facilitating interactions between companies that hold data and AI providers seeking to use it for training.
Cautious Endorsement of Pay-to-Crawl
The organization is now tentatively endorsing pay-to-crawl systems, describing its position as “cautiously supportive.”
According to a CC blog post, responsibly implemented pay-to-crawl could help websites sustain the creation and sharing of content, manage substitutive uses, and keep content publicly accessible rather than pushed behind more restrictive paywalls.
How Pay-to-Crawl Works
The concept, championed by companies such as Cloudflare, involves charging AI bots each time they scrape a website to gather content for training and updating their models.
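Cloudflare’s approach revives the rarely used HTTP 402 “Payment Required” status code: the site quotes a price to an AI crawler, which either pays or walks away. The sketch below is a minimal, hypothetical Python server illustrating that flow; the `crawler-price`, `crawler-max-price`, and `crawler-charged` header names, the bot list, and the flat fee are assumptions for illustration, not a specification.

```python
# Minimal sketch of a pay-per-crawl server, loosely modeled on Cloudflare's
# published design (HTTP 402 plus price headers). The header names, bot
# list, and flat fee are illustrative assumptions, not a specification.
from http.server import BaseHTTPRequestHandler, HTTPServer

PRICE_USD = "0.01"                                      # hypothetical flat fee per request
AI_CRAWLERS = {"GPTBot", "ClaudeBot", "PerplexityBot"}  # known AI user agents

class PayToCrawlHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        if not any(bot in agent for bot in AI_CRAWLERS):
            self._serve_page()        # ordinary visitors are unaffected
            return

        # An AI crawler may declare the most it is willing to pay
        # (header name assumed for illustration).
        offered = self.headers.get("crawler-max-price")
        try:
            accepted = offered is not None and float(offered) >= float(PRICE_USD)
        except ValueError:
            accepted = False

        if accepted:
            # A real system would record the charge and settle payment
            # out of band; here we simply acknowledge it.
            self._serve_page(extra={"crawler-charged": PRICE_USD})
        else:
            # No acceptable offer: demand payment instead of serving content.
            self.send_response(402)   # HTTP 402 Payment Required
            self.send_header("crawler-price", PRICE_USD)
            self.end_headers()

    def _serve_page(self, extra=None):
        body = b"<html><body>Article text</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        for name, value in (extra or {}).items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), PayToCrawlHandler).serve_forever()
```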
Historically, websites permitted web crawlers to index their content freely for inclusion in search engines like Google. This arrangement benefited sites through increased visibility in search results, driving traffic and user engagement.
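That arrangement rests on the informal robots.txt convention: a well-behaved crawler checks a site’s published rules before fetching any pages. A minimal example using only Python’s standard library (the site URL and user agent are placeholders):

```python
# How a well-behaved crawler traditionally honors robots.txt, using only
# Python's standard library. Site URL and user agent are placeholders.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("https://example.com/robots.txt")
parser.read()  # fetch and parse the site's crawl rules

# Fetch the page only if the site's rules permit this user agent.
if parser.can_fetch("ExampleBot", "https://example.com/article.html"):
    print("allowed to crawl")
else:
    print("disallowed by robots.txt")
```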
The Shift with AI Technology
However, the advent of AI technology has altered this dynamic. When users receive answers directly from AI chatbots, they are less likely to click through to the original source website.
This change has already negatively impacted publishers by reducing search traffic, and the trend is expected to continue.
Potential Benefits for Publishers
A pay-to-crawl system could help publishers recoup some of the revenue lost to AI. It could prove especially valuable for smaller web publishers that lack the negotiating power to secure individual content deals with AI providers.
Significant agreements have already been reached between AI companies and major publishers: OpenAI with Condé Nast and Axel Springer, Perplexity with Gannett, Amazon with The New York Times, and Meta with various media outlets.
Caveats and Considerations
CC has outlined several caveats regarding its support for pay-to-crawl. The organization notes that such systems could lead to a concentration of power on the web.
Furthermore, such systems could restrict access to content for researchers, nonprofits, cultural heritage institutions, educators, and other entities serving the public interest.
Principles for Responsible Implementation
CC has proposed a set of principles for responsible pay-to-crawl implementation:
- Pay-to-crawl should not become the default setting for all websites.
- Blanket rules should not be applied across the entire web.
- Systems should allow throttling, not just outright blocking of access (see the sketch after this list).
- Access should be preserved for public-interest organizations.
- Systems should be open, interoperable, and built on standardized components.
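The third and fourth principles, in particular, translate naturally into access-policy logic. The sketch below is a hypothetical illustration of that logic in Python; the agent registries, rate limit, and price are invented for the example:

```python
# Hypothetical policy sketch reflecting two of CC's principles: throttle
# rather than hard-block, and preserve access for public-interest crawlers.
# The agent registries, rate limit, and price are invented for illustration.
from dataclasses import dataclass
from typing import Optional

@dataclass
class CrawlPolicy:
    action: str                              # "free", "charge", or "throttle"
    price_usd: float = 0.0
    max_requests_per_min: Optional[int] = None

PUBLIC_INTEREST_AGENTS = {"LibraryBot", "ArchiveBot", "ResearchBot"}
COMMERCIAL_AI_AGENTS = {"GPTBot", "ClaudeBot", "PerplexityBot"}

def policy_for(user_agent: str, has_paid: bool) -> CrawlPolicy:
    if user_agent in PUBLIC_INTEREST_AGENTS:
        # Preserve free access for researchers, nonprofits, and archives.
        return CrawlPolicy(action="free")
    if user_agent in COMMERCIAL_AI_AGENTS:
        if has_paid:
            return CrawlPolicy(action="charge", price_usd=0.01)
        # Slow unpaid AI crawlers down rather than blocking them outright.
        return CrawlPolicy(action="throttle", max_requests_per_min=5)
    return CrawlPolicy(action="free")        # ordinary traffic is unaffected

print(policy_for("ArchiveBot", has_paid=False))  # free access preserved
print(policy_for("GPTBot", has_paid=False))      # throttled, not blocked
```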
Other Players in the Pay-to-Crawl Space
Cloudflare is not the sole company exploring pay-to-crawl. Microsoft is also developing an AI marketplace for publishers.
Additionally, startups like ProRata.ai and TollBit have entered the market, and the RSL Collective has introduced a new standard, Really Simple Licensing (RSL), which lets websites define the terms under which crawlers may access their content rather than simply blocking them outright.
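As a rough illustration of the idea, a crawler might discover machine-readable licensing terms published alongside a site’s robots.txt. The sketch below is hypothetical: the `License:` directive name and the XML handling are assumptions for illustration, not the actual RSL specification.

```python
# Hypothetical sketch of discovering machine-readable licensing terms
# published alongside robots.txt, in the spirit of RSL. The "License:"
# directive and the XML handling are assumptions, not the RSL spec.
import urllib.request
import xml.etree.ElementTree as ET

def find_license_url(robots_txt: str):
    # Scan robots.txt for a pointer to a licensing document
    # (directive name assumed for illustration).
    for line in robots_txt.splitlines():
        if line.lower().startswith("license:"):
            return line.split(":", 1)[1].strip()
    return None

robots = urllib.request.urlopen("https://example.com/robots.txt").read().decode()
license_url = find_license_url(robots)
if license_url:
    # A crawler would inspect the declared terms before fetching content.
    terms = ET.fromstring(urllib.request.urlopen(license_url).read())
    print("found licensing terms document:", terms.tag)
```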
Cloudflare, Akamai, and Fastly have adopted RSL, with support from Yahoo, Ziff Davis, O’Reilly Media, and others.
CC’s Support for RSL
CC has also voiced support for RSL, alongside its own broader project, CC Signals, which focuses on developing technology and tools for the AI era.