Top news websites across ten countries are actively blocking crawlers deployed by OpenAI and Google, showing how publishers are restricting AI companies' access to their content while regulation remains unsettled.
According to a recent study from the Reuters Institute, nearly half of the leading news websites across ten countries have implemented blocks against crawlers deployed by OpenAI, with a quarter also blocking Google's AI crawler.
The analysis inspected the robots.txt files of the top 15 online news sources in each of the ten countries, including The New York Times, NPR, BuzzFeed News, and CNN. Its findings underscore the challenges facing AI developers in the absence of clear regulatory frameworks governing their use of copyrighted material.
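Blocking of this kind is expressed through directives in a site's robots.txt file. The sketch below shows one way such a check could be automated with Python's standard `urllib.robotparser` module. It assumes the user-agent tokens `GPTBot` (OpenAI's crawler) and `Google-Extended` (the token Google introduced so sites can opt out of AI use of their content). It also assumes a simplified definition in which a crawler counts as "blocked" if it may not fetch the site's root URL; this is not necessarily how the Reuters Institute classified sites. The function name `blocked_ai_crawlers` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens for each company's AI crawler.
AI_TOKENS = {"OpenAI": "GPTBot", "Google": "Google-Extended"}


def blocked_ai_crawlers(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Report, per company, whether robots.txt disallows its AI token from `url`.

    Simplifying assumption: "blocked" means the token may not fetch `url`
    (the site root by default).
    """
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {company: not parser.can_fetch(token, url) for company, token in AI_TOKENS.items()}


if __name__ == "__main__":
    sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
    print(blocked_ai_crawlers(sample))  # {'OpenAI': True, 'Google': False}
```

Note that `urllib.robotparser` applies the first matching rule in file order, whereas Google documents a most-specific-match rule. Results could therefore differ for robots.txt files with overlapping `Allow` and `Disallow` paths.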
Legacy Print Outlets Lead in Blocking AI Crawlers
The study classified outlets into legacy print publications, television and radio broadcasters, and digital-born outlets.
Legacy print publications exhibited the highest propensity to block AI crawlers, with 57% restricting OpenAI and 32% blocking Google's crawler.
This trend was less pronounced among broadcasters and digital-born outlets.
The research found that 48% of television and radio broadcasters and 31% of digital-born outlets have blocked OpenAI's crawler.
The pattern holds for Google's crawler, which is blocked by 19% of broadcasters and 17% of digital-born outlets.
"The Reuters study highlights a fundamental challenge for generative AI: its dependence on authentic content generated by real people who see it as a threat to their livelihoods," Gartner VP Analyst Andrew Frank noted.
How Different Countries Tackle AI Crawlers
The study revealed disparities between news outlets in the Global North and Global South regarding the blocking of AI crawlers.
In the U.S., a substantial 79% of top online news websites blocked OpenAI's crawler, compared with only 20% in Mexico and Poland.
Similarly, in Germany, 60% of news sites blocked Google's crawler, compared with just 7% in Poland and Spain.
OpenAI debuted its crawler in August 2023, followed by Google in September 2023. None of the websites that blocked either crawler had reversed the decision.