AI Scraper Battle: Website Owners Defend Against Google and OpenAI, Revealing Blockage Rates

At the center of the world wide web lies a plain-text file called robots.txt. Its main function is to let website owners tell crawlers, including those run by major tech companies such as Google, which parts of a site they may or may not access. Most website owners have no problem with Google crawling their content because search indexing can send traffic to their site, but AI is a different matter entirely.

We are living in the midst of what many are calling the AI wars, with major tech companies including Google, Microsoft and Meta throwing their hats into the ring. Many website owners are wary of their content being scraped to train AI models, including large language models (LLMs), and they can use robots.txt files to prevent this from happening.
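As a rough illustration of how such a block works, the sketch below parses a sample robots.txt with Python's standard urllib.robotparser module and checks which crawlers it turns away. The rules shown are an assumed example rather than any particular site's file, the helper name is_allowed and the example.com URL are hypothetical, and the Googlebot, GPTBot and Google-Extended tokens are the ones the respective companies publish. Compliance with robots.txt is up to each crawler.

```python
from urllib.robotparser import RobotFileParser

# Assumed example rules: turn away the two AI-related tokens,
# leave every other crawler (including ordinary search) alone.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # hypothetical URL
    for agent in ("Googlebot", "GPTBot", "Google-Extended"):
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

With rules like these, regular search crawling stays allowed while the two AI-related tokens are refused, which is the split many site owners appear to want.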

In the case of Google, the tech juggernaut introduced its own control that site owners can use to this end: a robots.txt user-agent token called Google-Extended. According to data released by OriginalityAI, 10.3% of the top 1,000 websites are using it. In the case of OpenAI, however, around 32.7% of the top 1,000 websites are using robots.txt rules to prevent GPTBot from scraping any of their content whatsoever.
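For readers curious how a blockage rate like this could be measured, here is a minimal sketch that computes the share of sites whose robots.txt shuts a given crawler out of the site root. It is not OriginalityAI's methodology, which the figures above do not describe. The site names and file contents in SAMPLE are invented for illustration, and the function names are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocks_agent(robots_txt: str, user_agent: str) -> bool:
    """True if the rules stop user_agent from fetching the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


def block_rate(robots_by_site: dict[str, str], user_agent: str) -> float:
    """Percentage of sites that block user_agent at the root."""
    if not robots_by_site:
        return 0.0
    blocked = sum(blocks_agent(text, user_agent) for text in robots_by_site.values())
    return 100.0 * blocked / len(robots_by_site)


# Invented sample data, not real measurements.
SAMPLE = {
    "news.example": "User-agent: GPTBot\nDisallow: /\n",
    "shop.example": (
        "User-agent: GPTBot\nDisallow: /\n\n"
        "User-agent: Google-Extended\nDisallow: /\n"
    ),
    "blog.example": "User-agent: *\nAllow: /\n",
    "wiki.example": "",  # an empty robots.txt blocks nobody
}

if __name__ == "__main__":
    for agent in ("GPTBot", "Google-Extended"):
        print(f"{agent}: {block_rate(SAMPLE, agent):.1f}% of sample sites block it")
```

A real survey would also have to fetch each site's robots.txt and decide how to count partial blocks, missing files and fetch errors, all of which this sketch leaves out.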

Only 10.3% of top websites block Google-Extended, while 32.7% block OpenAI's GPTBot.

Google's strength in the search engine game might have something to do with that gap. More sites would want to be included in any AI-based search results that Google compiles in the future.

It will be interesting to see where things go from here, since far fewer top sites are blocking Google than are blocking its largest competitor. Website owners know that AI-based search results might be coming soon, and if they are excluded from those results because they block Google-Extended, they might not get the same amount of traffic that they used to. Google is already testing a generative AI search engine, and it might release it sooner rather than later.
