OpenAI Sheds Light On Its New Web Crawler And How Users Can Greatly Benefit

OpenAI has recently published details about its new web crawler and how it works.

The parent firm of ChatGPT says the crawler is called GPTBot and is designed to crawl the world wide web and gather data for its leading AI offerings such as ChatGPT. That data, in turn, can help those products generate AI-based replies to the queries and prompts that users send in.

The crawler identifies itself with the user agent token GPTBot, and a site’s robots.txt file can be used to block GPTBot from accessing the whole website or only some parts of it. To keep GPTBot out entirely, add a GPTBot entry to the site’s robots.txt that disallows everything:
User-agent: GPTBot
Disallow: /

But if you would like to allow GPTBot to access only some parts of the website, you can add a GPTBot entry to robots.txt with the directories to allow and disallow, such as this:
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/

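
Before publishing rules like these, it can help to check how a parser reads them. The following is a minimal sketch using Python's standard `urllib.robotparser` module. The helper name `gptbot_can_fetch` and the `example.com` URLs are made up for illustration, and Python's parser is only an approximation of how GPTBot itself interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The rules from the article, kept on consecutive lines because Python's
# parser treats a blank line as the end of a user-agent group.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""


def gptbot_can_fetch(robots_txt: str, url: str, agent: str = "GPTBot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper for illustration; it evaluates the rules locally
    and never contacts the website.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain, not a site from the article.
    for path in ("/directory-1/page.html", "/directory-2/page.html"):
        url = "https://example.com" + path
        print(path, gptbot_can_fetch(ROBOTS_TXT, url))
```

The same helper can be pointed at a blanket `Disallow: /` entry to confirm that GPTBot is blocked while crawlers with other user agent tokens are unaffected.
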
For more details, you can refer to the documentation OpenAI provides on GPTBot. Meanwhile, ChatGPT’s parent firm has also published the IP ranges that GPTBot uses. The list currently contains only a single range, but it is expected to grow over time.
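
A published IP range lets a site operator check whether a request claiming to be GPTBot really comes from an address OpenAI lists. Here is a rough sketch with Python's standard `ipaddress` module. The range below is a reserved documentation block used as a stand-in, not OpenAI's actual range, and the function name `is_in_published_ranges` is hypothetical.

```python
import ipaddress
from typing import Iterable

# Placeholder only: 192.0.2.0/24 is reserved for documentation examples.
# Replace it with the range(s) listed in OpenAI's documentation.
ASSUMED_GPTBOT_RANGES = ["192.0.2.0/24"]


def is_in_published_ranges(ip: str, ranges: Iterable[str]) -> bool:
    """Return True if `ip` falls inside any of the CIDR `ranges`."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in ranges)


if __name__ == "__main__":
    for candidate in ("192.0.2.10", "198.51.100.7"):
        print(candidate, is_in_published_ranges(candidate, ASSUMED_GPTBOT_RANGES))
```

Because the list is expected to grow, the ranges are passed in as a parameter rather than hard-coded into the check.
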

Why is this news important? The answer is simple. Disallowing the web crawler from reaching your website means withholding permission for it to make use of your site’s data, much like blocking any other web crawler from doing the same.

