Recently, Trilegangers, a Ukrainian website specializing in 3D human models, was hit by a flood of automated traffic that brought its servers down. The site provides a large catalog of 3D model data for artists and game developers, but it ran into trouble because of aggressive crawling by OpenAI's web crawler, GPTBot.

According to Trilegangers staff, the website's terms of use clearly prohibit unauthorized crawling and use of its content, but the robots.txt file had not been configured to block bot access, which ultimately led to the server overload. Server logs show that OpenAI's GPTBot issued tens of thousands of requests from over 600 different IP addresses, rendering the site inoperable in a manner resembling a distributed denial-of-service (DDoS) attack.


OpenAI states in its bot documentation that a website that does not want GPTBot to crawl its content must configure its robots.txt file accordingly. Trilegangers, however, was unaware of this, which led to the current predicament. Although honoring robots.txt is not a legal requirement, if a website has explicitly prohibited unauthorized use in its terms, GPTBot's crawling may still run afoul of those terms and applicable rules.
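For reference, OpenAI's documentation says GPTBot identifies itself with the user-agent token GPTBot, so a site that wants to opt out can add a rule like the following to its robots.txt. This is a minimal sketch; which paths to disallow (here, everything) depends on the site's own policy:

```
# Block OpenAI's GPTBot from crawling any path on this site
User-agent: GPTBot
Disallow: /

# Rules for other crawlers are unaffected and stay as the site already defines them
```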

Additionally, because the site runs on Amazon AWS, the surge in bandwidth and traffic consumption translated directly into higher hosting costs, putting extra financial pressure on Trilegangers. In response, the team has set up a correct robots.txt file and is blocking access from various bots, including GPTBot, through Cloudflare (a sketch of this kind of rule follows below). These measures are expected to relieve the server load and keep the website running normally.
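The article does not describe the exact Cloudflare configuration Trilegangers used, but a common approach is a custom WAF rule that matches known crawler user agents and blocks them at the edge. A hypothetical rule expression might look like the following; the field name and operators follow Cloudflare's Rules language, while the specific list of bot tokens is an assumption, and the comment lines are for readability only (the expression field itself takes just the boolean expression):

```
# Hypothetical Cloudflare custom rule (action: Block)
# Matches requests whose User-Agent contains any of the listed AI crawler tokens
(http.user_agent contains "GPTBot") or
(http.user_agent contains "CCBot") or
(http.user_agent contains "Bytespider")
```

Blocking at the edge this way stops the requests before they reach the origin server, which also avoids the extra AWS bandwidth charges that robots.txt alone cannot prevent if a crawler ignores it.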

This incident has drawn renewed attention to the behavior of web crawlers, especially as AI technology continues to expand. How to balance AI applications with copyright protection has become a question worth serious reflection.