Anthropic's ClaudeBot web crawler repeatedly hit the iFixit website within a 24-hour period, apparently violating the site's terms of use.
iFixit CEO Kyle Wiens stated that this not only uses the company's content without authorization but also ties up its server and development resources. In response, iFixit added a crawl-delay extension to its robots.txt file to throttle crawler access.
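Crawl-delay is a non-standard robots.txt extension: some crawlers honor it as a minimum number of seconds between requests, while others (including Google's) ignore it. The article does not quote the exact rules iFixit published, but a hypothetical entry targeting Anthropic's crawler might look like:

```
# Hypothetical robots.txt entry - directive names are real,
# but the specific values are illustrative assumptions
User-agent: ClaudeBot
Crawl-delay: 10
```

Here `10` would ask ClaudeBot to wait at least ten seconds between requests; since the directive is not part of the formal Robots Exclusion Protocol, compliance is entirely voluntary on the crawler's side.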
Beyond iFixit, Eric Holscher, co-founder of Read the Docs, and Matt Barrie, CEO of Freelancer.com, also reported that their websites were being hammered by Anthropic's crawler.
Several months ago, Reddit posts reported a sharp increase in Anthropic's web scraping activity, and in April of this year an outage of the Linux Mint web forum was likewise attributed to ClaudeBot's scraping.
Many AI companies, including OpenAI, say they respect robots.txt files that deny their crawlers access, but robots.txt does not give website owners fine-grained control over which content may be scraped and for what purpose. Another AI company, Perplexity, has been found to ignore robots.txt exclusion rules entirely.
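The flexibility problem is structural: robots.txt rules are limited to per-user-agent path patterns, so a site can block a named AI crawler outright but cannot express a condition like "crawl for search indexing, but not for model training." A hypothetical file illustrating what is and is not expressible (the bot names are real, the paths are illustrative):

```
# Block Anthropic's crawler from the whole site
User-agent: ClaudeBot
Disallow: /

# Block OpenAI's training crawler as well
User-agent: GPTBot
Disallow: /

# All other crawlers: everything except a private area
# Note: there is no way to say WHY access is granted or denied
User-agent: *
Disallow: /private/
```

Because the protocol only matches user-agent strings against path prefixes, any crawler that simply ignores the file, or identifies itself differently, bypasses these rules entirely.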
Despite these limitations, robots.txt remains one of the few options many companies have to keep their data from being used as AI training material, and Reddit has recently taken action against web crawlers as well.