Reddit is taking action to prevent AI companies from scraping its content, or at least to make them pay for it.
Earlier this week, Reddit announced that it is updating its implementation of the Robots Exclusion Protocol, better known as its robots.txt file. This seemingly mundane edit is part of a larger negotiation, and at times battle, between content owners and the AI companies eager to use their content to train language models.
"Robots.txt" is a file used by websites to communicate to third parties how they can be crawled. A classic example is websites allowing Google to crawl them for inclusion in search results.
That arrangement is a fair trade: Google gets content to index and sends traffic back. In the context of AI training, the exchange of value is far less clear-cut. When your website's business model depends on attracting clicks and eyeballs, letting AI companies extract your content without sending any traffic in return (and, in some cases, directly plagiarizing your work) is not appealing.
Therefore, by updating its robots.txt file and continuing to rate-limit and block unknown bots and crawlers, Reddit appears to be working to prevent companies like Perplexity AI from continuing practices that have drawn criticism.
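It is worth noting that robots.txt is advisory rather than enforceable: each crawler decides whether to honor it, which is why Reddit pairs the file with server-side rate limiting and blocking. As a minimal sketch, a well-behaved crawler could check its permissions with Python's standard library before fetching anything; the user agents and URL below are illustrative, not a statement of Reddit's actual policy:

```python
# Minimal sketch of a compliant crawler's robots.txt check.
# The user agents and URL here are illustrative examples.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://www.reddit.com/robots.txt")
rp.read()  # download and parse the live robots.txt

for agent in ("Googlebot", "SomeUnknownAIBot"):
    allowed = rp.can_fetch(agent, "https://www.reddit.com/r/programming/")
    print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```

A crawler that skips this check faces only Reddit's server-side defenses, which is exactly the gap the rate limits and blocks are meant to close.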
Key Points:
- 📢 Reddit is taking action to prevent AI companies from scraping its content, or at least to make them pay for it.
- 🤖 Robots.txt is a file used by websites to communicate to third parties how they can be crawled. A classic example is websites allowing Google to crawl them for inclusion in search results.
- 💻 Reddit is changing its robots.txt file and continuing to rate-limit and block unknown bots and crawlers to prevent companies like Perplexity AI from continuing practices that have drawn criticism.