Reddit recently announced plans to strengthen its data protection measures, a move aimed squarely at AI companies and other automated scrapers. It underscores the growing tension between social media platforms and the AI industry.
Reddit intends to update its Robots Exclusion Protocol file (robots.txt) to block unauthorized automated scraping of its platform. A company spokesperson emphasized that the update is not aimed at any specific company but is intended to "protect Reddit while maintaining the openness of the internet." Reddit stated that the changes will not affect "honest actors" such as the Internet Archive and researchers.
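Reddit has not published the contents of the updated file, but a restrictive robots.txt might look like the following sketch. The user-agent name here is a hypothetical placeholder, not an actual crawler, and these rules are illustrative rather than Reddit's real configuration:

```
# Illustrative robots.txt -- not Reddit's actual file.
# Block a hypothetical AI crawler from the entire site:
User-agent: ExampleAIBot
Disallow: /

# Deny everything to all other crawlers by default:
User-agent: *
Disallow: /
```

Under the Robots Exclusion Protocol, a crawler follows the most specific matching `User-agent` group, and `Disallow: /` excludes the entire site. Compliance is voluntary, however, which is precisely the limitation Reddit is responding to.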
Image source note: the image was generated by AI and provided by the image licensing service Midjourney.
This action appears to be a response to recent reports that AI companies such as Perplexity have been bypassing websites' robots.txt directives. Perplexity's CEO controversially described the protocol as "not a legal framework" in an interview with Fast Company, sparking debate over AI companies' data acquisition practices.
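For contrast, a well-behaved crawler consults robots.txt before fetching anything. This is a minimal sketch using Python's standard `urllib.robotparser` module, with a made-up user agent and rules (nothing here reflects Reddit's actual configuration):

```python
from urllib import robotparser

# Hypothetical rules: one named bot is banned from the whole site.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A compliant crawler checks permission before every fetch.
print(rp.can_fetch("ExampleAIBot", "https://example.com/r/test"))   # False
print(rp.can_fetch("SomeOtherBot", "https://example.com/r/test"))   # True
```

The check is purely advisory: nothing in HTTP enforces it, which is why a crawler that simply skips the `can_fetch` call can ignore robots.txt entirely, as Perplexity is reported to have done.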
Reddit's stance is clear: any company that uses automated agents to access its platform must comply with its terms and policies and communicate with Reddit. This may hint at Reddit's desire to establish licensing agreements with AI companies similar to those it already has with Google and OpenAI.
This is not the first time Reddit has taken a tough stance on data access issues. Last year, the company began charging AI companies for API usage and entered into licensing agreements with some AI companies, allowing them to use Reddit's data for model training. These agreements have become a significant source of revenue for Reddit.
Reddit's move reflects the balance social media platforms must strike between protecting user-generated content and seeking new revenue models. With the rapid development of AI technology, similar data access disputes may emerge on other platforms, sparking broader discussions about data ownership, usage, and value distribution.