Recently, during an interview, Steve Huffman, CEO of Reddit, revealed that the company is seeking data usage agreements with major tech firms, requiring those who wish to continue scraping Reddit data to pay for it. This move stems from agreements already in place with Google and OpenAI, with Huffman hoping other companies will follow suit.
Huffman specifically called out Microsoft, Anthropic, and Perplexity for refusing to negotiate over data usage, stating that "blocking these companies is quite troublesome." He noted that without relevant agreements, Reddit has no control or insight into how its data is used, forcing the company to block those unwilling to accept the terms.
In response to this situation, Reddit has tightened restrictions on web crawlers in recent months. Early July saw the company update its robots.txt file to block crawlers from accessing data without a signed agreement. Subsequently, users noticed that Reddit content only appeared in Google search results with an agreement, while it vanished from other search engines like Bing.
Huffman criticized Microsoft for using Reddit data to train AI without authorization and selling the content through the Bing API to other search engines. He cited the Microsoft AI CEO's statement that public data on the internet is "freeware." Huffman believes this viewpoint reflects the stance of some tech companies towards internet content.
Regarding the disappearance of Reddit content from Bing, Microsoft Search Director Jordi Ribas explained that this was due to Reddit blocking Bing from crawling its site. A Microsoft spokesperson emphasized that the company respects the directives of website providers regarding content usage.
Huffman pointed out that the value exchange model of traditional search engines has evolved. With the convergence of search, summarization, and AI training, the simple model of exchanging content for traffic has become more complex. He stated that Reddit, along with traditional media publishers, is exploring a paid model for providing information to generative AI.
In response, Anthropic stated that it has blacklisted Reddit for crawlers, respecting its robots.txt settings. Microsoft declined to comment on the matter, and Perplexity did not respond to requests for comment.
This controversy highlights the complexities of content value and usage rights in the digital age, also suggesting potential new collaboration models between tech companies and content providers.