According to a report by Wired magazine, Amazon Web Services (AWS) is investigating AI search startup Perplexity AI for allegedly violating AWS service terms by scraping content from websites that attempted to block such actions.
Perplexity AI, backed by Jeff Bezos's family fund and Nvidia, was recently valued at $3 billion. Wired found that the company appears to rely on scraping content from websites that forbid such access through the Robots Exclusion Protocol, a web standard that site owners use (via a robots.txt file) to indicate which pages automated bots and crawlers should not access. Although the protocol is not legally binding, most companies have traditionally adhered to it.
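To make the mechanism concrete, here is a minimal sketch of how a compliant crawler might consult a site's robots.txt rules before fetching a page, using Python's standard `urllib.robotparser` module. The robots.txt content, the `example.com` URLs, the `ExampleBot` user agent, and the helper names `build_parser` and `is_allowed` are all hypothetical and chosen for illustration; nothing here is taken from any publisher's actual file or from Perplexity's systems.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# The bot named in the first rule group is barred from the whole site.
print(is_allowed(parser, "PerplexityBot", "https://example.com/news/story.html"))

# Any other bot falls under the wildcard group and is only barred from /private/.
print(is_allowed(parser, "ExampleBot", "https://example.com/news/story.html"))
print(is_allowed(parser, "ExampleBot", "https://example.com/private/draft.html"))
```

The check is purely advisory: nothing in the protocol stops a crawler from skipping it, which is why compliance depends on the crawler operator's good faith.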
An AWS spokesperson stated that the company's service terms prohibit customers from using its services for any illegal activities, and customers are responsible for complying with the terms and all applicable laws. AWS customers must adhere to the robots.txt standard when crawling websites.
Wired's investigation found that Perplexity had access to a server operating from undisclosed IP addresses, which had visited properties owned by Condé Nast hundreds of times in the past three months, apparently to scrape content that was off limits to crawlers. Spokespeople for The Guardian, Forbes, and The New York Times reported detecting similar activity.
Perplexity CEO Aravind Srinivas said that the scraping Wired identified was carried out by a third-party company that provides web crawling and indexing services, but he declined to name the company. Perplexity spokesperson Sara Platnick said that the company has responded to Amazon's inquiries and that its PerplexityBot respects robots.txt, except when a user enters a specific URL, in which case the bot ignores the protocol.
Jason Kint, CEO of the digital content industry trade association Digital Content Next, believes that if the allegations against Perplexity are true, the company has violated several principles aimed at preventing potential copyright infringement. He emphasized that AI companies should not, by default, obtain and use publishers' content without permission.
The incident has drawn widespread attention to how AI companies obtain their data. The industry now awaits the results of AWS's investigation and any further action that may be taken against Perplexity.