Social media platform Reddit sued the artificial intelligence company Perplexity AI and three other entities on Wednesday, alleging their involvement in an “industrial-scale, illegal” economy to “scrape” the comments of tens of millions of Reddit users for commercial gain.
Reddit’s lawsuit in a New York federal court takes aim at San Francisco-based Perplexity, maker of an AI chatbot and “answer engine” that competes with Google, ChatGPT, and others in online search.
Also named in the lawsuit are Lithuanian data-scraping company Oxylabs UAB, a web domain called AWMProxy that Reddit describes as a “former Russian botnet,” and Texas-based startup SerpApi, which lists Perplexity as a customer on its website.
It’s the second such lawsuit from Reddit since it sued another major AI company, Anthropic, in June.
But the lawsuit filed Wednesday is different in the way that it confronts not just an AI company but the lesser-known services the AI industry relies on to acquire online writings needed to train AI chatbots.
“Scrapers bypass technological protections to steal data, then sell it to clients hungry for training material. Reddit is a prime target because it’s one of the largest and most dynamic collections of human conversation ever created,” said Ben Lee, Reddit’s chief legal officer, in a statement Wednesday.
Perplexity said it has not yet received the lawsuit but “will always fight vigorously for users’ rights to freely and fairly access public knowledge. Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest.”
SerpApi’s customer success director, Ryan Schafer, said in an email: “We strongly disagree with Reddit’s allegations and intend to vigorously defend ourselves in court.”
Oxylabs didn’t immediately respond to a request for comment Wednesday. AWMProxy couldn’t immediately be reached for comment.
Reddit compares the companies it’s suing to “would-be bank robbers” who can’t get into the bank vault, so they break into the armored truck instead. The lawsuit alleges they’re evading Reddit’s own anti-scraping measures while also “circumventing Google’s controls and scraping Reddit content directly from Google’s search engine results.”
Lee said that because they’re unable to scrape Reddit directly, “they mask their identities, hide their locations, and disguise their web scrapers to steal Reddit content from Google Search. Perplexity is a willing customer of at least one of these scrapers, choosing to buy stolen data rather than enter into a lawful agreement with Reddit itself.”
Reddit made a similar argument in its lawsuit against Anthropic, alleging that the company ignored Reddit’s requests to stop using its content. That case was originally filed in California Superior Court but was later moved to federal court and has a hearing scheduled for January.
Along with digitized books and news articles, websites such as Wikipedia and Reddit are deep troves of written materials that can help teach an AI assistant the patterns of human language.
Reddit has previously entered licensing agreements with Google, OpenAI, and other companies that are paying to be able to train their AI systems on the public commentary of Reddit’s more than 100 million daily users.
The licensing deals helped the 20-year-old online platform raise money ahead of its Wall Street debut as a publicly traded company last year.
—Matt O’Brien, AP technology writer

