Cloudflare announced that it has delisted Perplexity’s crawler as a verified bot and is now actively blocking Perplexity and all of its stealth bots from crawling websites. Cloudflare acted in response to multiple user complaints against Perplexity related to violations of the robots.txt protocol, and a subsequent investigation revealed that Perplexity was using aggressive rogue bot tactics to force its crawlers onto websites.
Cloudflare Verified Bots Program
Cloudflare has a system called Verified Bots that whitelists bots, allowing them to crawl the websites that are protected by Cloudflare. Verified bots must conform to specific policies, such as obeying the robots.txt protocol, in order to maintain their privileged status within Cloudflare’s system.
Perplexity was found to be violating Cloudflare’s requirements that bots abide by the robots.txt protocol and refrain from using IP addresses that aren’t declared as belonging to the crawling service.
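As a rough illustration of what obeying robots.txt means in practice, the minimal sketch below uses Python's standard `urllib.robotparser` to check whether a given user agent may fetch a URL. The robots.txt contents, the `is_allowed` helper, and the example.com URL are hypothetical and chosen only for illustration. The check is voluntary: it only restrains a crawler that identifies itself honestly and chooses to run it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows Perplexity's declared crawlers
# site-wide while leaving the site open to every other user agent.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, a request identifying itself as `PerplexityBot` is refused, while a request presenting a generic browser user agent falls under the wildcard rule and is allowed.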
Cloudflare Accuses Perplexity Of Using Stealth Crawling
Cloudflare observed various activities indicative of highly aggressive crawling, with the intent of circumventing the robots.txt protocol.
Stealth Crawling Behavior: Rotating IP Addresses
According to Cloudflare, Perplexity circumvents blocks by using rotating IP addresses, changing ASNs, and impersonating browsers like Chrome.
Perplexity has a list of official IP addresses that crawl from a specific ASN (Autonomous System Number). These IP addresses help identify legitimate crawlers from Perplexity.
An ASN is part of the Internet networking system that provides a unique identifying number for a group of IP addresses. For example, users who access the Internet via an ISP do so with a specific IP address that belongs to an ASN assigned to that ISP.
When blocked, Perplexity tried to evade the restriction by switching to different IP addresses that aren’t listed as official Perplexity IPs, including entirely different ones that belonged to a different ASN.
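The idea of checking a request against a crawler's declared IP addresses can be sketched with Python's standard `ipaddress` module. The ranges below are placeholders taken from the reserved documentation blocks, not Perplexity's actual published ranges, and `is_declared_ip` is a hypothetical helper name.

```python
import ipaddress

# Placeholder ranges from the reserved documentation blocks (RFC 5737).
# These are NOT Perplexity's real published IP ranges.
DECLARED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_declared_ip(ip: str, ranges=DECLARED_RANGES) -> bool:
    """Return True if ip falls inside one of the declared crawler ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in ranges)
```

A request from an address outside every declared range, such as one served from a different ASN, fails this check even if it claims to come from the crawler.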
Stealth Crawling Behavior: Spoofed User Agent
The other sneaky behavior that Cloudflare identified was that Perplexity changed its user agent in order to circumvent attempts to block its crawler via robots.txt.
For example, Perplexity’s bots are identified with the following user agents:
- PerplexityBot
- Perplexity-User
Cloudflare observed that Perplexity responded to user agent blocks by using a different user agent that posed as a person browsing with Chrome 124 on a Mac system. That’s a practice called spoofing, where a rogue crawler identifies itself as a legitimate browser.
According to Cloudflare, Perplexity used the following stealth user agent:
“Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36”
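To see why spoofing defeats a filter that relies on the user agent alone, consider the minimal sketch below. It matches requests against the two declared Perplexity tokens named above; the `is_declared_crawler` helper is hypothetical and is not how Cloudflare detects bots.

```python
# Declared crawler tokens named in the article, lowercased for matching.
DECLARED_TOKENS = ("perplexitybot", "perplexity-user")

# The stealth user agent reported by Cloudflare.
SPOOFED_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Safari/537.36"
)


def is_declared_crawler(user_agent: str) -> bool:
    """Return True if the user agent string contains a declared crawler token."""
    ua = user_agent.lower()
    return any(token in ua for token in DECLARED_TOKENS)
```

The spoofed string contains neither token, so a filter of this kind treats it as an ordinary Chrome browser. This is why detection has to draw on other signals, such as the network origin of the request.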
Cloudflare Delists Perplexity
Cloudflare announced that Perplexity is delisted as a verified bot and that its stealth crawling will be blocked:
“The Internet as we have known it for the past three decades is rapidly changing, but one thing remains constant: it is built on trust. There are clear preferences that crawlers should be transparent, serve a clear purpose, perform a specific activity, and, most importantly, follow website directives and preferences. Based on Perplexity’s observed behavior, which is incompatible with those preferences, we have de-listed them as a verified bot and added heuristics to our managed rules that block this stealth crawling.”
Takeaways
- Violation Of Cloudflare’s Verified Bots Policy
Perplexity violated Cloudflare’s Verified Bots policy, which grants crawling access to trusted bots that follow commonsense rules like honoring the robots.txt protocol.
- Perplexity Used Stealth Crawling Tactics
Perplexity used undeclared IP addresses from different ASNs and spoofed user agents to crawl content after being blocked from accessing it.
- User Agent Spoofing
Perplexity disguised its bot as a human user by posing as Chrome on a Mac operating system in attempts to bypass filters that block known crawlers.
- Cloudflare’s Response
Cloudflare delisted Perplexity as a Verified Bot and implemented new blocking rules to prevent the stealth crawling.
- SEO Implications
Cloudflare users who want Perplexity to crawl their sites will need to check whether Cloudflare is blocking the Perplexity crawlers and, if so, enable crawling via their Cloudflare dashboard.
Cloudflare delisted Perplexity as a Verified Bot after finding that it repeatedly violated the Verified Bots policies by disobeying robots.txt. To evade detection, Perplexity also rotated IPs, changed ASNs, and spoofed its user agent to appear as a human browser. Cloudflare’s decision to block the bot is a strong response to aggressive bot behavior on the part of Perplexity.

