Cloudflare accuses AI startup Perplexity of evading anti-scraping measures

According to Cloudflare, Perplexity’s crawlers were observed accessing tens of thousands of domains, generating millions of requests daily, even when sites explicitly blocked AI bots

BestMediaInfo Bureau

New Delhi: Cloudflare has accused AI startup Perplexity of bypassing website anti-scraping protections.

Cloudflare alleged that Perplexity’s web crawlers are systematically ignoring "robots.txt" directives, a standard protocol websites use to tell automated crawlers which pages they may access, and employing sophisticated tactics to scrape content without permission.
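
For readers unfamiliar with the protocol, the sketch below shows how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard urllib.robotparser module. The crawler name "ExampleAIBot", the rules, and the URL are hypothetical and do not come from Cloudflare's report. The second check illustrates why the protocol depends on honesty: the rules are matched against whatever name the crawler declares.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one AI crawler by name, allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical browser-style user-agent string, for illustration only.
BROWSER_UA = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Chrome/124.0.0.0"

print(is_allowed("ExampleAIBot", "https://example.com/article"))
print(is_allowed(BROWSER_UA, "https://example.com/article"))
```

The named crawler is refused while the browser-style string is allowed, because robots.txt is advisory: it is enforced only if the crawler identifies itself truthfully and chooses to comply.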

The company claims Perplexity disguised its crawlers by switching to user-agent strings that mimic legitimate browsers, such as Google Chrome on macOS, and by rotating IP addresses and Autonomous System Numbers (ASNs) to evade detection. “This activity was observed across a massive scale,” Cloudflare stated, noting that it confirmed the behavior through controlled testing after receiving complaints from customers whose websites were scraped despite restrictions.
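
One simple check a site operator might run is to compare the name a crawler declares against the IP ranges its operator publishes. The sketch below is not Cloudflare's method, which the company says relies on machine learning and network signals. The crawler name and address ranges are hypothetical, drawn from IP blocks reserved for documentation.

```python
import ipaddress

# Hypothetical published ranges for a hypothetical crawler
# (192.0.2.0/24 is reserved for documentation).
PUBLISHED_RANGES = {
    "ExampleAIBot": [ipaddress.ip_network("192.0.2.0/24")],
}


def classify_request(user_agent: str, remote_ip: str) -> str:
    """Classify a request as 'verified', 'spoofed', or 'undeclared'."""
    ip = ipaddress.ip_address(remote_ip)
    for bot, networks in PUBLISHED_RANGES.items():
        if bot.lower() in user_agent.lower():
            # The request claims to be this bot: check where it came from.
            if any(ip in net for net in networks):
                return "verified"
            return "spoofed"
    # No known bot name in the user-agent string.
    return "undeclared"


print(classify_request("ExampleAIBot/1.0", "192.0.2.10"))
print(classify_request("ExampleAIBot/1.0", "198.51.100.7"))
print(classify_request("Mozilla/5.0 (Macintosh) Chrome/124.0", "198.51.100.7"))
```

A crawler posing as an ordinary browser from an unlisted address falls into the "undeclared" bucket, which is why a check like this cannot by itself catch the kind of disguised traffic Cloudflare describes.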

Perplexity, an AI-powered search engine backed by high-profile investors like Jeff Bezos and Nvidia, has denied the allegations. Spokesperson Jesse Dwyer dismissed Cloudflare’s report as a “sales pitch” and claimed the crawler identified in the report “isn’t even ours.” 

In a follow-up statement to a news publication, Dwyer argued that screenshots provided by Cloudflare did not show actual content access. Cloudflare, however, doubled down, asserting that its machine learning and network signal analysis definitively linked the activity to Perplexity.

In response, Cloudflare has removed Perplexity from its list of verified crawlers and introduced new tools to help website owners block unauthorized AI bots. The company also recently launched a “Pay Per Crawl” marketplace, which allows publishers to charge AI companies for data access, and a free tool to block bots used for AI training.

This is not Perplexity’s first brush with controversy. In 2023, the startup faced accusations from media outlets of plagiarising content and bypassing paywalls.
