
Perplexity Accused of Ignoring Website Blocks and Scraping Content via Stealth Crawlers


Cloudflare says Perplexity’s AI crawlers kept pulling data from websites even after those sites tried to block them. On Monday, the internet infrastructure company cut Perplexity from its verified bot program and set up new blocks, calling the AI firm’s scraping methods deceptive.

The issue started when Cloudflare customers noticed something odd. Despite using robots.txt files and firewall rules to stop Perplexity’s crawlers, their content was still being scraped. Cloudflare engineers ran tests and confirmed it: Perplexity wasn’t backing off.

How Cloudflare Tested the Claims

To see how far Perplexity would go, Cloudflare bought new domains and gave them strict robots.txt files that banned all automated access. Then they asked Perplexity’s AI questions about those domains. Sure enough, Perplexity still spat back detailed answers about the blocked content.
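For readers unfamiliar with robots.txt, here is a minimal sketch of what a "ban everything" file looks like and how a well-behaved crawler would check it before fetching a page. It uses Python's standard urllib.robotparser; the file contents, the example.com URL, and the user-agent strings are illustrative assumptions, not the ones Cloudflare used in its test.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical strict robots.txt: every crawler is barred from every path.
STRICT_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative user agents: one declared bot name, one browser-like string.
    for agent in ("PerplexityBot", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"):
        print(agent, "->", is_allowed(STRICT_ROBOTS_TXT, agent, "https://example.com/page"))
```

The check is voluntary. robots.txt only states the site owner's wishes, so a crawler that never runs a check like this, or ignores the answer, is not technically stopped by the file.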

But here’s where it gets weirder. When blocked, Perplexity didn’t just give up. According to Cloudflare, the company switched to using a hidden crawler disguised as Google Chrome on macOS. This crawler cycled through unlisted IP addresses and even different network providers to dodge detection.

The Scale of the Problem

Cloudflare estimates Perplexity’s official crawlers make 20-25 million requests per day. The stealthy ones? Another 3-6 million. That’s tens of thousands of domains hit daily, all while trying to evade blocks.
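To put those two ranges side by side, the rough sketch below works out what share of Perplexity's total daily requests the stealth traffic would represent. It uses only the figures quoted above; the function and variable names are made up for illustration, and the result is back-of-the-envelope arithmetic, not a number Cloudflare published.

```python
# Daily request ranges quoted in the article (requests per day).
OFFICIAL_RANGE = (20e6, 25e6)
STEALTH_RANGE = (3e6, 6e6)


def stealth_share(official: float, stealth: float) -> float:
    """Fraction of all daily requests that come from the stealth crawlers."""
    return stealth / (official + stealth)


# Lowest share: most official traffic paired with least stealth traffic.
low = stealth_share(OFFICIAL_RANGE[1], STEALTH_RANGE[0])
# Highest share: least official traffic paired with most stealth traffic.
high = stealth_share(OFFICIAL_RANGE[0], STEALTH_RANGE[1])

print(f"Stealth share of daily requests: {low:.1%} to {high:.1%}")
```

On those figures, somewhere between roughly a tenth and a quarter of the requests would be coming from undeclared crawlers.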

Perplexity hasn’t said much. A spokesperson brushed off Cloudflare’s claims as a “sales pitch” to TechCrunch. But Cloudflare’s CEO, Matthew Prince, isn’t holding back. He’s been critical of AI companies vacuuming up web content without giving much back.

The numbers are rough. Google sends one visitor for every 18 pages it crawls. OpenAI? Now at 1,500-to-1. Anthropic’s ratio is even worse: 60,000-to-1. Prince calls it unsustainable.
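One way to make those ratios concrete is to flip them around and ask how many visitors a site could expect per million pages crawled. The sketch below does that using only the three ratios quoted above; the names are illustrative and the output is simple division, not additional reporting.

```python
# Pages crawled per visitor referred, as quoted in the article.
CRAWL_TO_REFERRAL = {
    "Google": 18,
    "OpenAI": 1_500,
    "Anthropic": 60_000,
}


def referrals_per_million(pages_per_visitor: int) -> float:
    """Visitors sent back for every one million pages crawled."""
    return 1_000_000 / pages_per_visitor


for company, ratio in CRAWL_TO_REFERRAL.items():
    print(f"{company}: about {referrals_per_million(ratio):,.0f} visitors per million pages crawled")
```

By that measure, a million crawled pages brings back tens of thousands of visitors from Google, a few hundred from OpenAI, and fewer than twenty from Anthropic.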

What’s Next

Cloudflare’s response has two parts: short-term fixes and long-term plans. They’ve already updated their systems to catch Perplexity’s sneaky crawlers. Free users get the protection too.

Longer term, they’re working on an “AI Labyrinth” to waste non-compliant bots’ time with fake content. There’s also talk of a “pay-per-crawl” system where publishers could charge AI companies for access.

For now, Cloudflare’s new default is blocking AI crawlers on all new domains. Over a million sites have already opted in, including big names like The Atlantic, Reddit, and Universal Music Group.

The message is clear: if crawlers want respect, they’ll need to play by the rules. And right now, Cloudflare says Perplexity isn’t.
