As I pointed out a few weeks ago: the web, as it’s designed today, is not ready for “agents” of any kind—whether AI-driven or plain old automation scripts. Why? Because there’s no agreed-upon way for machines to interact with websites on behalf of a user.

Case in point: Paywalls.

Publishers are getting more creative in protecting their content from scraping, and rightly so: no one wants their work stolen by AI companies or repackaged by Google. But at the same time, they want to provide a good user experience for those who pay.

Some of the quirkiest hacks I’ve seen:

🤯 DerStandard.at: Uses JavaScript to render content only in full browsers; the content isn’t in the raw HTML.

🤯 NYTimes.com: Possibly the strictest—blocks most scrapers and often even browsers unless you solve a CAPTCHA.

🤯 TheGuardian.com: Easy to scrape the article, but not the headline image—image URLs require hash-based authorization.

poketto.me is built on the idea that you, the user, should have the choice of when, where, and in which format you consume web content. So how am I handling the scraping situation?

➡️Currently: I’m only scraping what’s “easily” accessible. For most websites, that’s “good enough.” Sometimes it even lets you peek behind a poorly implemented paywall.
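To make “easily accessible” concrete: a plain HTTP fetch plus some lightweight HTML parsing already covers most server-rendered sites. Here’s a minimal sketch using only Python’s standard library—the class and its paragraph heuristic are my own illustration, not poketto.me’s actual code:

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text of <p> elements -- a crude stand-in for article extraction."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.paragraphs[-1] += data


def extract_article(raw_html: str) -> str:
    """Join the visible paragraph text of a server-rendered page."""
    parser = ParagraphExtractor()
    parser.feed(raw_html)
    return "\n\n".join(p.strip() for p in parser.paragraphs if p.strip())


print(extract_article("<html><body><p>Hello</p><p>World</p></body></html>"))
# -> Hello
#
#    World
```

This works fine on server-rendered pages—and returns almost nothing on JS-only sites like DerStandard.at, which is exactly the gap the next step addresses.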

➡️Possibly next: A Selenium-based solution for complex cases. Simulating a “real” browser would give poketto.me access in trickier cases, too—like DerStandard.at.
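Such a setup could use a simple fallback rule: try the cheap raw-HTML fetch first, and only spin up a real browser when the page is clearly rendered client-side. A sketch—the heuristic, threshold, and function names are my own assumptions, not a finished implementation:

```python
import re


def needs_browser_rendering(raw_html: str, min_text_chars: int = 200) -> bool:
    """Heuristic: if the raw HTML contains almost no visible text,
    the content is probably rendered client-side (as on DerStandard.at)."""
    # Strip scripts, styles, and tags, then measure what text remains.
    text = re.sub(r"(?s)<(script|style)[^>]*>.*?</\1>", " ", raw_html)
    text = re.sub(r"<[^>]+>", " ", text)
    return len(" ".join(text.split())) < min_text_chars


def fetch_rendered(url: str) -> str:
    """Fall back to a real (headless) browser for the hard cases."""
    from selenium import webdriver  # deferred import: only needed here

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source  # the HTML *after* JavaScript has run
    finally:
        driver.quit()
```

The point of the split is cost: a headless browser is orders of magnitude heavier than a plain HTTP request, so you only pay for it when the cheap path demonstrably fails.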

➡️Maybe later: Enhancing the Chrome extension to fetch content from your local browser session (e.g., when you’re already logged into NYT).

But the real fix? A protocol along these lines:

➡️ A paying user has a subscription
➡️ They grant poketto.me access via credentials
➡️ poketto.me fetches content on their behalf, with proper authentication
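No such protocol exists yet, but the flow above could look roughly like OAuth’s delegation model: instead of handing over raw credentials, the publisher issues a scoped, signed grant that the agent presents with each fetch. A toy sketch with an HMAC-signed token—every name and the token format here are invented for illustration:

```python
import base64
import hashlib
import hmac
import json

PUBLISHER_SECRET = b"publisher-signing-key"  # held by the publisher, never by the agent


def issue_grant(subscriber_id: str, agent: str, scope: str) -> str:
    """Publisher mints a token: this agent may fetch <scope> for this subscriber."""
    payload = json.dumps({"sub": subscriber_id, "agent": agent, "scope": scope}).encode()
    sig = hmac.new(PUBLISHER_SECRET, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode() + "." + sig


def verify_grant(token: str):
    """Publisher's content server checks the signature before serving the article."""
    encoded, sig = token.rsplit(".", 1)
    payload = base64.urlsafe_b64decode(encoded)
    expected = hmac.new(PUBLISHER_SECRET, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # forged or tampered token
    return json.loads(payload)


# The agent would present the grant with each request, e.g. as an Authorization header.
grant = issue_grant("reader-123", "poketto.me", "articles:read")
print(verify_grant(grant)["agent"])  # -> poketto.me
```

The design point: the subscriber never shares their password with the agent, the publisher can revoke or scope each grant independently, and every fetch is attributable to a paying account.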

Thus, the user gets what they paid for (the content) and the choice of when and where to consume it. The publisher gets paid. And poketto.me facilitates the transaction. But that’s still a long way off. Cloudflare has recently introduced a “pay per crawl” option—let’s see if that develops in the right direction!