<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>AI on Build in Public</title><link>https://build.ralphmayr.com/tags/ai/</link><description>Recent content in AI on Build in Public</description><generator>Hugo</generator><language>en-us</language><copyright>©️ Ralph Mayr 2026</copyright><lastBuildDate>Fri, 26 Sep 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://build.ralphmayr.com/tags/ai/index.xml" rel="self" type="application/rss+xml"/><item><title>The memory consumption patterns of LangChain are… disturbing</title><link>https://build.ralphmayr.com/posts/88-the-memory-consumption-patterns-of-langchain-are-disturbing/</link><pubDate>Fri, 26 Sep 2025 00:00:00 +0000</pubDate><guid>https://build.ralphmayr.com/posts/88-the-memory-consumption-patterns-of-langchain-are-disturbing/</guid><description>&lt;p&gt;As I said in
&lt;a href="../21-no-you-dont-have-to-learn-langchain/"&gt;No, you don&amp;rsquo;t have to learn LangChain&lt;/a&gt;, we shouldn&amp;rsquo;t get distracted by the artificial complexity introduced by our frameworks. LangChain is mostly a wrapper around the REST APIs of various LLM providers. Useful? Yes&amp;mdash;switching between models becomes easy.&lt;/p&gt;
&lt;p&gt;But here&amp;rsquo;s a mystery I can&amp;rsquo;t explain.&lt;/p&gt;
&lt;p&gt;When I added Gemini as a fallback to DeepSeek (see yesterday&amp;rsquo;s post about DeepSeek refusing to touch Chinese politics), I thought it would be straightforward:&lt;/p&gt;</description></item><item><title>DeepSeek really won’t touch anything related to Chinese politics</title><link>https://build.ralphmayr.com/posts/87-deepseek-really-wont-touch-anything-related-to-chinese-politics/</link><pubDate>Thu, 25 Sep 2025 00:00:00 +0000</pubDate><guid>https://build.ralphmayr.com/posts/87-deepseek-really-wont-touch-anything-related-to-chinese-politics/</guid><description>&lt;p&gt;For most use cases in poketto.me, I&amp;rsquo;m pretty happy with #DeepSeek: it&amp;rsquo;s cheap, reliable, and the output quality matches any other LLM I&amp;rsquo;ve tried.&lt;/p&gt;
&lt;p&gt;But there&amp;rsquo;s one big caveat: anything related to Chinese politics can trigger an immediate refusal. Example: Right after the launch of poketto.me, a user tried saving an article about the September 3rd Beijing meeting between Xi Jinping, Vladimir Putin, Kim Jong Un, et al. (
&lt;a href="https://orf.at/stories/3404330/" target="_blank" rel="noopener noreferrer"&gt;https://orf.at/stories/3404330/&lt;/a&gt;)&lt;/p&gt;</description></item><item><title>The Gemini API for Video Understanding is surprisingly good</title><link>https://build.ralphmayr.com/posts/83-the-gemini-api-for-video-understanding-is-surprisingly-good/</link><pubDate>Sun, 21 Sep 2025 00:00:00 +0000</pubDate><guid>https://build.ralphmayr.com/posts/83-the-gemini-api-for-video-understanding-is-surprisingly-good/</guid><description>&lt;p&gt;As I mentioned in
&lt;a href="../82-geminis-url-context-feature-is-90-hype-10-value/"&gt;Gemini&amp;rsquo;s URL Context feature is 90% hype, 10% value&lt;/a&gt;, I was pretty disappointed with Gemini&amp;rsquo;s &amp;ldquo;URL Context&amp;rdquo; feature. But &amp;ldquo;Video Understanding&amp;rdquo;? That one actually works like a charm.&lt;/p&gt;
&lt;p&gt;How it works:&lt;/p&gt;
&lt;p&gt;👉 Provide a YouTube video link&lt;br&gt;
👉 Ask Gemini questions about the video&lt;br&gt;
👉 Get a structured response back&lt;/p&gt;
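&lt;p&gt;The three steps above boil down to a single request. A hedged sketch; the field names and model ID are assumptions based on the Gemini API docs, so verify them before use:&lt;/p&gt;

```python
# Hedged sketch: asking Gemini about a YouTube video. The field names and
# model ID are assumptions from the Gemini API docs; verify before use.

def build_video_request(video_url, question, model="gemini-2.5-flash"):
    """Assemble the arguments for a video-understanding call."""
    return {
        "model": model,
        "contents": [
            {"file_data": {"file_uri": video_url}},  # the YouTube link itself
            {"text": question},                      # what we want to know
        ],
    }

req = build_video_request(
    "https://www.youtube.com/watch?v=VIDEO_ID",  # placeholder URL
    "Describe this video as structured text.",
)
# With the google-genai SDK (assumed), this becomes roughly:
#   client.models.generate_content(**req)
```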
&lt;p&gt;For poketto.me, this unlocks a really neat feature: users can save any YouTube video in the app and either watch it later &lt;em&gt;or&lt;/em&gt; read a textual description of the video.&lt;/p&gt;</description></item><item><title>Gemini’s “URL Context” feature is 90% hype, 10% value</title><link>https://build.ralphmayr.com/posts/82-geminis-url-context-feature-is-90-hype-10-value/</link><pubDate>Sat, 20 Sep 2025 00:00:00 +0000</pubDate><guid>https://build.ralphmayr.com/posts/82-geminis-url-context-feature-is-90-hype-10-value/</guid><description>&lt;p&gt;I&amp;rsquo;ll admit&amp;mdash;I was pretty excited when Google announced that the Gemini API would support a new &amp;ldquo;URL Context&amp;rdquo; tool. The idea: you could &amp;ldquo;ask&amp;rdquo; Gemini about the content of a specific web page, with Google handling all the heavy lifting.&lt;/p&gt;
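&lt;p&gt;For reference, enabling the tool is a small addition to the request. A hedged sketch; the tool and field names are assumptions based on the Gemini API docs:&lt;/p&gt;

```python
# Hedged sketch of a "URL Context" request: the prompt simply contains the
# URL, and a tool entry tells Gemini it may fetch the page. The names are
# assumptions from the Gemini API docs; check the current reference.

def build_url_context_request(question, url, model="gemini-2.5-flash"):
    return {
        "model": model,
        "contents": f"{question} {url}",
        "config": {"tools": [{"url_context": {}}]},  # allow page fetching
    }

req = build_url_context_request(
    "List the ingredients in this recipe:",
    "https://example.com/recipe",  # placeholder URL
)
```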
&lt;p&gt;The
&lt;a href="https://www.youtube.com/watch?v=4-6WQl-Hls0" target="_blank" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; even shows a neat example: send Gemini two recipe URLs and prompt it to compare ingredients and cooking times. If it worked, this would&amp;rsquo;ve been a game-changer for poketto.me:&lt;/p&gt;</description></item><item><title>Multi-threaded TTS: A bad idea</title><link>https://build.ralphmayr.com/posts/57-multi-threaded-tts-a-bad-idea/</link><pubDate>Tue, 26 Aug 2025 00:00:00 +0000</pubDate><guid>https://build.ralphmayr.com/posts/57-multi-threaded-tts-a-bad-idea/</guid><description>&lt;p&gt;Running text-to-speech in the cloud is fun&amp;mdash;until it isn&amp;rsquo;t.&lt;/p&gt;
&lt;p&gt;Early on, I didn&amp;rsquo;t think much about thread safety. During my own testing, rarely did more than one TTS task run in parallel, so there were no big issues. But once more users adopted the feature, strange bugs popped up:&lt;/p&gt;
&lt;p&gt;Errors like &amp;ldquo;Assertion srcIndex &amp;lt; srcSelectDimSize failed&amp;rdquo; started showing up in the logs&amp;mdash;and worse, once triggered, the entire Cloud Run instance would become unusable until a redeploy.&lt;/p&gt;</description></item><item><title>AI is not a value proposition</title><link>https://build.ralphmayr.com/posts/54-ai-is-not-a-value-proposition/</link><pubDate>Sat, 23 Aug 2025 00:00:00 +0000</pubDate><guid>https://build.ralphmayr.com/posts/54-ai-is-not-a-value-proposition/</guid><description>&lt;p&gt;I didn&amp;rsquo;t coin this phrase (sadly), but it keeps proving itself true&amp;mdash;especially now that I&amp;rsquo;m working on GTM details for some of the more advanced features in poketto.me.&lt;/p&gt;
&lt;p&gt;Most users don&amp;rsquo;t care how your app works. They care what it does for them&amp;mdash;and whether that&amp;rsquo;s worth paying for.&lt;/p&gt;
&lt;p&gt;Since LLMs became easy to embed, companies started slapping &amp;ldquo;powered by AI&amp;rdquo; stickers on everything as if that alone justified a price tag. But unless the user clearly feels the value, it doesn&amp;rsquo;t matter what's under the hood. Case in point: Garmin&amp;rsquo;s hilariously underwhelming $7/month &amp;ldquo;AI subscription&amp;rdquo;. The so-called &amp;ldquo;insights&amp;rdquo; offered
&lt;a href="https://www.techradar.com/health-fitness/smartwatches/garmins-new-subscription-ai-feature-is-hilariously-bad-so-far" target="_blank" rel="noopener noreferrer"&gt;nothing users couldn&amp;rsquo;t deduce themselves&lt;/a&gt;&amp;mdash;or the app couldn&amp;rsquo;t have generated with much simpler logic.&lt;/p&gt;</description></item><item><title>Prompt engineering: A task best left to the machines</title><link>https://build.ralphmayr.com/posts/50-prompt-engineering-a-task-best-left-to-the-machines/</link><pubDate>Tue, 19 Aug 2025 00:00:00 +0000</pubDate><guid>https://build.ralphmayr.com/posts/50-prompt-engineering-a-task-best-left-to-the-machines/</guid><description>&lt;p&gt;Under the hood, poketto.me makes heavy use of LLMs. The podcast feature is a great example: Users can turn any web content into a podcast, but often that content isn&amp;rsquo;t well-suited for listening. LLMs are great at optimizing this&amp;mdash;simplifying complex sentences, turning headlines into enumerations, describing images verbally, etc.&lt;/p&gt;
&lt;p&gt;But the challenge: How do you craft a single, generic prompt that works across all types of content and runs unsupervised via the API?&lt;/p&gt;</description></item><item><title>LLM-Based Translations: The Good, the Bad, and the Ugly</title><link>https://build.ralphmayr.com/posts/35-llm-based-translations-the-good-the-bad-and-the-ugly/</link><pubDate>Mon, 04 Aug 2025 00:00:00 +0000</pubDate><guid>https://build.ralphmayr.com/posts/35-llm-based-translations-the-good-the-bad-and-the-ugly/</guid><description>&lt;p&gt;Automatic content translation has been a key feature of poketto.me from day one. Why? Because I believe there&amp;rsquo;s immense value in making content accessible to non-native speakers.&lt;/p&gt;
&lt;p&gt;Personally, I&amp;rsquo;m deeply interested in developments in countries like India, Pakistan, and China &amp;mdash; but the best publications from those regions often don&amp;rsquo;t publish in English. Being able to read and compare, in English, the &lt;em&gt;Dawn News&lt;/em&gt; (Pakistan) and &lt;em&gt;Hindustan Times&lt;/em&gt; (India) coverage of tensions between the two countries, for example, is fascinating.&lt;/p&gt;</description></item><item><title>No, AI will not take McKinsey or BCG out of business any time soon</title><link>https://build.ralphmayr.com/posts/31-no-ai-will-not-take-mckiney-or-bcg-out-of-business-any-day-soon/</link><pubDate>Thu, 31 Jul 2025 00:00:00 +0000</pubDate><guid>https://build.ralphmayr.com/posts/31-no-ai-will-not-take-mckiney-or-bcg-out-of-business-any-day-soon/</guid><description>&lt;p&gt;Despite what the &amp;ldquo;God of Prompt&amp;rdquo; (sic!) or any other self-proclaimed &amp;ldquo;AI expert&amp;rdquo; is trying to tell you, none of the current AI models will replace a multi-hundred-thousand-dollar product strategy project.&lt;/p&gt;
&lt;p&gt;First of all, the people making these claims are, most likely, just trying to sell you their overpriced list of &amp;ldquo;magic&amp;rdquo; prompts &amp;mdash; and hoping for endorsement from the big AI companies or a retweet from Elon Musk.&lt;/p&gt;
&lt;p&gt;But giving the AI tools the benefit of the doubt, I tried using Grok, ChatGPT, and Claude to iterate on a commercial strategy for poketto.me. The results were&amp;hellip; disappointing. Here are the main issues:&lt;/p&gt;</description></item><item><title>Good things come to those who wait ⏳</title><link>https://build.ralphmayr.com/posts/29-good-things-come-to-those-who-wait/</link><pubDate>Tue, 29 Jul 2025 00:00:00 +0000</pubDate><guid>https://build.ralphmayr.com/posts/29-good-things-come-to-those-who-wait/</guid><description>&lt;p&gt;Remember when I was complaining about
&lt;a href="../8-running-text-to-speech-in-the-cloud-is-harder-than-you-would-think-part-one/"&gt;how hard it is&lt;/a&gt; to run even basic ML workloads on GCP? Turns out, Google has listened 😊 (well, probably not to me personally, but in general).&lt;/p&gt;
&lt;p&gt;You can now request GPUs for Cloud Run instances in the UI as well as on the command line. That means all the hassle I went through deploying my text-to-speech service into a Docker environment running inside a preemptible VM with GPUs&amp;mdash;and then figuring out how to start, stop, and deploy the VM automatically&amp;mdash;was&amp;hellip; well, not exactly wasted, but at least: not necessary anymore.&lt;/p&gt;</description></item><item><title>For non-urgent LLM tasks, DeepSeek offers great value for money</title><link>https://build.ralphmayr.com/posts/26-for-non-urgent-llm-tasks-deepseek-has-offers-great-value-for-money/</link><pubDate>Sat, 26 Jul 2025 00:00:00 +0000</pubDate><guid>https://build.ralphmayr.com/posts/26-for-non-urgent-llm-tasks-deepseek-has-offers-great-value-for-money/</guid><description>&lt;p&gt;AI is not at the core of what poketto.me does, but it helps a lot: I&amp;rsquo;m using LLMs to translate saved content and to smooth out formatting issues (especially with PDF content). Any old LLM can do these things quite well, but when it comes to pricing, none beats &lt;strong&gt;DeepSeek&lt;/strong&gt;.&lt;/p&gt;
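&lt;p&gt;To make &amp;ldquo;cheap&amp;rdquo; concrete, here is a back-of-the-envelope sketch. The per-million-token rates are assumptions based on DeepSeek&amp;rsquo;s pricing page and will drift over time:&lt;/p&gt;

```python
# Back-of-the-envelope LLM cost estimate. The rates below are assumptions
# based on DeepSeek's published pricing and change over time.

INPUT_USD_PER_MILLION = 0.035  # cheapest input rate (cache hit, off-peak)
OUTPUT_USD_PER_MILLION = 1.10  # upper bound for output tokens

def estimate_cost_usd(input_tokens, output_tokens):
    return (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000

# A 1,500-word essay is roughly 2,000 tokens, input and output combined;
# assume an even split for the estimate:
print(f"{estimate_cost_usd(1_000, 1_000):.6f}")  # prints 0.001135
```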
&lt;p&gt;When using their
&lt;a href="https://api-docs.deepseek.com/quick_start/pricing" target="_blank" rel="noopener noreferrer"&gt;API&lt;/a&gt;, processing a million input tokens can be as cheap as &lt;strong&gt;$0.035&lt;/strong&gt;, and a million output tokens will cost you at most &lt;strong&gt;$1.10&lt;/strong&gt;. To give you an example: A typical 1,500-word essay will come down to about 2,000 tokens (input and output combined).&lt;/p&gt;</description></item><item><title>Never trust ChatGPT</title><link>https://build.ralphmayr.com/posts/25-never-trust-chatgpt/</link><pubDate>Fri, 25 Jul 2025 00:00:00 +0000</pubDate><guid>https://build.ralphmayr.com/posts/25-never-trust-chatgpt/</guid><description>&lt;p&gt;I may sound like a broken record on this, but I&amp;rsquo;ve seen it over and over again while working with AI tools on poketto.me: Don&amp;rsquo;t trust the chatbots. Ever.&lt;/p&gt;
&lt;p&gt;ChatGPT in particular has two immense problems: sycophancy and accuracy.&lt;/p&gt;
&lt;p&gt;Regarding the former: It&amp;rsquo;s trying to please you&amp;mdash;the user&amp;mdash;to the point where it feels like every response is prefaced with a compliment that&amp;rsquo;s only designed to keep you engaged. Some examples?&lt;/p&gt;</description></item><item><title>No, you don’t have to learn LangChain</title><link>https://build.ralphmayr.com/posts/21-no-you-dont-have-to-learn-langchain/</link><pubDate>Mon, 21 Jul 2025 00:00:00 +0000</pubDate><guid>https://build.ralphmayr.com/posts/21-no-you-dont-have-to-learn-langchain/</guid><description>&lt;p&gt;...or LangGraph, or LlamaIndex, or RAG, or whatever new AI-hype framework is trending this week in order to build an AI-powered app.&lt;/p&gt;
&lt;p&gt;More often than not, these frameworks are just &lt;em&gt;wrappers&lt;/em&gt; around basic functionality&amp;mdash;in this case, calling an API. And the layers of abstraction they introduce can make even simple things (&amp;ldquo;prompt an LLM&amp;rdquo;) feel unnecessarily complex.&lt;/p&gt;
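&lt;p&gt;For comparison, &amp;ldquo;prompt an LLM&amp;rdquo; without any framework is one small JSON payload and one HTTP POST. A sketch; the endpoint and model name here are placeholders, though most providers expose this OpenAI-style shape:&lt;/p&gt;

```python
# Hedged sketch: the whole "prompt an LLM" payload, no framework involved.
# The model name is a placeholder; swap in whichever provider you use.
import json

def build_chat_request(prompt, model="deepseek-chat"):
    """A chat-completion payload: a model name and a message list."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("Summarize this article in three bullets.")
body = json.dumps(payload)
# POST body to the provider's /chat/completions endpoint with your API key;
# the reply text arrives in choices[0].message.content.
```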
&lt;p&gt;Take RAG, for example. All it really does is frontload your prompt with additional context. That&amp;rsquo;s it. In practice, it boils down to concatenating a few strings&amp;mdash;something you can do in five lines of code. But LangChain adds layer upon layer of custom methods, config objects, routing logic, etc., that often just get in the way.&lt;/p&gt;</description></item><item><title>Running text-to-speech in the #Cloud is harder than you would think (part three)</title><link>https://build.ralphmayr.com/posts/10-running-text-to-speech-in-the-cloud-is-harder-than-you-would-think-part-three/</link><pubDate>Thu, 10 Jul 2025 00:00:00 +0000</pubDate><guid>https://build.ralphmayr.com/posts/10-running-text-to-speech-in-the-cloud-is-harder-than-you-would-think-part-three/</guid><description>&lt;p&gt;So, after finally setting up a dedicated virtual machine (VM) to run my text-to-speech workloads and wiring up all the build and deployment scripts, I got a bit excited. Could I reduce the TTS latency even further if the VM had GPU power?&lt;/p&gt;
&lt;p&gt;In theory: Yes. In practice: Google doesn't give you access to their GPUs straight away. There&amp;rsquo;s a special quota setting for VM instances with GPUs, and by default that&amp;rsquo;s set to zero. As a regular user, you cannot increase this without contacting Google Cloud Support.&lt;/p&gt;</description></item><item><title>Running text-to-speech in the #Cloud is harder than you would think (part two)</title><link>https://build.ralphmayr.com/posts/9-running-text-to-speech-in-the-cloud-is-harder-than-you-would-think-part-two/</link><pubDate>Wed, 09 Jul 2025 00:00:00 +0000</pubDate><guid>https://build.ralphmayr.com/posts/9-running-text-to-speech-in-the-cloud-is-harder-than-you-would-think-part-two/</guid><description>&lt;p&gt;Do you remember when I mentioned the difficulty of running 🐸 CoquiTTS in the cloud yesterday? My first experiment was to run it directly in my Cloud Run backend service. In theory, this could have worked, but you'll never guess why it failed in practice.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;x86 CPUs&lt;/strong&gt;. Really. Like the ones we had in our computers in the 90s. How did I figure this out? After taking a horribly long time to start up, the TTS service failed with a message saying that it was running on an 'incompatible' CPU architecture. Specifically, 32-bit x86 CPUs.&lt;/p&gt;</description></item><item><title>Running text-to-speech in the #Cloud is harder than you would think (part one)</title><link>https://build.ralphmayr.com/posts/8-running-text-to-speech-in-the-cloud-is-harder-than-you-would-think-part-one/</link><pubDate>Tue, 08 Jul 2025 00:00:00 +0000</pubDate><guid>https://build.ralphmayr.com/posts/8-running-text-to-speech-in-the-cloud-is-harder-than-you-would-think-part-one/</guid><description>&lt;p&gt;For the podcast automation feature that I&amp;rsquo;m planning for a future version of poketto.me, I&amp;rsquo;ve been experimenting with various text-to-speech solutions. The easiest and highest-quality approach would have been the ElevenLabs API. However, considering the &amp;ldquo;throwaway&amp;rdquo; nature of these audio files &amp;ndash; most of which would only be listened to once by one person &amp;ndash; and the cost structure that this would introduce, I desperately need a cheaper approach.&lt;/p&gt;
&lt;p&gt;The Python library 🐸 CoquiTTS is pretty awesome: There are many different models to choose from, ranging from 'super low latency' to 'high quality' (including voice cloning). Therefore, poketto.me users could choose from many different voices, and from a commercial perspective, I could set different price points for different levels of quality and latency. However, they all require significant computing power to function.&lt;/p&gt;</description></item><item><title>Refactoring “legacy” code? Let the AI handle it!</title><link>https://build.ralphmayr.com/posts/7-refactoring-legacy-code-let-the-ai-handle-it/</link><pubDate>Mon, 07 Jul 2025 00:00:00 +0000</pubDate><guid>https://build.ralphmayr.com/posts/7-refactoring-legacy-code-let-the-ai-handle-it/</guid><description>&lt;p&gt;For reasons outlined in yesterday's post, I had to switch poketto.me from #CloudSQL (MySQL) to a completely different database architecture: Firebase 🔥&lt;/p&gt;
&lt;p&gt;At that stage, the Python backend code base wasn&amp;rsquo;t huge, but it was already fairly substantial. It included CRUD operations for several entities, as well as some basic lookup logic. Rewriting the whole thing would have taken me at least half a day.&lt;/p&gt;
&lt;p&gt;Instead, I asked my good friend #Claude to take care of things. And, to my surprise, the result worked straight away! 🎁 The &amp;ldquo;drop-in&amp;rdquo; replacement generated by the AI immediately passed my unit tests, and the chatbot&amp;rsquo;s instructions for setting up and configuring #Firebase were actually useful, too.&lt;/p&gt;</description></item><item><title>There’s no “npx cap remove” 🤦‍♂️</title><link>https://build.ralphmayr.com/posts/5-theres-no-npx-cap-remove/</link><pubDate>Sat, 05 Jul 2025 00:00:00 +0000</pubDate><guid>https://build.ralphmayr.com/posts/5-theres-no-npx-cap-remove/</guid><description>&lt;p&gt;#Capacitor comes with a user-friendly command line interface. To add a new mobile platform to your project, simply run &amp;ldquo;&lt;strong&gt;npx cap add [android | ios]&lt;/strong&gt;&amp;rdquo;. And to remove one? Exactly &amp;mdash; you guessed it: &amp;ldquo;&lt;strong&gt;npx cap remove&lt;/strong&gt;&amp;rdquo;&amp;hellip; But: That command isn&amp;rsquo;t implemented &amp;ndash; for understandable reasons. The interesting thing is, though, that it&amp;rsquo;s &amp;quot;plausible&amp;quot; that it would be there, right? So it&amp;rsquo;s not surprising that #Claude insists it exists.&lt;/p&gt;
&lt;p&gt;This once again highlights a major issue with LLMs that I just can&amp;rsquo;t shut up about: Just because what the chatbot says sounds 'plausible' doesn't mean it's correct. In the case of AI-assisted coding, that&amp;rsquo;s not such a big deal &amp;ndash; you, the developer, will eventually realise that the AI was wrong. But what about the many other use cases where we blindly trust the AI and put whatever it says into action? 🤔&lt;/p&gt;</description></item></channel></rss>