Operations on Build in Public

The memory consumption patterns of LangChain are… disturbing

Fri, 26 Sep 2025 00:00:00 +0000

As I said in “No, you don’t have to learn LangChain”, we shouldn’t get distracted by the artificial complexity introduced by our frameworks. LangChain is mostly a wrapper around the REST APIs of various LLM providers. Useful? Yes: switching between models becomes easy.

But here’s a mystery I can’t explain.

When I added Gemini as a fallback to DeepSeek (see yesterday’s post about DeepSeek refusing to touch Chinese politics), I thought it would be straightforward:

There is no cloud (it’s just somebody else’s computer)

Thu, 18 Sep 2025 00:00:00 +0000

…and ultimately, that “someone” is going to send you a real invoice for actual money. In my case, I’m running poketto.me almost entirely on Google Cloud. While it’s not terribly expensive right now, it is a cost I had to factor into my pricing strategy.

But here’s the good news: Google offers various programs to support early-stage startups! Check out https://cloud.google.com/startup?hl=en for details.

Today, I’m happy to share that poketto.me made it into the “START” tier of that program, which means: no Google Cloud bills in my mailbox for the foreseeable future!

Multi-threaded TTS: A bad idea

Tue, 26 Aug 2025 00:00:00 +0000

Running text-to-speech in the cloud is fun—until it isn’t.

Early on, I didn’t think much about thread safety. During my own testing, rarely would more than one TTS task be running in parallel, so there were no big issues. But once more users started using the feature, strange bugs popped up:

Errors like “Assertion srcIndex < srcSelectDimSize failed” started showing up in the logs—and worse, once triggered, the entire Cloud Run instance would become unusable until a redeploy.

GCS Caching Can Be a Pain in the Neck

Fri, 15 Aug 2025 00:00:00 +0000

I love GCS (Google Cloud Storage). It’s a simple, robust, and powerful solution for storing files online and accessing them either programmatically or via HTTP. Storage is dirt cheap—especially if you don’t need global replication or sophisticated backups. And you can even turn a GCS bucket into an HTTPS-secured, internet-facing web server for static websites. https://poketto.me, for example, runs on that architecture.

Another good use case: the #podcast feature in poketto.me. Naturally, the generated MP3 files need to live somewhere, and storing them in a database or serving them through my Python web server would be… silly. So I push the generated files to a GCS bucket, and all is well: HTTPS-secured, fast, and compatible with any podcast client in the world.

‘Cloud Identity’ is a secret well kept by Google

Sun, 03 Aug 2025 00:00:00 +0000

Permissions on Google Cloud resources are assigned to principals. Principals, in principle 😏, are Google accounts. My personal @gmail.com address, for example, is the principal that "owns" most of my Google Cloud stuff. So far, so good.

However, in some cases, I was required to delegate ownership of a resource to a principal with an @poketto.me email address — a domain for which I don’t have a Google Workspace account. Consequently, these addresses aren’t recognized as regular Google Accounts. (See exhibit A)

Good things come to those who wait ⏳

Tue, 29 Jul 2025 00:00:00 +0000

Remember when I was complaining about how hard it is to run even basic ML workloads on GCP? Turns out, Google has listened 😊 (well, probably not to me personally, but in general).

You can now request GPUs for Cloud Run instances in the UI as well as on the command line. That means all the hassle I went through deploying my text-to-speech service into a Docker environment running inside a preemptible VM with GPUs—and then figuring out how to start, stop, and deploy the VM automatically—was… well, not exactly wasted, but at least: not necessary anymore.

Cloud Build beats GitLab CI for my use case.

Tue, 15 Jul 2025 00:00:00 +0000

When I set up my personal blog, ralphpmayr.com, years ago, I opted for a GitLab CI pipeline to build and deploy it. However, for poketto.me, I was looking for something faster, cheaper and more closely integrated with the Google Cloud ecosystem.

I opted for Google's own Cloud Build and discovered that it integrates seamlessly with GitLab.com. In GitLab, all you need to do is create two API keys (one for read access and one for edit access) and configure these in Google Cloud. Cloud Build will then be able to fetch and build any GitLab project.

Running text-to-speech in the #Cloud is harder than you would think (part three)

Thu, 10 Jul 2025 00:00:00 +0000

So, after finally setting up a dedicated virtual machine (VM) to run my text-to-speech workloads and wiring up all the build and deployment scripts, I got a bit excited. Could I reduce the TTS latency even further if the VM had GPU power?

In theory: Yes. In practice: Google doesn't give you access to their GPUs straight away. There’s a special quota setting for VM instances with GPUs, and by default that’s set to zero. As a regular user, you cannot increase this without contacting Google Cloud Support.

Running text-to-speech in the #Cloud is harder than you would think (part two)

Wed, 09 Jul 2025 00:00:00 +0000

Do you remember when I mentioned the difficulty of running 🐸 CoquiTTS in the cloud yesterday? My first experiment was to run it directly in my Cloud Run backend service. In theory, this could have worked, but you'll never guess why it failed in practice.

x86 CPUs. Really. Like the ones we had in our computers in the 90s. How did I figure this out? After taking a horribly long time to start up, the TTS service failed with a message saying that it was running on an 'incompatible' CPU architecture. Specifically, 32-bit x86 CPUs.
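A quick way to see what a container thinks it is running on is to log the architecture at startup. This is a small sketch using only the Python standard library; `describe_cpu` is a hypothetical helper name, and the values it reports depend entirely on the environment it runs in.

```python
import platform
import struct


def describe_cpu():
    """Report the machine type and the pointer width of this Python build."""
    return {
        # Architecture string as reported by the OS, e.g. "x86_64" or "i686".
        "machine": platform.machine(),
        # Size of a C pointer in bits; this reflects the Python interpreter
        # build (32-bit vs 64-bit), which may differ from the physical CPU.
        "pointer_bits": struct.calcsize("P") * 8,
    }


if __name__ == "__main__":
    info = describe_cpu()
    print(f"machine={info['machine']} pointer_bits={info['pointer_bits']}")
```

Logging this once at boot makes an 'incompatible architecture' failure much easier to diagnose than waiting for the library to complain.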

Running text-to-speech in the #Cloud is harder than you would think (part one)

Tue, 08 Jul 2025 00:00:00 +0000

For the podcast automation feature that I’m planning for a future version of poketto.me, I’ve been experimenting with various text-to-speech solutions. The easiest and highest-quality approach would have been the ElevenLabs API. However, considering the “throwaway” nature of these audio files – most of which would only be listened to once by one person – and the cost structure that this would introduce, I desperately need a cheaper approach.

The Python library 🐸 CoquiTTS is pretty awesome: There are many different models to choose from, ranging from 'super low latency' to 'high quality' (including voice cloning). Therefore, poketto.me users could choose from many different voices, and from a commercial perspective, I could set different price points for different levels of quality and latency. However, they all require significant computing power to function.

CloudSQL is prohibitively expensive (at least for small projects)

Sun, 06 Jul 2025 00:00:00 +0000

When I started setting up the cloud infrastructure for poketto.me, I didn’t give much thought to costs. I thought it was such a small project that it just wouldn’t matter. I launched a #CloudSQL (MySQL) database with pretty much the default settings and was quite happy with it – until I checked the billing dashboard a couple of days later and realised that I was already spending almost €4 per day on the database alone. 120 euros per month just for a few MySQL tables? That couldn’t be right.
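The back-of-the-envelope calculation behind that number, as a tiny sketch. The 30-day month is an assumption, and `monthly_cost` is just an illustrative helper.

```python
DAILY_COST_EUR = 4.0   # "almost €4 per day" from the billing dashboard
DAYS_PER_MONTH = 30    # assumption: a 30-day month


def monthly_cost(daily_cost, days=DAYS_PER_MONTH):
    """Project a daily cloud bill onto a full month."""
    return daily_cost * days


if __name__ == "__main__":
    print(monthly_cost(DAILY_COST_EUR))  # prints 120.0
```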

Multi-threaded webservers in Python: A rabbit hole you don’t want to get into.

Fri, 04 Jul 2025 00:00:00 +0000

Put simply, serving web requests properly in Python is not easy. poketto.me uses a fairly basic off-the-shelf stack (#Flask as a web framework and #SocketIO for websocket communication), and you would never guess the issues you could encounter with it. For starters, Flask comes with a built-in web server (“werkzeug”), which is convenient for development, but absolutely not for production (it even warns you in bright red).

🧵 It runs on a single thread – I'm not kidding. This means it can only handle one web request at a time. If that request involves any actual work, such as extracting web content, it cannot handle any other requests at the same time.
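To make the effect concrete, here is a toy simulation of handling requests one at a time versus handing them to a thread pool. It does not use Flask or werkzeug at all: `handle_request` is a hypothetical handler, and `time.sleep` stands in for I/O-bound work such as fetching web content.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(work_seconds):
    """Hypothetical request handler; the sleep simulates waiting on I/O."""
    time.sleep(work_seconds)
    return "ok"


def serve_one_at_a_time(n_requests, work_seconds):
    """Single-threaded: each request waits for the previous one to finish."""
    start = time.perf_counter()
    for _ in range(n_requests):
        handle_request(work_seconds)
    return time.perf_counter() - start


def serve_with_threads(n_requests, work_seconds):
    """One worker thread per request, so the waits overlap."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_requests) as pool:
        list(pool.map(handle_request, [work_seconds] * n_requests))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {serve_one_at_a_time(4, 0.05):.2f}s")
    print(f"threaded:   {serve_with_threads(4, 0.05):.2f}s")
```

With four requests, the sequential version takes roughly four times the duration of a single request, while the threaded one finishes in about the time of one. Threads only help like this when requests mostly wait; CPU-heavy handlers still contend for Python's global interpreter lock.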