#88 The memory consumption patterns of LangChain are… disturbing

As I said in “No, you don’t have to learn LangChain”, we shouldn’t get distracted by the artificial complexity introduced by our frameworks. LangChain is mostly a wrapper around the REST APIs of various LLM providers. Useful? Yes: switching between models becomes easy. But here’s a mystery I can’t explain. When I added Gemini as a fallback to DeepSeek (see yesterday’s post about DeepSeek refusing to touch Chinese politics), I thought it would be straightforward: ...

September 26, 2025

#80 There is no cloud (it’s just somebody else’s computer)

…and ultimately, that “someone” is going to send you a real invoice for actual money. In my case, I’m running poketto.me almost entirely on Google Cloud. While it’s not terribly expensive right now, it is a cost I had to factor into my pricing strategy. But here’s the good news: Google offers various programs to support early-stage startups! Check out https://cloud.google.com/startup?hl=en for details. Today, I’m happy to share that poketto.me made it into the “START” tier of that program, which means: no Google Cloud bills in my mailbox for the foreseeable future! ...

September 18, 2025

#57 Multi-threaded TTS: A bad idea

Running text-to-speech in the cloud is fun—until it isn’t. Early on, I didn’t think much about thread safety. During my own testing, rarely would more than one TTS task be running in parallel, so there were no big issues. But once more users started using the feature, strange bugs popped up: Errors like “Assertion srcIndex < srcSelectDimSize failed” started showing up in the logs—and worse, once triggered, the entire Cloud Run instance would become unusable until a redeploy. ...

August 26, 2025

#46 GCS Caching Can Be a Pain in the Neck

I love GCS (Google Cloud Storage). It’s a simple, robust, and powerful solution for storing files online and accessing them either programmatically or via HTTP. Storage is dirt cheap—especially if you don’t need global replication or sophisticated backups. And you can even turn a GCS bucket into an HTTPS-secured, internet-facing web server for static websites. https://poketto.me, for example, runs on that architecture. Another good use case: the #podcast feature in poketto.me. Naturally, the generated MP3 files need to live somewhere, and storing them in a database or serving them through my Python web server would be… silly. So I push the generated files to a GCS bucket, and all is well: HTTPS-secured, fast, and compatible with any podcast client in the world. ...

August 15, 2025

#34 ‘Cloud Identity’ is a secret well kept by Google

Permissions on Google Cloud resources are assigned to principals. Principals, in principle 😏, are Google accounts. My personal @gmail.com address, for example, is the principal that "owns" most of my Google Cloud stuff. So far, so good. However, in some cases, I was required to delegate ownership of a resource to a principal with an @poketto.me email address — a domain for which I don’t have a Google Workspace account. Consequently, these addresses aren’t recognized as regular Google Accounts. (See exhibit A) ...

August 3, 2025

#29 Good things come to those who wait ⏳

Remember when I was complaining about how hard it is to run even basic ML workloads on GCP? Turns out, Google has listened 😊 (well, probably not to me personally, but in general). You can now request GPUs for Cloud Run instances in the UI as well as on the command line. That means all the hassle I went through deploying my text-to-speech service into a Docker environment running inside a preemptible VM with GPUs—and then figuring out how to start, stop, and deploy the VM automatically—was… well, not exactly wasted, but at least: not necessary anymore. ...

July 29, 2025

#15 Cloud Build beats GitLab CI for my use case.

When I set up my personal blog, ralphpmayr.com, years ago, I opted for a GitLab CI pipeline to build and deploy it. However, for poketto.me, I was looking for something faster, cheaper and more closely integrated with the Google Cloud ecosystem. I opted for Google's own Cloud Build and discovered that it integrates seamlessly with GitLab.com. In GitLab, all you need to do is create two API keys (one for read access and one for edit access), configure these in Google Cloud, and Cloud Build will then be able to fetch and build any GitLab project. ...

July 15, 2025

#10 Running text-to-speech in the #Cloud is harder than you would think (part three)

So, after finally setting up a dedicated virtual machine (VM) to run my text-to-speech workloads and wiring up all the build and deployment scripts, I got a bit excited. Could I reduce the TTS latency even further if the VM had GPU power? In theory: Yes. In practice: Google doesn't give you access to their GPUs straight away. There’s a special quota setting for VM instances with GPUs, and by default that’s set to zero. As a regular user, you cannot increase this without contacting Google Cloud Support. ...

July 10, 2025

#9 Running text-to-speech in the #Cloud is harder than you would think (part two)

Do you remember when I mentioned the difficulty of running 🐸 CoquiTTS in the cloud yesterday? My first experiment was to run it directly in my Cloud Run backend service. In theory, this could have worked, but you'll never guess why it failed in practice. x86 CPUs. Really. Like the ones we had in our computers in the 90s. How did I figure this out? After taking a horribly long time to start up, the TTS service failed with a message saying that it was running on an 'incompatible' CPU architecture. Specifically, 32-bit x86 CPUs. ...

July 9, 2025

#8 Running text-to-speech in the #Cloud is harder than you would think (part one)

For the podcast automation feature that I’m planning for a future version of poketto.me, I’ve been experimenting with various text-to-speech solutions. The easiest and highest-quality approach would have been the ElevenLabs API. However, considering the “throwaway” nature of these audio files – most of which would only be listened to once by one person – and the cost structure that this would introduce, I desperately need a cheaper approach. The Python library 🐸 CoquiTTS is pretty awesome: There are many different models to choose from, ranging from 'super low latency' to 'high quality' (including voice cloning). Therefore, poketto.me users could choose from many different voices, and from a commercial perspective, I could set different price points for different levels of quality and latency. However, they all require significant computing power to function. ...

July 8, 2025