Remember when I was complaining about how hard it is to run even basic ML workloads on GCP? Turns out, Google has listened 😊 (well, probably not to me personally, but in general).
You can now request GPUs for Cloud Run instances in the UI as well as on the command line. That means all the hassle I went through deploying my text-to-speech service into a Docker environment running inside a preemptible VM with GPUs—and then figuring out how to start, stop, and deploy the VM automatically—was… well, not exactly wasted, but at least no longer necessary.
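For reference, here's a sketch of what requesting a GPU on the command line looks like. The service name, image path, and region are placeholders, and the exact flags and resource minimums depend on your gcloud SDK version—check `gcloud run deploy --help` before copying this:

```shell
# Hypothetical deployment of a GPU-backed TTS container to Cloud Run.
# Service name, image, and region are placeholders, not from the post.
gcloud run deploy tts-service \
  --image=gcr.io/my-project/coqui-tts:latest \
  --region=europe-west4 \
  --gpu=1 \
  --gpu-type=nvidia-l4 \
  --no-cpu-throttling \
  --cpu=4 \
  --memory=16Gi
```

GPU-backed Cloud Run services require always-allocated CPU (hence `--no-cpu-throttling`) and come with minimum CPU/memory requirements, so the values above are a plausible starting point rather than a recommendation.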
So, what does that mean for poketto.me? The “personal podcast” feature just took a big step toward general availability. 🚀
With this setup, text-to-speech for a 1500-word article with Coqui TTS takes about 4 minutes—roughly half the playback length of the resulting audio. That's definitely within the range I'd expect. Plus, the audio quality and available voices are reliably good, and it comes with built-in multilingual support!
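A quick back-of-envelope check on those numbers (all taken from the paragraph above: 1500 words, ~4 minutes of synthesis, audio roughly twice as long):

```python
# Sanity-check the quoted synthesis speed for the 1500-word article.
words = 1500
synthesis_min = 4
audio_min = synthesis_min * 2  # "roughly half the playback length"

# Real-time factor: < 1 means synthesis is faster than playback.
rtf = synthesis_min / audio_min

# Implied speaking rate of the generated voice.
speaking_rate = words / audio_min

print(f"real-time factor: {rtf}")                  # 0.5
print(f"speaking rate: {speaking_rate:.0f} wpm")   # 188 wpm
```

A real-time factor of 0.5 and a speaking rate just under 190 words per minute are both consistent with typical TTS output, so the quoted timing checks out.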