So, after finally setting up a dedicated virtual machine (VM) to run my text-to-speech workloads and wiring up all the build and deployment scripts, I got a bit excited. Could I reduce the TTS latency even further if the VM had GPU power?
In theory: yes. In practice: Google doesn't give you GPU access straight away. VM instances with GPUs are governed by a separate quota, and on a fresh project that quota defaults to zero. As a regular user, you cannot raise it yourself; you have to file a quota increase request, which goes through Google Cloud Support.
It's not exactly 'self-service', but it's worth knowing before you plan any GPU-heavy work like training large models.
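Before filing that request, you can check where your project currently stands from the command line. A sketch, assuming `gcloud` is installed and authenticated against your project; the region name is just an example, and the exact quota metrics you see depend on your project:

```shell
# Project-wide GPU quota; on a new project GPUS_ALL_REGIONS is typically 0
gcloud compute project-info describe --format=json | grep -A 2 'GPUS_ALL_REGIONS'

# Per-region GPU quotas (e.g. NVIDIA_T4_GPUS) for a region of your choice
gcloud compute regions describe us-central1 --format=json | grep -B 1 -A 1 'GPUS'
```

If the limit shows as 0, attempting to create a GPU-attached instance will fail with a quota error, which is the signal to go request the increase.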