Tackle the Monkey First

I don’t remember where I first came across this metaphor, but I still love it:

Suppose your task is to train a costumed monkey to recite a Shakespeare sonnet while standing on an elaborately carved wooden pedestal. Where do you start? Do you begin by picking out the wood for the pedestal? Designing the decorative motifs? Choosing which hat the monkey should wear? Or—more obviously—do you first figure out whether you can actually get a monkey to recite poetry in the first place?

In this parable, the key to success lies in solving the hardest, most uncertain part of the challenge first: training the monkey. And yet, in product development, we often fall into the exact opposite pattern. We busy ourselves with the easy, low-risk, but time-consuming tasks—while putting off the hard questions that ultimately determine success.

In tech, this might look like teams diving into infrastructure early on: setting up build pipelines, cloud deployments, and internal tooling, or debating frameworks. All of that matters eventually, but none of it matters if your core hypothesis turns out to be wrong. Early on, the real questions are often:

🧐 Can the AI model actually do what we need it to?
🧐 Can we create a UX that feels intuitive and delightful?
🧐 How much will it cost to run this thing?
🧐 Do users even want what we’re building?

Product managers (mea culpa 🙋‍♂️) fall into this trap too. We’ll spend days on intricate pricing models, go-to-market (GTM) plans, competitive analyses, or customer service workflows for an app that doesn’t yet have a single user. That might feel like progress, but if we haven’t validated desirability, feasibility, or viability, it’s just premature optimization.

The podcast functionality in poketto.me is a good illustration of this mindset. There was a lot of backend plumbing I knew I’d eventually need:

➡️ Cleaning and adapting raw article text for speech
➡️ Generating and serving MP3s
➡️ Creating and updating podcast feed XMLs
➡️ Deploying a Cloud Run instance for the TTS model

All of that is complex—but it’s predictably complex. I was confident I could build it when the time came. The real question—the metaphorical monkey—was this:

🧐 Can an open-source TTS model generate speech that’s good enough for casual listening? And can I run such a model in the cloud cost-effectively, in a way that a sustainable pricing model could support?

That’s what I tackled first. I ran quick tests, iterated on voice quality, evaluated infrastructure cost, and only once I had confidence in the answers did I “allow” myself to start building out the full system around it.
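To make the cost half of that question concrete, here is a minimal back-of-envelope sketch in Python. It is not the calculation actually used for poketto.me. Every name and parameter in it (TtsCostAssumptions, realtime_factor, target_margin, and so on) is a hypothetical placeholder, and it assumes cost is dominated by instance time billed only while audio is being generated.

```python
from dataclasses import dataclass


@dataclass
class TtsCostAssumptions:
    """Hypothetical inputs; replace them with numbers from your own quick tests."""
    instance_cost_per_hour: float        # price of the cloud instance while it runs
    realtime_factor: float               # compute seconds needed per second of audio
    audio_minutes_per_user_month: float  # expected audio generated per user per month


def compute_cost_per_audio_minute(a: TtsCostAssumptions) -> float:
    # One minute of audio needs 60 * realtime_factor seconds of compute.
    compute_seconds = 60 * a.realtime_factor
    return a.instance_cost_per_hour / 3600 * compute_seconds


def monthly_cost_per_user(a: TtsCostAssumptions) -> float:
    return compute_cost_per_audio_minute(a) * a.audio_minutes_per_user_month


def is_viable(a: TtsCostAssumptions, price_per_user_month: float,
              target_margin: float) -> bool:
    # Viable if the TTS cost fits in what remains of the price after the
    # margin we want to keep (target_margin is a fraction between 0 and 1).
    budget = price_per_user_month * (1 - target_margin)
    return monthly_cost_per_user(a) <= budget
```

Plugging in a measured real-time factor and a candidate price gives a quick yes or no before any of the surrounding plumbing is built. If the answer is no, that is the signal to try a different model, instance type, or price rather than to start on the pedestal.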

No, this doesn't guarantee success. But by tackling the monkey first, I avoided the risk of pouring weeks into a polished, production-grade feature that might never have cleared its most critical technical and economic hurdles.

It’s a lesson worth repeating—for solo builders, startup teams, and corporate product squads alike:

Don’t start with the pedestal. Start with the monkey.