Put simply, serving web requests properly in Python is not easy. poketto.me uses a fairly basic off-the-shelf stack (#Flask as a web framework and #SocketIO for websocket communication), and you would never guess the issues you could encounter with it. For starters, Flask comes with a built-in web server (“werkzeug”), which is convenient for development, but absolutely not for production (it even warns you in bright red).

🧵It runs on a single thread – I'm not kidding. This means it can only handle one web request at a time. If that request involves any actual work, such as extracting web content, it cannot handle any other requests at the same time.
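
To make the difference concrete, here is a rough, stdlib-only sketch. It is not poketto.me code: `handle_request` and `serve` are hypothetical names, and a thread pool stands in for the web server. It times four slow "requests" on one thread and then on four. The benefit shows up because the simulated work is waiting rather than computing; CPU-heavy handlers would gain far less from threads in Python.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id: int) -> int:
    """Hypothetical stand-in for a slow handler (e.g. extracting web content)."""
    time.sleep(0.2)  # simulated I/O wait; the duration is arbitrary
    return request_id


def serve(num_threads: int, num_requests: int = 4) -> float:
    """Run the requests on a pool of the given size and return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(handle_request, range(num_requests)))
    assert results == list(range(num_requests))
    return time.perf_counter() - start


single = serve(num_threads=1)  # requests queue up behind each other
pooled = serve(num_threads=4)  # requests overlap while they wait
print(f"1 thread: {single:.2f}s, 4 threads: {pooled:.2f}s")
```

With one thread, the total time is roughly the sum of all requests; with four threads, it is roughly the duration of the slowest one.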

🦄I realised this fairly late on, but was confident that I could easily fix it. My AI assistants agreed that gunicorn (https://gunicorn.org/) was the best choice for production, but that’s where the trouble started. Mind you, adding the dependency and changing my Dockerfile for the backend service would have been easy enough:

CMD ["gunicorn", "-w", "1", "--threads", "16", "--timeout=30", "--bind=0.0.0.0:8080", "main:app"]

However, the AI insisted that I use either eventlet or gevent as the 'worker mode', and, in an almost farcical manner, suggested that I use both. Needless to say, don't try this at home! It won't work:

socketio = SocketIO(app, cors_allowed_origins="*", async_mode="gevent")
gunicorn --worker-class eventlet -w 1 --bind 0.0.0.0:8080 main:app

After some trial and error (which culminated in Claude recommending that I add an event queue, Redis, and a lot of other things to my app), I figured it out on my own. Just use threading. Simple as that.

socketio = SocketIO(app, cors_allowed_origins="*", async_mode="threading")
gunicorn -w 1 --threads 8 --bind 0.0.0.0:8080 main:app
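
The same settings can also live in a gunicorn config file instead of on the command line. A minimal sketch, assuming a file called `gunicorn.conf.py` next to `main.py` (the file name and location are my assumption; the values simply mirror the command above, plus the 30-second timeout from the Dockerfile line):

```python
# gunicorn.conf.py (assumed file name and location)
# gunicorn config files are plain Python; each variable maps to a setting.

bind = "0.0.0.0:8080"     # same address and port as the --bind flag
workers = 1               # a single worker process, as with -w 1
worker_class = "gthread"  # threaded worker, stated explicitly here
threads = 8               # same as --threads 8
timeout = 30              # seconds, as in the Dockerfile CMD
```

It would then be started with `gunicorn -c gunicorn.conf.py main:app`.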

Of course, this approach also has its limitations: you can only have one worker process (otherwise you’d lose the affinity needed for SocketIO to communicate with clients via the web socket), and eight threads for now. Nevertheless, this is an improvement on one thread and is most likely sufficient for the scale I’m envisaging for poketto.me. And if not, this at least allows for the good old KIWI principle to kick in: “kill it with iron,” in the sense that you can scale up the hardware (such as switching to larger Cloud Run instances with more CPUs) to match demand.
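
Why the single worker matters can be shown with a toy simulation. This is a sketch under loud assumptions, not how gunicorn or SocketIO are implemented: `Worker` and `route_all` are hypothetical, each worker keeps its known sessions in its own memory, and requests are handed out round-robin with no stickiness.

```python
from itertools import cycle


class Worker:
    """Hypothetical stand-in for one worker process."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sessions: set[str] = set()  # per-process memory, not shared

    def connect(self, sid: str) -> None:
        self.sessions.add(sid)

    def knows(self, sid: str) -> bool:
        return sid in self.sessions


def route_all(workers: list[Worker], sid: str, num_requests: int) -> list[bool]:
    """Round-robin requests for one session; report whether each worker knew it."""
    balancer = cycle(workers)
    next(balancer).connect(sid)  # the handshake lands on the first worker
    return [next(balancer).knows(sid) for _ in range(num_requests)]


one = route_all([Worker("w1")], "abc", 4)
two = route_all([Worker("w1"), Worker("w2")], "abc", 4)
print(one)  # [True, True, True, True]
print(two)  # [False, True, False, True]
```

With one worker, every follow-up request finds its session. With two, every other request lands on a process that has never heard of the client.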

One final point to note is that you can’t easily debug this setup. In other words, you can’t simply run gunicorn as the entry point for your app in your launch.json file in Visual Studio Code and expect breakpoints to work. For local debugging, I’m still using the tried-and-tested 'werkzeug'. This is risky, of course: what if something breaks (due to a subtle change in websocket communication, for example) that only occurs with gunicorn and that I can’t reproduce with werkzeug? It doesn't exactly inspire confidence and reminds me of a category of issues that the Enterprise Java community faced – and solved – about 20 years ago 😅