One would think that poketto.me is so straightforward that any kind of architectural documentation would be unnecessary. But then...
Whenever a user saves something, the app first fetches the raw content from the relevant webpage. The content then undergoes an asynchronous 'cleanup' process (as I mentioned previously, I use an LLM to fix broken formatting, etc.). If the user has enabled the translation setting, the content is translated afterwards. This already gives us five different states a Save can be in: New (no content available yet), Extracted (raw content available), Processing (raw content being cleaned up), Translating (content being translated), and 'None' (everything done). Of course, the frontend needs to be mindful of what to allow the user to do in each of these states.
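To make the linear part of the pipeline concrete, here's a minimal sketch in Python. The state names come from the text above; the `SaveState` type and `next_state` function are hypothetical illustrations, not poketto.me's actual code.

```python
from enum import Enum

class SaveState(Enum):
    NEW = "new"                  # nothing fetched yet
    EXTRACTED = "extracted"      # raw content available
    PROCESSING = "processing"    # LLM cleanup in progress
    TRANSLATING = "translating"  # optional translation in progress
    NONE = "none"                # everything done

def next_state(state: SaveState, translation_enabled: bool) -> SaveState:
    """Advance a save one step along the (assumed linear) pipeline."""
    if state is SaveState.NEW:
        return SaveState.EXTRACTED
    if state is SaveState.EXTRACTED:
        return SaveState.PROCESSING
    if state is SaveState.PROCESSING:
        # Translation only happens if the user enabled it.
        return SaveState.TRANSLATING if translation_enabled else SaveState.NONE
    if state is SaveState.TRANSLATING:
        return SaveState.NONE
    return state  # 'None' is terminal
```

The frontend could then derive the allowed user actions from a single `SaveState` value instead of juggling several booleans.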
And then, the Podcast feature adds quite a bit more complexity. At any point, users can add a save to one of their podcast feeds – which makes sense: the user shouldn't have to wait for poketto.me to finish fetching, cleaning, and translating the content before they can move it to a feed. What exactly 'adding' means, however, depends on the state of the save: if it's 'New', 'Extracted', 'Processing', or 'Translating', it's not worth running TTS on the unfinished text, so the save is added to a dedicated queue for later TTS processing. Once the translation (or processing) step is complete and TTS has been requested, the save enters the 'TTS'ing' state.
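The "add to feed" branching could be sketched like this – again a hypothetical illustration, with the queue modeled as a plain list and the state passed as a string matching the names above:

```python
# States in which the text isn't final yet, so TTS should be deferred.
UNFINISHED = {"new", "extracted", "processing", "translating"}

def add_to_feed(save_state: str, tts_queue: list, save_id: str) -> str:
    """Either start TTS right away or park the save for later processing."""
    if save_state in UNFINISHED:
        # Dedicated queue: TTS runs once cleanup/translation completes.
        tts_queue.append(save_id)
        return "queued"
    # Text is final ('none'), so synthesis can start immediately.
    return "tts_started"
```

The key design point is that the user-facing action ("add to feed") succeeds immediately in every state; only the moment TTS actually runs differs.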
This is a bit more complicated than it looks at first glance.
Enter: State diagrams! Read the above text again and then look at the attached state diagram. Which is easier to understand? Clearly, the diagram. It's definitely the asset I'll refer to if I need to debug something.
P.S. It’s an interesting question whether this level of complexity merits upgrading the tech stack. Currently, it's hand-crafted, but: There are things out there like Google Cloud Dataflow, Dagster, Airflow, dbt and Confluent, each of which would have its merits over my homegrown approach.