The memory consumption patterns of LangChain are… disturbing

As I said in No, you don’t have to learn LangChain, we shouldn’t get distracted by the artificial complexity introduced by our frameworks. LangChain is mostly a wrapper around the REST APIs of various LLM providers. Useful? Yes—switching between models becomes easy.

But here’s a mystery I can’t explain.

When I added Gemini as a fallback to DeepSeek (see yesterday’s post about DeepSeek refusing to touch Chinese politics), I thought it would be straightforward.

By default, I’m instantiating my chat model thus:

from langchain.chat_models import init_chat_model
llm = init_chat_model(model='deepseek-chat', model_provider='deepseek')

And if a subsequent call errored out with DeepSeek’s HTTP 400 “Content Exists Risk”, I’d go with Gemini instead:

llm = init_chat_model(model='gemini-2.5-flash', model_provider='google_vertexai')
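For illustration, here is a minimal sketch of that fallback logic. It is not my production code: the helper names `is_content_risk` and `invoke_with_fallback` are made up for this example, and I am assuming the refusal can be recognized by the phrase “Content Exists Risk” appearing in the exception message, which may not hold for every client version. The models are passed in as factories so the fallback is only instantiated when it is actually needed.

```python
from typing import Any, Callable


def is_content_risk(exc: Exception) -> bool:
    # Assumption: the provider's refusal surfaces as an exception whose
    # message contains this phrase. Adjust to whatever your client raises.
    return "Content Exists Risk" in str(exc)


def invoke_with_fallback(
    make_primary: Callable[[], Any],
    make_fallback: Callable[[], Any],
    prompt: str,
) -> Any:
    """Call the primary model; on a content-risk refusal, use the fallback.

    Both arguments are zero-argument factories returning an object with an
    .invoke(prompt) method, so the fallback model is created lazily.
    """
    try:
        return make_primary().invoke(prompt)
    except Exception as exc:
        if not is_content_risk(exc):
            raise  # unrelated errors should not be swallowed
        return make_fallback().invoke(prompt)
```

With LangChain, the factories would be something like `lambda: init_chat_model(model='deepseek-chat', model_provider='deepseek')` for the primary and the Gemini call above for the fallback.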

So far, so good, until I deployed it to Cloud Run. There, every time the fallback kicked in, the instance crashed:

Memory limit of 512 MiB exceeded with 523 MiB used. Consider increasing the memory limit…

I dug deeper on my local machine: calling init_chat_model with google_vertexai instantly doubled the memory consumption of my Python process!
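If you want to reproduce this kind of measurement, one rough way is to compare the process's peak resident set size before and after the suspicious call. The sketch below uses only the standard library's `resource` module (Unix only); the helper names `peak_rss_mib` and `peak_rss_growth` are made up for this example, and this is not necessarily how I measured it myself.

```python
import resource
import sys
from typing import Callable


def peak_rss_mib() -> float:
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def peak_rss_growth(step: Callable[[], object]) -> float:
    """Run step() and return how much the peak RSS grew, in MiB.

    Peak RSS never decreases, so the result is always >= 0. It shows
    whether the step pushed the high-water mark up, not how much memory
    is still held afterwards.
    """
    before = peak_rss_mib()
    step()
    return peak_rss_mib() - before
```

Calling `peak_rss_growth` with a lambda that wraps the `init_chat_model` call for each provider, each in a fresh Python process, gives comparable numbers.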

I didn’t want to debug this forever, so I swapped Gemini out for Claude:

llm = init_chat_model("claude-3-5-sonnet-latest", model_provider="anthropic")

Result: No memory issues at all.

So… riddle me this: why in the world would LangChain need 200+ MB of memory just to launch a Vertex AI chat model that ultimately just sends a REST call to Google?