This content originally appeared on DEV Community and was authored by Pavel
Large Language Models are powerful, but they’re also resource-intensive. Every query, no matter how simple, consumes expensive computational cycles. I realized a huge chunk of my server costs came from the LLM repeatedly answering “hello,” “thanks,” and “ok.”
These queries are a waste. They don’t teach the model anything new. They don’t require complex reasoning. They are pure resource drain.
My first thought was to filter them out on the client-side. But that creates a manual chore—I’d have to constantly update the list of simple phrases. That approach doesn’t scale.
So, I flipped the problem on its head. What if the LLM could solve this problem itself?
The core idea is this: I created a system where the LLM itself decides which queries are too simple for its attention and teaches a client-side helper to handle them in the future.
## The Architecture: Self-Delegation
I built a project around this single concept, stripping away everything non-essential from a previous version. It has only two key parts:
- The Server-Side “Teacher” (The LLM): The main, powerful model. Its job is to handle complex tasks and—crucially—to identify low-value, repetitive queries.
- The Client-Side “Gatekeeper” (The Helper): A tiny, zero-dependency JavaScript agent in the browser. It intercepts all user input and asks the LLM for help only when it encounters something it hasn’t been taught to handle. (A minimal sketch of this intercept path follows the list.)
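To make the intercept path concrete, here is a minimal sketch of how such a Gatekeeper could work. Everything in it is illustrative: the storage key, the `/api/chat` endpoint, and the function names are my assumptions, not necessarily what the repo uses.

```javascript
// Illustrative Gatekeeper sketch (names and endpoint are assumptions).
const STORE_KEY = "gatekeeper-rules"; // hypothetical localStorage key

function loadRules() {
  // Learned phrase -> response rules persisted in the browser.
  return JSON.parse(localStorage.getItem(STORE_KEY) || "{}");
}

async function handleUserInput(query) {
  const rules = loadRules();
  const key = query.trim().toLowerCase();

  // Fast path: a phrase the LLM already delegated never leaves the browser.
  if (rules[key]) return rules[key];

  // Slow path: unknown input is forwarded to the LLM.
  const res = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query }),
  });
  const payload = await res.json();
  return payload.userResponse;
}
```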
## How the LLM Offloads Its Work
The first time a user sends a simple query like “thx”, the Gatekeeper doesn’t recognize it and forwards it to the LLM.
The LLM knows “thx” is simple. Instead of just sending back a text answer, it sends back a special JSON payload containing a direct order:
```json
{
  "userResponse": "You're welcome!",
  "learningInstruction": {
    "command": "LEARN_SIMPLE_PHRASE",
    "query": "thx",
    "response": "You're welcome!"
  }
}
```
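The article doesn’t show the server side, but one plausible way to elicit this envelope is a system prompt that tells the model when to attach a learning instruction. The sketch below is purely illustrative: Express, the route, and the `completeWithLLM` helper (a stand-in for whatever model API is actually used) are my assumptions, not the repo’s code.

```javascript
// Hypothetical server-side handler. completeWithLLM(system, user) is a
// stand-in for any model API call that returns the model's raw text.
const express = require("express");
const app = express();
app.use(express.json());

const SYSTEM_PROMPT = `Answer the user's query.
If it is a trivial, repetitive phrase (a greeting, thanks, "ok"),
reply ONLY with JSON: {"userResponse": "...", "learningInstruction":
{"command": "LEARN_SIMPLE_PHRASE", "query": "...", "response": "..."}}.
Otherwise reply ONLY with JSON: {"userResponse": "..."}.`;

app.post("/api/chat", async (req, res) => {
  const raw = await completeWithLLM(SYSTEM_PROMPT, req.body.query);
  // Forward the model's JSON envelope unchanged; the Gatekeeper interprets it.
  res.json(JSON.parse(raw));
});

app.listen(3000);
```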
This `learningInstruction` is the key. It’s the LLM telling its Gatekeeper:
“I’ve answered this for you once. Now learn it. From now on, you handle this query yourself. Do not send it to me again.”
The Gatekeeper receives this command, saves the new rule to the browser’s `localStorage`, and delivers the response to the user. The user sees nothing but a fast response.
But in the background, the system just became smarter and more efficient. The next time “thx” is sent, the Gatekeeper handles it instantly, and the server is never bothered.
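Continuing the earlier Gatekeeper sketch (same assumed `STORE_KEY` and `loadRules`), the learning step might look like this; the command handling is my guess at the shape, not the repo’s exact code.

```javascript
// Persist a rule the LLM has delegated to the client.
function applyLearningInstruction(instruction) {
  if (!instruction || instruction.command !== "LEARN_SIMPLE_PHRASE") return;
  const rules = loadRules();
  rules[instruction.query.trim().toLowerCase()] = instruction.response;
  localStorage.setItem(STORE_KEY, JSON.stringify(rules));
}
```

Calling `applyLearningInstruction(payload.learningInstruction)` in the slow path of `handleUserInput`, right after parsing the response, closes the loop: the next identical query hits the fast path and never reaches the server.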
The LLM is actively, automatically, and invisibly making its own job easier. It’s not being trained by a human; it’s training its own assistant to filter out the noise.
This project was an exercise in minimalism. I threw out a complex neural network library and other unnecessary features to focus solely on perfecting this self-delegation loop. The result is a lean, powerful system that demonstrates a smarter way to build AI applications.
We don’t just need bigger models; we need smarter architectures where models can work together and optimize their own workflows.
Check out the full implementation on GitHub and see the self-delegation in action.
https://github.com/Xzdes/slmnet-Hybrid