LLM agents

2025-09-21

Getting a little deeper into the good, the bad, and the things I miss after playing around with LLM tools in the terminal for some time now.

In my previous post (link!) I gave an overview of some tools built around LLMs specifically for programming and which ones I ended up using the most myself. This time I briefly explain how such things are built and then lay down some ideas on what could be improved to make LLM-powered tools aid my processes even better.

By a "tool" we could mean many different things, like:

- LLM agents (tools around LLMs)
- programs that the LLM agent calls (tools for LLMs)
- other programs that we use
- plugins for other programs

Let's walk through the anatomy of an LLM agent first. It will help us see where each tool type fits in the whole thing.

In general, agents are built around an infinite loop with three steps: input (prompt), output (inference), and actions (tool calls). We will refer to the code that manages this loop as the LLM harness. It works kinda like a programming language interpreter.

First the model is informed about the capabilities of the harness - for example, it can get a plain-text explanation of the provided tools along with the input/output formats expected for a proper call. The model might be fine-tuned for a more complex format. Sometimes it can be overfit for a particular set of tools or harness capabilities, which causes interoperability issues. When the harness picks up a tool call pattern in the LLM output, it makes the call with the data passed by the model and, in the next loop iteration, feeds the result back to the LLM as input. When there are no more tool calls to process, the harness goes back to asking the user for another prompt.

The agent takes actions by executing tool calls, which can be implemented as:

- function calls - using code straight in the agent harness, similar to how naive interpreters work
- tool calls - this is a general term, but it can also mean invoking external commands on the local system, such as bash, grep, find, head, linters and so on
- MCP client calls - using a protocol or framework for making a call to an external server, which shifts the responsibility of implementing the former two onto the MCP server developer

Model Context Protocol (MCP) functions as a middleman between the place where tool calls are invoked (the client, i.e. the LLM agent) and the place where these calls are executed (the MCP server, integrated with some resource). Personally I don't find a need for it - local agents already have shell access with all the tools they could ask for. I'd rather enhance the capabilities of my shell with additional CLI tools. Sure, there are some things that could benefit from restructuring their I/O for LLM consumption, and that's where I'd consider plugging in MCP if I wanted to connect an LLM to the thing. I think it's a good idea as a conversational interface for your services. I would also consider implementing MCP as a client if I was making my own LLM agent from scratch.

To finish off my quick agent anatomy course we have to discuss context management. LLMs have a limited context window they can process - information that doesn't fit in the context is not considered during inference. For the most popular models that window is still rather small if you consider how big source files can get in long-running projects. This is the hard part. If you don't manage the context well, the agent will quickly forget what it was doing. Good prompting beats fine-tuning for general tasks - and prompting is also part of context management.
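Putting the whole anatomy together, here's a minimal sketch of such a harness loop in Python - raw requests against an OpenAI-compatible /v1/chat/completions endpoint, with a single shell tool. The endpoint URL, model name, and tool are my own assumptions for illustration, not any particular agent's internals:

```python
import json
import subprocess
import requests

API = "http://localhost:4000/v1/chat/completions"  # e.g. a local proxy
TOOLS = [{  # the tool definition the model is informed about
    "type": "function",
    "function": {
        "name": "shell",
        "description": "Run a shell command and return its output.",
        "parameters": {
            "type": "object",
            "properties": {"cmd": {"type": "string"}},
            "required": ["cmd"],
        },
    },
}]

def run_tool(name: str, args: dict) -> str:
    """Execute a tool call locally; here every tool is just the shell."""
    if name == "shell":
        proc = subprocess.run(args["cmd"], shell=True, text=True,
                              capture_output=True, timeout=30)
        return proc.stdout + proc.stderr
    return f"unknown tool: {name}"

messages = [{"role": "system", "content": "Use the shell tool when needed."}]
while True:  # the infinite loop: prompt -> inference -> tool calls
    messages.append({"role": "user", "content": input("> ")})
    while True:
        resp = requests.post(API, json={"model": "gpt-4o-mini",
                                        "messages": messages,
                                        "tools": TOOLS}).json()
        reply = resp["choices"][0]["message"]
        messages.append(reply)
        if not reply.get("tool_calls"):
            print(reply["content"])  # no tool calls left: back to the user
            break
        for call in reply["tool_calls"]:
            result = run_tool(call["function"]["name"],
                              json.loads(call["function"]["arguments"]))
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": result})
```

Real harnesses add streaming, error handling, and context trimming on top, but the skeleton is the same.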
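And because MCP is a protocol rather than a library, the run_tool dispatch above could just as well be forwarded to an external MCP server. On the wire that's a JSON-RPC 2.0 request; tools/call is the method name from the MCP spec, while the tool name and arguments here are made up:

```python
# A JSON-RPC 2.0 request an MCP client sends (over stdio or HTTP) to have
# the server execute a tool on the agent's behalf. "tools/call" comes from
# the MCP spec; "search_docs" is a hypothetical server-side tool.
mcp_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_docs",
        "arguments": {"query": "context window limits"},
    },
}
```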
First the LLM harness needs to load up tool definitions, then specific user-defined rules from AGENTS.md. Often the LLM agent tool also comes with baked-in system prompts or prompt templates that tell the model how to behave for the most effective outcome. You will easily find entire prompt template collections online (for example danielmiessler/fabric), as well as AGENTS.md or cursorrules repositories.

"Teaching the LLM" is all about effective context management strategies. Here's one example - it's very effective to include relevant resources in the LLM input. This strategy is called retrieval-augmented generation (RAG), and the resources can arrive in a few ways:

- the user can specify resources or include them directly in their prompt
- the harness can go fetch relevant resources based on the user's prompt and include them as additional input
- the model can request related resources through a dedicated tool call

RAG is one of the things I like the most about this LLM rush. Embeddings and vector databases can be used for sweet semantic search with applications beyond silly chat bots. RAG also allows the model to cite sources with higher reliability. I like how the need for compressing technical documentation for LLM consumption resulted in some good dense text documentation which is very accessible and helpful for human learning. We got new scrapers, parsers, better OCR, search, and so on - mainly to make documents easier to digest for LLMs, but accidentally also making many resources easier to process by hackers like myself. My point is that some tools for LLMs are really cool for people too. Many of them would fit very well in a regular development flow.

Nothing prevents you from calling LLMs as a CLI tool or giving your LLM agent the ability to spawn sub-agents. Cognition, the Devin company, suggests it might not be the best idea (https://cognition.ai/blog/dont-build-multi-agents), but who the hell would listen to those guys anyway, right? On the other hand, Anthropic uses subagents in Claude Code (https://docs.claude.com/en/docs/claude-code/sub-agents) and describes LLM workflows that involve multiple workers on their blog, which seems to be a pretty good resource: https://www.anthropic.com/engineering/building-effective-agents

You might be eager to write your own LLM agent now. You will inevitably find many libraries that claim to be the best for this task, but let me just remind you - aichat proves you don't need any external libraries for handling inference from an API. If you are to pick one API to integrate, make it an OpenAI-compatible API and use LiteLLM (https://www.litellm.ai/) as a proxy for your target model. You will get logs, stats and cost tracking for free. You can even plug in GitHub Copilot as a provider. Of course if you prefer to use LangChain or whatever else - it's up to you. In the end you might still find yourself using more purposeful tools, such as aichat or Simon Willison's "llm" tool collection, as building blocks to enhance your own flow instead of replacing it with agents.

People have varied opinions when it comes to LLM agents and how they should be run, managed, etc. This blog post is a good overview and will probably give you some ideas: https://shmck.substack.com/p/claude-code-framework-wars

Finally we get back to where we started this post. I think the missing piece for LLM agents right now is easy knowledge retrieval and self-hosted grounding. This is mainly because managing context is difficult and mostly left to the user.
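To sketch the retrieval half of that: below is a minimal, self-hosted RAG flow using an OpenAI-compatible /v1/embeddings endpoint and brute-force cosine similarity in numpy - a vector database would do the same job at scale. The endpoint URL, model name, and chunks are made-up assumptions:

```python
import requests
import numpy as np

EMBED_API = "http://localhost:4000/v1/embeddings"  # e.g. a LiteLLM proxy

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of texts via an OpenAI-compatible endpoint."""
    resp = requests.post(EMBED_API, json={
        "model": "text-embedding-3-small", "input": texts}).json()
    vecs = np.array([d["embedding"] for d in resp["data"]])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

# One-time "sweep": chunk your docs or codebase and index the embeddings.
chunks = [
    "def handle_request(args): parses the CLI arguments",
    "handle_request is called from main() in cli.py",
]
index = embed(chunks)

def retrieve(query: str, k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query (cosine similarity)."""
    scores = index @ embed([query])[0]
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

# Prepend retrieved context to the user prompt before inference.
question = "where is handle_request used?"
context = "\n\n".join(retrieve(question))
prompt = f"Context:\n{context}\n\nQuestion: {question}"
```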
If I were to address this issue, I'd use a specialized agent to help select the scope for the coding agent and prepare a well-formatted prompt. I would also use RAG to include additional context about the target of the change. Instead of making the agent grep for function usages and references, they could be included in the context up front. Closer integration with tree-sitter would also be an interesting change. The LLM wouldn't have to target files and walk the filesystem anymore - it could target specific classes and functions directly, and have more information retrieved from the vector database after one initial sweep of the codebase. You could also chat with the codebase through RAG after that one initial sweep. This might not be the best approach, but I'm curious to see how it would work in practice.

Anyway, to recap - an LLM agent is basically a dumb LLM calling a bunch of dumb tools in a loop. Surely that's worth millions of dollars, if not more?
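PS: if you want to play with the tree-sitter half of that idea, extracting the symbols to index takes only a few lines. This sketch assumes the tree_sitter_languages helper package (which bundles prebuilt grammars), and the node type filter is specific to the Python grammar:

```python
from tree_sitter_languages import get_parser

parser = get_parser("python")

def functions(source: bytes) -> list[tuple[str, str]]:
    """Return (name, full text) for every top-level function definition."""
    tree = parser.parse(source)
    out = []
    for node in tree.root_node.children:
        if node.type == "function_definition":
            name = node.child_by_field_name("name").text.decode()
            out.append((name, node.text.decode()))
    return out

# Each (name, body) pair becomes one chunk in the vector index above,
# so the agent can target functions instead of walking the filesystem.
for name, body in functions(open("example.py", "rb").read()):
    print(name, len(body))
```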