LLM tools
2025-07-05
Back in January I finally gave Cursor IDE an honest go, and it sold me on the idea of using LLMs for programming. Since then I've tried multiple different tools, and even though I still hardly use generated code in my projects, I now see many ways in which this technology can be useful day-to-day.
The most basic tool with the widest adoption is a chat interface. Think: ChatGPT, t3.chat, aistudio. You get to talk to your model of choice directly, ask questions, give out tasks, and so on. I find it especially useful for summarization and search. Summarization is one task that LLMs are actually good at. Proper web search options, such as deep research and search grounding (which provides links to other sources), go a long way too. Searching by a vague description of context is great for exploring - unlike with a traditional search engine, you don't need a very specific query anymore.
Generating self-contained code examples is where I find the most value in LLM tools for software development. Small code snippets are easy to verify and save a lot of research time. Instead of reading through docs and library source code, you can generate a personalized example and see if it works for your problem. I usually choose to adapt and transform the code myself, so I still get to do some thinking. The LLM is just there to help me find a working solution faster.
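To give a sense of scale, below is the kind of throwaway snippet I have in mind - something you might ask for with a prompt like "convert an ISO 8601 timestamp to another timezone using only the standard library". The function name and the timezone are placeholders, not anything prescribed:

    # The kind of self-contained snippet an LLM can produce and you can
    # verify just by running it (Python 3.9+ for zoneinfo).
    from datetime import datetime
    from zoneinfo import ZoneInfo

    def to_local(ts: str, tz: str = "Europe/Warsaw") -> datetime:
        # fromisoformat parses offsets like "+00:00" directly.
        return datetime.fromisoformat(ts).astimezone(ZoneInfo(tz))

    print(to_local("2025-07-05T12:30:00+00:00"))

Running it once tells you whether it solves your problem, which is the whole point - the cost of verification stays tiny.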
Some people choose to go further... Around February I went through a list of VSCode extensions and VSCode-based editors that enable comfortable LLM usage for development. All of these options were very similar. Open-source: AIDE, Cline, Continue, PearAI, Sourcegraph Cody, Tabby, Void. Proprietary: Codi, Cursor, GitHub Copilot, Qodo, Supermaven, Tabnine, Trae, Windsurf. I was looking for something I could use on my own terms: offline and with open-source models. Even among the paid tools, I was quite disappointed. With Cursor everything goes through their servers before hitting any model, so they clearly don't care about self-hosted models. From the open-source selection, Continue seemed interesting - it even got featured on the Ollama blog. After trying it out myself, it quickly felt lacking in comparison to a full-blown editor, but so did the other solutions I tried. Perhaps trying to fit an LLM into the constraints of an existing editor was not such a good idea after all.
Cursor does many things right and accommodates a wide range of users. It's very good at picking up context and displaying suggested changes. You could think that is exactly what makes it distinctly better than the rest. I certainly thought so at first. There is more: Cursor has amazing autocomplete, well-integrated inline editing (Ctrl+K), and a Composer mode for a conversational development flow closer to a chat experience. That's like fitting three tools into one, each covering a different prediction distance and scratching a slightly different itch. I believe that is what makes the Cursor experience so good. Out of these three, autocomplete was the first application of LLMs for coding (the other two require instruction tuning, which came later). It covers the shortest prediction distance, so it's usually rather accurate. It helps with tedious and repeatable edits and with typing out the boring parts of the code. Then again, so does learning vim. I take pleasure in typing and do it fast enough not to really need completion suggestions for the parts that I already know how to code. When it comes to parts that I'm not yet sure about, I would rather stop and think the problem through than have an invasive completion suggestion pop up to skew my direction of thinking. Anyway, out of the three editing modes, only this one really needs to exist within the editor.
Inline editing is a pretty cool idea. It can handle tedious edits when you don't want to bother coming up with a good macro, and it can handle small-scale changes, generating snippets of code in place. It is also not very invasive - it only works when the user requests it. Finally, Composer, with its conversational approach, has the furthest prediction distance. The LLM has to find a proper place to make the edit and then attempt a prediction far into the future, generating a change that may be very distant from the present code. That makes this mode prone to mistakes. Generally speaking, the more autonomy you give the LLM and the more complex the instruction, the lower the quality and accuracy of the result. These two modes don't have to live in the editor. Let's discuss another family of LLM tools - agents.
Meet Aider - the first proper LLM agent for the terminal. It is Apache-licensed and written in Python. Aider introduces a few chat modes: one lets you quickly ask questions about your codebase and plan out changes, another lets you apply changes to files right in your filesystem, and a third one that more or less combines the previous two. It's kind of like Cursor's Composer, but better. It works with many different providers and any LLM model you could think of. It makes conventional commits after each change, so you can review them thoroughly and comfortably. Aider also has a watch mode, which reacts to file changes in a repository and acts on special comments that you can place anywhere. For example, you could write a function declaration and, right above it, a comment like "Implement this function. AI!", and Aider should create a matching function definition or fill out the function body. I like the Aider workflow even more than the Cursor experience. The ability to use different models and seamlessly switch between them, the different tools, the chat modes - all of this makes Aider stand out as one of the top LLM tools for development. There is a model benchmark section in the Aider docs, in case you're wondering which LLMs give the best results: https://aider.chat/docs/leaderboards/
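To picture that watch-mode marker: a made-up Python stub like the one below should do it, the trailing "AI!" in the comment being what the watcher reacts to. With Aider's watch mode running in the repository, saving the file should trigger the edit:

    # slugify.py - a stub left for Aider's watch mode to pick up.

    # Implement this function: lowercase the title and replace runs of
    # non-alphanumeric characters with single dashes. AI!
    def slugify(title: str) -> str:
        ...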
A complementary terminal tool, also very powerful and flexible, is aichat - written in Rust and available in the official Arch Linux extra repository. I haven't used it much yet, but I see huge potential for building quick terminal commands and personal workflows on top of it. I like two concepts in particular: user-defined roles, a collection of system prompts stored in a directory; and macros, which expand to longer bits of predefined text that you can reuse in your prompts. The configuration options are very extensive. There are also some web interfaces that aichat can expose for you, like an OpenAI-compatible API proxy, which is very useful for testing and development.
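As a sketch of why such a proxy is handy: any script that already speaks the OpenAI API can be pointed at it. The base URL, port, and model name below are assumptions, not fixed values - adjust them to whatever your own proxy actually exposes:

    # Point a standard OpenAI client at a locally exposed
    # OpenAI-compatible endpoint, e.g. one served by aichat.
    from openai import OpenAI

    client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="unused")

    resp = client.chat.completions.create(
        model="local-model",  # placeholder; use whatever your proxy serves
        messages=[{"role": "user", "content": "Summarize the README in 3 bullet points."}],
    )
    print(resp.choices[0].message.content)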
After Aider, all the major players released their own LLM tools for the terminal. First we got Claude Code; some people swear by it, but it is closed source and closely tied to Anthropic, so no way in hell I will ever run it. Then OpenAI Codex came out, an open-source project in TS with an ongoing official Rust rewrite; unfortunately, it was completely unusable in my experience. Finally, the recently released Gemini CLI, also open source and in TS, has a good vibe, but it lacks basic features compared to the alternatives and is tightly coupled with G00gle's LLMs. Frankly, none of them are worth wasting time on. If you're looking for something fresher and simpler than Aider, I have another tool to recommend.
Lately my go-to LLM terminal tool is sst/opencode. This FOSS tool has the cleanest interface I've seen so far. You can't deny its beauty and simplicity. It's also written in TS, yet it is very fast and responsive. I wish it had more advanced features exposed to the user, but the defaults are more than good enough, and you can still plug in any model with almost any provider. I often prefer it over opening the web chat interface (I still do that for search). There is a handful of built-in tools for the LLM to use: running shell commands, fetching web content, keeping track of a TODO list (which works great for making the LLM implement everything you ask for step by step), searching for files by pattern or with grep, and basic file operations like listing, reading, writing, and editing. That's really all you need for effective agentic work, and opencode is great at it.
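If you're curious what one of those tools boils down to, here is a rough sketch - not opencode's actual internals, just the general shape of an agent tool: a plain function that does the work, plus a schema the model reads to decide when and how to call it:

    # A generic agent "tool" sketch: a worker function and a JSON-schema
    # style description the model uses to call it.
    import subprocess

    def run_shell(command: str, timeout: int = 30) -> str:
        # Run a shell command and return combined output for the model to read.
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
        return result.stdout + result.stderr

    SHELL_TOOL = {
        "name": "run_shell",
        "description": "Run a shell command in the project directory.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    }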
With so many LLM agent tools around, you may think about running multiple agents in parallel for easy agent/model comparisons and quick re-rolls of changes per prompt. That's exactly where uzi comes in. It's a fairly simple project that lets you orchestrate local agents. It can run multiple agents in separate tmux sessions, each working in its own git workspace, on a separate branch. You can control several at once by passing stdin directly through tmux, all managed by uzi. Personally, I'd take a different approach and use Docker or Firecracker VMs for each agent, but then again - that might be more hassle than it's worth.
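The core idea behind uzi is simple enough to sketch. Something along these lines (not uzi's actual code; the agent command and session names are placeholders) gives each agent its own branch, worktree, and tmux session:

    # Rough sketch of the idea: one git worktree + one tmux session per agent.
    import os
    import subprocess

    AGENT_CMD = "opencode"  # placeholder - whichever agent you actually run

    def spawn(name: str) -> None:
        branch = f"agent/{name}"
        workdir = os.path.abspath(f"../agent-{name}")
        # New branch checked out into its own worktree, so agents don't collide.
        subprocess.run(["git", "worktree", "add", "-b", branch, workdir], check=True)
        # Detached tmux session running the agent inside that worktree.
        subprocess.run(
            ["tmux", "new-session", "-d", "-s", name, "-c", workdir, AGENT_CMD],
            check=True,
        )

    for name in ("first", "second", "third"):
        spawn(name)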
Regardless of the interface, there's still a serious issue with LLM-generated code. The way LLMs write makes single files read really well, but only in isolation. Whenever changes are contained within one file, one class, one function - the results are usually more than satisfactory. Once the logic is spread across different files, that's when everything falls apart. Again, in isolation all the new changes look fine, but they lack any design. The LLM does not really think, and that lack of thought shows in the resulting interfaces, object relations, module separation, and all the other design decisions that experienced human programmers make intuitively. Generated code looks good on the surface, but it is very hard to work with later on. LLM agents are also not fully capable of utilizing existing code well, for the same reason - they don't have a complete enough understanding of it. The best they can do is explain bits of code, explain their goal out loud, and, if those two match, decide to use a certain method. LLMs are thus happy to repeat themselves - it's easier for them to process and therefore cheaper. It's far from how real programmers work. Real programmers don't use Pascal. Utilizing LLM tools well is a new skill to learn. They can enhance our work, but they need a competent human driving them properly to really be useful.
There is a lot more to talk about: building tools around LLMs, building tools for LLMs, dense documentation, retrieval-augmented generation, fine-tuning, big models calling specialized models, transpilation, interesting tasks for LLMs... I will most likely come back to these topics later. For now - I'm excited to still have a lot to learn.