Improve LLM Debugging

This content originally appeared on DEV Community and was authored by Stanislav Silin

Recently, I have been working extensively with LLM-powered coding assistants – Claude Code, Cursor, Windsurf – and I noticed a frustrating pattern. Every time these tools run a command for me, they’re processing massive amounts of irrelevant log output. You know how it goes – the LLM runs npm run build for your React app and receives screens full of webpack configs, browserslist warnings, and compilation progress when all it really needs are those few error lines to help you debug.

But what if we could change that? What if there was a way to make every command output exactly what the LLM needs to see?

The Token Problem

Let me show you what I mean. Here’s what happens when your LLM assistant runs a typical React build:

> npm run build:graphql && react-router typegen && tsc && react-router build


> build:graphql
> graphql-codegen

✔ Parse Configuration
✔ Generate outputs
app/features/tasks/services/atoms.ts:55:60 - error TS2339: Property 'taskId' does not exist on type '{ request: UpdateTaskRequest; }'.

55         const response = await apiClient.updateTask(params.taskId, params.request);
                                                              ~~~~~~


Found 1 error in app/features/tasks/services/atoms.ts:55

That’s a dozen-plus lines of output when the only information that actually matters for debugging is the single error line. Does this look familiar? And here’s the thing – this problem exists for basically every technology stack:

  • Python/Django: Pages of pip install progress bars and dependency resolution before showing the actual ImportError
  • .NET: MSBuild verbosity drowning out the actual CS compiler errors
  • Java/Spring: Maven downloading half the internet before showing your NullPointerException
  • Docker: Layer caching information obscuring the actual build failure
  • Rust: Cargo’s detailed compilation progress hiding the borrow checker errors

When your LLM assistant is helping you debug multiple build issues, running tests, and checking git status, it’s processing thousands of unnecessary tokens. Whether you’re using Claude Code, Cursor, or any other LLM-based coding tool, those tokens add up quickly – both in terms of API costs and response time.

The Half-Solution That Doesn’t Work

Now, you might think: “Why not just ask the LLM to use grep or findstr to filter the output?” I tried that too. You can tell your LLM something like:

“Run npm run build but pipe it through grep to only show errors”

And sometimes you’ll get:

npm run build 2>&1 | grep -E "error|Error|ERROR"

But here’s the thing – it’s an LLM. Sometimes it forgets to add the filter. Sometimes it uses the wrong regex pattern. Sometimes it tries to be helpful and shows you the “full context” anyway. And when you’re deep in a debugging session at 2 AM, the last thing you want is to repeatedly remind an LLM to filter output or correct its regex patterns.

Plus, different tools need different patterns. React errors look different from Python tracebacks, which look different from Rust compiler errors. You’d need to remember (or teach the LLM) dozens of different grep patterns for different scenarios.
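To make that concrete, here are the kinds of patterns you end up juggling across stacks (illustrative sketches, not battle-tested filters):

# TypeScript/React: compiler errors look like "error TS2339: ..."
npm run build 2>&1 | grep -E "error TS[0-9]+"

# Python: tracebacks span multiple lines, so you need surrounding context
python manage.py test 2>&1 | grep -B 2 -A 1 -E "Error|Exception"

# Rust: cargo prefixes real problems with "error[E...]:" or "error:"
cargo build 2>&1 | grep -E "^error(\[|:)"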

Building a Real Solution

That’s when I thought: what if I built a simple tool that does exactly what we need? No more hoping the LLM remembers to filter. No more copying the wrong grep pattern. Just a tool that knows how to extract errors from any build output.

After dealing with this problem for weeks, I decided to build that tool once and for all. The idea is simple: create a CLI that can execute commands but only show the output that matters.

Meet Apparatus.Exec – a command executor designed specifically for LLM-driven development workflows.

Get Started

Let’s jump right in and see how this works.

First, install the tool globally:

npm install -g @apparatus-technology/exec

Now you have access to the aex command. Let’s initialize it in your project:

aex init

This creates an exec.config.yml file with predefined shortcuts and filters.
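The exact contents depend on the templates you pick during init, but the overall shape is something like this (a trimmed sketch, not the literal generated file):

shortcuts:
  frontend-build:
    commands:
      - npm run build
    description: Build the React application

filters:
  - name: TypeScript Compiler Errors
    pattern: error TS\d+
    description: Errors reported by tsc

Now let’s see the magic happen when you run your commands through aex: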

$ aex frontend-build
app/features/tasks/services/atoms.ts(55,60): error TS2339: Property 'taskId' does not exist on type '{ request: UpdateTaskRequest; }'.

Done

Done! Just the error. No fluff, no timestamps, no progress bars. When the LLM reads this, it’s processing 90% fewer tokens while getting 100% of the useful information.

How to Use

The tool operates on two simple concepts: shortcuts and filters. Let’s have a look at each one.

Shortcuts

Instead of typing long command sequences repeatedly, you define them once in your config:

shortcuts:
  frontend-build:
    commands:
      - npm ci
      - npm run build
    description: Clean install and build React application
    working-dir: ./frontend

  test-all:
    commands:
      - npm test -- --coverage
      - npm run e2e
    description: Run unit and e2e tests

Now aex frontend-build executes both commands in sequence. But the real trick is in the filters.

Filters

The real power comes from regex filters that extract only relevant output:

filters:
  - name: React Build Errors
    pattern: src/.*\.(tsx?|jsx?):\d+:\d+:.*
    description: ESLint and TypeScript errors with file paths and line numbers

  - name: Test Failures
    pattern: (?i)failed|error|exception|FAIL
    description: Test failures and exceptions

  - name: Module Errors
    pattern: Cannot find module|Module not found
    description: Missing dependencies and import errors

When you run a shortcut, the output passes through these filters. Only lines matching your patterns make it through. Everything else gets discarded.
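If you want to see the mechanics in isolation, here’s the same idea expressed with plain grep – raw output goes in, and only the line matching the Module Errors pattern above survives:

# Stand-in for raw build output; only the matching line makes it through
printf '%s\n' \
  "webpack 5.88.0 compiled with 1 error in 4132 ms" \
  "Cannot find module '@/hooks/useAuth' from 'src/components/UserProfile.tsx'" \
  "asset main.js 1.2 MiB [emitted]" \
| grep -E "Cannot find module|Module not found"

# Output:
# Cannot find module '@/hooks/useAuth' from 'src/components/UserProfile.tsx'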

Real-World Example

Now let’s see this in action. I was recently using Claude Code to debug a React component where tests were failing due to a missing import. Here’s how the interaction looked:

$ aex test-frontend
FAIL src/components/UserProfile.test.tsx
  ● Test suite failed to run
    Cannot find module '@/hooks/useAuth' from 'src/components/UserProfile.tsx'
Done

$ aex git-status
M  src/components/UserProfile.tsx
M  src/hooks/useAuth.ts
?? src/hooks/useAuth.test.ts
Done

Claude Code immediately saw: “The tests are failing because the useAuth hook can’t be found. You’ve modified UserProfile.tsx and created useAuth.ts but it’s not being resolved.” The LLM instantly understood the issue and suggested checking my TypeScript path mappings in tsconfig.json.

Without filtering, the LLM would have received hundreds of lines of test runner initialization, webpack bundling messages, and git’s verbose status output. The conversation would have been slower, more expensive, and harder to follow.

Advanced Features

As I used the tool more, I kept adding features that made my workflow smoother.

Working Directories

Each shortcut can specify where it should run:

shortcuts:
  frontend-build:
    commands:
      - npm ci
      - npm run build
    working-dir: ./client

  backend-build:
    commands:
      - pip install -r requirements.txt
      - python manage.py runserver
    working-dir: ./server

  mobile-build:
    commands:
      - flutter pub get
      - flutter build apk
    working-dir: ./mobile

No more cd-ing around or forgetting which directory you’re in. This is especially useful in monorepos where you have multiple technologies in different folders.
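With that config in place, you can drive every part of the monorepo from the repository root:

# All run from the repo root – aex switches into each working-dir for you
aex frontend-build   # executes in ./client
aex backend-build    # executes in ./server
aex mobile-build     # executes in ./mobile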

Pattern Templates

When you run aex init, you can choose from predefined pattern collections:

  • .NET Build Errors
  • TypeScript Compiler Errors
  • Docker Build Failures
  • Test Framework Errors
  • Git Conflicts

Each template includes regex patterns that I’ve been using in my workflow. They work well for me, but your mileage may vary depending on your specific tools and setup. The good news is that patterns are just simple regex strings in the config file – you can easily tweak them anytime to match your exact error formats.
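For example, if your toolchain reports errors in a shape the template misses, it’s a one-line change. Here’s the kind of tweak I mean, using the tsc format from earlier in this post:

filters:
  - name: TypeScript Compiler Errors
    pattern: error TS\d+:.*
    description: tsc errors such as the TS2339 example earlier in this post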

Debug Mode

At this point, you might ask: “What if the filters miss something important?” This is where --no-filter becomes essential. When the filters aren’t catching what you need, you can see the complete output to understand what patterns to add:

aex frontend-build --no-filter  # See complete output

I use this constantly when setting up filters for a new project. Run the command with --no-filter, see what the actual error format looks like, then adjust your regex patterns accordingly. Every project is a bit different – different linting rules, different error formats, different test runners. The predefined patterns are just a starting point; you’ll likely need to tweak them for your specific workflow.
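In practice, the loop looks like this: run unfiltered once, note the failure format, then encode it as a filter (the Jest-style markers below are just an example of what you might find):

# 1. See the raw output once to learn what failures look like
aex test-all --no-filter

# 2. Suppose failed tests are marked with "●" and "✕" – add a filter:
filters:
  - name: Jest Failure Markers
    pattern: ●.*|✕.*
    description: Jest failed suites and tests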

Performance

Think about your typical debugging workflow. You run a command, fix an error, run again, fix another error, and so on. With traditional output:

  • Failed build: 800 tokens of mostly noise
  • Successful build: 500 tokens of “everything compiled successfully” messages
  • Another failed attempt: 800 more tokens
  • Final success: Another 500 tokens of success messages

With Apparatus.Exec:

  • Failed build: 60 tokens of actual errors
  • Successful build: 1 token (just “Done”)
  • Another failed attempt: 40 tokens of the remaining error
  • Final success: 1 token

Over a debugging session with 20-30 commands, that’s a reduction from ~15,000 tokens to ~500 tokens. But here’s what really matters – the LLM only sees information when there’s something to fix. No more processing “Build succeeded” messages, no more reading through successful test output. The silence itself becomes information: if there’s no output, everything worked.

Limitations

There are a few things this tool doesn’t do well:

  1. Complex shell features – It doesn’t support pipes, redirects, or shell variables. Each command runs independently.
  2. Interactive commands – Commands that require user input won’t work properly.
  3. Real-time output – You only see output after the command completes, not as it runs.

For these cases, you’ll still need to use your regular terminal. But for the bread-and-butter commands you run while coding with an LLM, it works perfectly.
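For instance, a config like this would not behave the way the shell syntax suggests – exactly how it fails depends on your platform, but the pipe won’t be interpreted as a pipeline:

shortcuts:
  bad-idea:
    commands:
      - npm test | grep FAIL   # the pipe is not processed by a shell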

The Future

I’m actively using this tool in my daily work and have plans to improve it based on feedback. What features would help your workflow? Feel free to create an issue or submit a PR – I’d love to hear your ideas!

Conclusion

Working with LLMs has changed how we write code, but the tooling hasn’t caught up yet. We’re using tools designed for human consumption in LLM workflows, and it shows. This tool is my attempt to bridge that gap – at least for command execution and output filtering.

For me, the difference has been dramatic. My LLM conversations are faster, cheaper, and more focused. The tool pays for itself in saved tokens within a single debugging session.

If you’re using LLM-powered coding assistants like Claude Code, Cursor, Windsurf, or Gemini CLI, you know how much unnecessary output gets sent to the LLM when it runs commands for you. Every time these tools execute a build or test command, they’re consuming tokens on verbose logs that don’t matter. This tool helps reduce that waste significantly.

Thanks for your time! Feel free to leave a comment or create an issue if you have questions or suggestions.

Links: GitHub, NPM