🧠 Kaizen Agent Architecture: How Our AI Agent Improves Other Agents



This content originally appeared on DEV Community and was authored by Suzuki Yuto

At Kaizen Agent, we're building something meta: an AI agent that automatically tests and improves other AI agents.

Today I want to share the architecture behind Kaizen Agent, and open it up for feedback from the community. If you're building LLM apps, agents, or dev tools, your input would mean a lot.

🧰 Why We Built Kaizen Agent

One of the biggest challenges in developing AI agents and LLM applications is non-determinism.

Even when an agent "works," it might:

  • Fail silently with different inputs
  • Succeed one run but fail the next
  • Produce inconsistent behavior depending on state, memory, or context

This makes testing, debugging, and improving agents very time-consuming, especially when you need to test changes again and again.

So we built Kaizen Agent to automate this loop: generate tests, run them, analyze the results, fix problems, and repeat until your agent improves.
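To make the flow concrete, here's a rough Python sketch of that loop. Every function in it is a local stand-in written for this post; it is not the actual Kaizen Agent API:

```python
# Minimal sketch of the improvement loop; every helper here is a placeholder
# for illustration, not the real Kaizen Agent implementation.
from dataclasses import dataclass, field


@dataclass
class Analysis:
    failures: list = field(default_factory=list)  # failed tests plus the evaluator's reasoning


def generate_tests(config: dict) -> list[dict]:
    # [1] Derive test cases (including edge cases) from the user's config.
    return [{"input": "", "expected": "graceful error message"}]


def run_tests(agent, tests: list[dict]) -> list[dict]:
    # [2] Execute every test against the current agent implementation.
    return [{"test": t, "output": agent(t["input"])} for t in tests]


def analyze(results: list[dict]) -> Analysis:
    # [3] In the real system an LLM evaluator judges outputs against the success criteria.
    return Analysis(failures=[r for r in results if r["output"] != r["test"]["expected"]])


def apply_fixes(agent, analysis: Analysis):
    # [4] Placeholder: prompts and code are rewritten based on the failure analysis.
    return agent


def improve(agent, config: dict, max_iterations: int = 5) -> Analysis:
    tests = generate_tests(config)
    analysis = analyze(run_tests(agent, tests))
    for _ in range(max_iterations):
        if not analysis.failures:
            break                      # [5] improvements confirmed: open a pull request
        agent = apply_fixes(agent, analysis)
        analysis = analyze(run_tests(agent, tests))
    return analysis
```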

🖼 Architecture Diagram

Here's the system diagram that ties it all together, showing how config, agent logic, and the improvement loop interact:

[Diagram: Kaizen Agent architecture]


⚙ Core Workflow: The Kaizen Agent Loop

Here are the five core steps our system runs automatically:

[1] 🧪 Auto-Generate Test Data

Kaizen Agent creates a broad range of test cases based on your config, including edge cases, failure triggers, and boundary conditions.
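For example, a config driving this step could look something like the following. The file layout and key names here are hypothetical, shown only to illustrate the idea, not our documented schema:

```yaml
# Hypothetical config shape; field names are illustrative, not the official schema.
agent:
  entry_point: my_agent.run        # the function Kaizen Agent would call
test_generation:
  count: 30
  include:
    - edge_cases                   # e.g. empty or very long inputs
    - failure_triggers             # inputs known to confuse similar agents
    - boundary_conditions
success_criteria:
  - "The answer cites at least one source"
  - "The output is valid JSON"
```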

[2] 🚀 Run All Test Cases

It executes every test on your current agent implementation and collects detailed outcomes.
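As a rough idea of what a "detailed outcome" can capture per test (output, errors, latency), here is one possible shape; the field names are assumptions for illustration, not our actual result format:

```python
# One possible per-test outcome record; fields are illustrative assumptions.
import time
import traceback
from dataclasses import dataclass


@dataclass
class TestOutcome:
    test_id: str
    output: str | None
    error: str | None      # full traceback if the agent raised
    latency_s: float


def run_one(agent, test_id: str, test_input: str) -> TestOutcome:
    start = time.perf_counter()
    try:
        output, error = agent(test_input), None
    except Exception:
        # Record the failure instead of crashing the whole test run.
        output, error = None, traceback.format_exc()
    return TestOutcome(test_id, output, error, time.perf_counter() - start)
```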

[3] 📊 Analyze Test Results

We use an LLM-based evaluator to interpret outputs against your YAML-defined success criteria.

  • It identifies why specific tests failed.
  • The failed test analysis is stored in long-term memory, helping the system learn from past failures and avoid repeating the same mistakes.
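Conceptually, the evaluator can be as simple as the sketch below, written against the OpenAI Python SDK. The prompt wording, model choice, and in-memory failure log are illustrative assumptions, not our exact implementation:

```python
# Sketch of an LLM-based evaluator; prompt, model, and memory store are assumptions.
from openai import OpenAI

client = OpenAI()
failure_memory: list[dict] = []   # stand-in for the long-term memory store


def evaluate(test_input: str, output: str, criteria: list[str]) -> bool:
    prompt = (
        "Given the agent input, its output, and the success criteria below, "
        "answer PASS or FAIL on the first line, then explain why.\n\n"
        f"Input: {test_input}\nOutput: {output}\nCriteria: {criteria}"
    )
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
    passed = reply.strip().upper().startswith("PASS")
    if not passed:
        # Store the analysis so later iterations can avoid repeating the mistake.
        failure_memory.append({"input": test_input, "analysis": reply})
    return passed
```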

[4] 🛠 Fix Code and Prompts

Kaizen Agent suggests and applies improvements not just to your prompts but also to your code:

  • It may add guardrails or new LLM calls.
  • It aims to eventually test different agent architectures and automatically compare them to select the best-performing one.
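As one example of the kind of code-level change this step can introduce, a guardrail around an agent call might look like this. It's a hypothetical illustration of the pattern, not a patch Kaizen Agent literally emits:

```python
# Hypothetical guardrail of the kind the fix step might insert: validate the
# agent's output and retry once with corrective feedback before giving up.
import json


def guarded_call(agent, user_input: str) -> dict:
    raw = agent(user_input)
    try:
        return json.loads(raw)       # success criterion: output must be valid JSON
    except json.JSONDecodeError:
        retry = agent(f"{user_input}\n\nRespond with valid JSON only.")
        return json.loads(retry)     # still invalid? let the test fail loudly
```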

[5] 📤 Make a Pull Request

Once improvements are confirmed (no regressions, better metrics), the system generates a PR with all proposed changes.
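A minimal version of this last step, assuming the improved code is already committed on a local branch and the GitHub CLI (gh) is available; the branch name and PR text are placeholders:

```python
# Sketch of the PR step; assumes changes are committed on `branch` and gh is installed.
import subprocess


def open_pull_request(branch: str, pass_rate_before: float, pass_rate_after: float) -> None:
    if pass_rate_after <= pass_rate_before:
        return  # improvement not confirmed, so no PR
    subprocess.run(["git", "push", "--set-upstream", "origin", branch], check=True)
    subprocess.run(
        [
            "gh", "pr", "create",
            "--title", "Kaizen Agent: automated agent improvements",
            "--body", f"Pass rate improved from {pass_rate_before:.0%} to {pass_rate_after:.0%}.",
            "--base", "main",
            "--head", branch,
        ],
        check=True,
    )
```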

This loop continues until your agent is reliably performing as intended.

🙏 What We’d Love Feedback On

We're still early and experimenting. Your input would help shape this.

👇 We'd love to hear:

  • What kind of AI agents would you want to test with Kaizen Agent?
  • What extra features would make this more useful for you?
  • Are there specific debugging pain points we could solve better?

If you've got thoughts, ideas, or feature requests, drop a comment, open an issue, or DM me.

💡 Big Picture

We believe that as AI agents become more complex, testing and iteration tools will become essential.

Kaizen Agent is our attempt to automate the test–analyze–improve loop.
