This content originally appeared on DEV Community and was authored by Atsushi Suzuki
With the release of Docker Desktop 4.43 (July 3, 2025), you can now declare AI models, agents, and MCP tools in a single `compose.yaml` file and launch them all with a single `docker compose up`.
On top of that, the beta release of Docker Offload allows you to run Compose projects directly in the cloud with NVIDIA L4 GPUs. This opens the door to running large-scale models from even a modest laptop.
In this post, I’ll walk through how to run the official A2A Multi‑Agent Fact Checker sample from `docker/compose-for-agents` entirely with Compose. I’ll also demonstrate how to offload the workload to the cloud using Docker Offload.
Some images in this post are sourced from the official Docker Offload Content Kit provided to Docker Captains.
Sample Overview
The A2A Multi‑Agent Fact Checker is a multi-agent system built with Google’s ADK (Agent Development Kit) and the A2A protocol. It features three agents—Auditor, Critic, and Reviser—that work together to research, verify, and revise a given claim, then return a final conclusion.
- Auditor: Breaks down the user’s claim into subtasks and delegates them to Critic and Reviser. Collects the final answer and returns it via the UI.
- Critic: Performs external web searches using the DuckDuckGo MCP tool to gather supporting evidence.
- Reviser: Refines and verifies the output using the evidence gathered by Critic and the initial draft from Auditor.
The Critic communicates with the outside world via the MCP Gateway, and the inference model (Gemma 3 4B‑Q4) is hosted via Docker Model Runner.
Key Highlights of compose.yaml
Here’s the full `compose.yaml` used to define the multi-agent system:
```yaml
services:
  # Auditor Agent coordinates the entire fact-checking workflow
  auditor-agent-a2a:
    build:
      target: auditor-agent
    ports:
      - "8080:8080"
    environment:
      - CRITIC_AGENT_URL=http://critic-agent-a2a:8001
      - REVISER_AGENT_URL=http://reviser-agent-a2a:8001
    depends_on:
      - critic-agent-a2a
      - reviser-agent-a2a
    models:
      gemma3:
        endpoint_var: MODEL_RUNNER_URL
        model_var: MODEL_RUNNER_MODEL

  critic-agent-a2a:
    build:
      target: critic-agent
    environment:
      - MCPGATEWAY_ENDPOINT=http://mcp-gateway:8811/sse
    depends_on:
      - mcp-gateway
    models:
      gemma3:
        # specify which environment variables to inject into the container
        endpoint_var: MODEL_RUNNER_URL
        model_var: MODEL_RUNNER_MODEL

  reviser-agent-a2a:
    build:
      target: reviser-agent
    environment:
      - MCPGATEWAY_ENDPOINT=http://mcp-gateway:8811/sse
    depends_on:
      - mcp-gateway
    models:
      gemma3:
        endpoint_var: MODEL_RUNNER_URL
        model_var: MODEL_RUNNER_MODEL

  mcp-gateway:
    # mcp-gateway secures your MCP servers
    image: docker/mcp-gateway:latest
    use_api_socket: true
    command:
      - --transport=sse
      - --servers=duckduckgo
      # add an MCP interceptor to log the responses
      - --interceptor
      - after:exec:echo RESPONSE=$(cat) >&2

models:
  # declare LLM models to pull and use
  gemma3:
    model: ai/gemma3:4B-Q4_0
    context_size: 10000    # 3.5 GB VRAM
    # context_size: 131000 # 7.6 GB VRAM
```
Top-Level models
As of Compose v2.38, you can declare LLM images as OCI Artifacts under the top-level `models` field. Docker Model Runner automatically pulls the image and exposes it as an API endpoint.
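If you want to fetch or smoke-test the model outside of Compose first, the Model Runner CLI can do that directly. A quick sketch, assuming Model Runner is enabled in Docker Desktop:

```bash
# Pull the same model that compose.yaml declares, then send it a one-off prompt
docker model pull ai/gemma3:4B-Q4_0
docker model run ai/gemma3:4B-Q4_0 "Summarize the A2A protocol in one sentence."
```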
Per-Service models
Each service defines which model to use and how to inject its URL and model name via environment variables:
```bash
export MODEL_RUNNER_URL=http://model-runner:12434
export MODEL_RUNNER_MODEL=gemma3
```
This means your app can simply read environment variables without hardcoding the model path.
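For example, an agent can talk to the model through any OpenAI-compatible client, since Docker Model Runner exposes an OpenAI-compatible API. The following is only a minimal sketch, not the sample's actual agent code; the endpoint and model name come from the injected variables rather than being hardcoded:

```python
import os

from openai import OpenAI

# Compose injects these two variables via the per-service `models` block,
# so nothing about the model is hardcoded in the agent itself.
base_url = os.environ["MODEL_RUNNER_URL"]      # e.g. http://model-runner:12434
model_name = os.environ["MODEL_RUNNER_MODEL"]  # e.g. ai/gemma3:4B-Q4_0

# Model Runner speaks the OpenAI API, so any compatible client works;
# the API key is unused locally, but the client requires some value.
client = OpenAI(base_url=base_url, api_key="not-needed")

response = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "How far is the Moon from the Earth?"}],
)
print(response.choices[0].message.content)
```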
MCP Gateway
The `docker/mcp-gateway` image acts as a secure relay for MCP servers like DuckDuckGo, and it communicates with the Critic agent over Server-Sent Events (SSE). The `--interceptor` flag logs the raw responses directly to stderr.
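If you want to poke at the gateway by hand, streaming its SSE endpoint from inside the Compose network is a quick sanity check. A rough sketch, assuming `curl` is available in the agent image:

```bash
# Stream the gateway's SSE endpoint (URL taken from MCPGATEWAY_ENDPOINT above);
# -N disables buffering so events print as they arrive.
docker compose exec critic-agent-a2a curl -N http://mcp-gateway:8811/sse
```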
Dependency Management
Like traditional Compose setups, depends_on
is used to manage startup order: MCP Gateway → Critic/Reviser → Auditor. This eliminates the need for retry logic.
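If you want stricter ordering than "started", Compose can also wait for a healthcheck. The sample doesn't do this, and the probe command below is an assumption about the gateway image, but the pattern looks like this:

```yaml
# Not part of the sample: gate the Critic on the gateway actually responding,
# not merely on its container having started.
services:
  mcp-gateway:
    image: docker/mcp-gateway:latest
    healthcheck:
      # Assumption: the image ships a shell and wget; swap in whatever probe it supports.
      test: ["CMD-SHELL", "wget -qO- http://localhost:8811/ || exit 1"]
      interval: 5s
      timeout: 3s
      retries: 10

  critic-agent-a2a:
    depends_on:
      mcp-gateway:
        condition: service_healthy
```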
Running Locally
You can launch the stack locally with:
```bash
docker compose up --build
```

```text
[+] Running 8/9
 ✔ reviser-agent-a2a  Built        0.0s
 ✔ critic-agent-a2a   Built        0.0s
 ✔ auditor-agent-a2a  Built        0.0s
 ⠴ gemma3             Configuring  76.5s
...
```
Since Gemma 3 4B‑Q4 is quantized, it even runs on my MacBook Air M2.
Open your browser to http://localhost:8080 and type in a claim such as:

```text
How far is the Moon from the Earth?
```
The Critic performs a DuckDuckGo search, the Reviser polishes the output, and the Auditor returns the final answer.
Using Docker Offload
If you want to use a larger model like Gemma 3 27B (Q4), a local GPU might not cut it. That's where Docker Offload comes in: enable the feature and override your model config to run the workload in the cloud.
Enabling Docker Offload
First, sign up for beta access on Docker’s official site. (As a Docker Captain, I received early access.)
Then, go to Settings > Beta Features and enable both:
- “Enable Docker Offload”
- “Enable Docker Offload GPU Support”
Switch the Docker Desktop toggle to the cloud icon to activate Offload (or run `docker offload start`).
compose.offload.yaml
Prepare a separate file to override the model definition:
```yaml
models:
  gemma3:
    model: ai/gemma3-qat:27B-Q4_K_M
    context_size: 10000    # 18.6 GB VRAM
    # context_size: 80000  # 28.37 GB VRAM
    # context_size: 131000 # 35.5 GB VRAM
```
Running with Offload
To launch in the cloud, combine the two Compose files:
```bash
docker compose -f compose.yaml -f compose.offload.yaml up --build
```

This overrides the top-level `models` field with the Offload-specific config.
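Before burning GPU credits, it's worth confirming the override was actually picked up by rendering the merged configuration:

```bash
# Print the fully merged config; the models section should now show
# ai/gemma3-qat:27B-Q4_K_M from compose.offload.yaml.
docker compose -f compose.yaml -f compose.offload.yaml config
```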
Docker Offload gives you 300 GPU credits for free, and any additional usage is billed at $0.015 per GPU second. Don’t forget to stop the service afterward:

```bash
docker offload stop
```
Final Thoughts
Trying out Compose and Offload together really shows the power of unified agent, model, and tool orchestration. It’s incredibly convenient to use the same `docker compose up` command for both local and cloud environments.
The agent space is evolving rapidly, so if you have a better workflow or tips, I’d love to hear about them.
If you’re curious about Docker Offload, start experimenting within the free credit limit—you might be surprised how far you can go.