This content originally appeared on DEV Community and was authored by Avisek Dey
A practical guide to running a private, GPU-accelerated coding assistant locally using Docker Desktop — no API costs, no data leaving your machine.
Setup time: ~30–60 minutes
My Story
I’m a professional Django developer based in India.
Like most developers, I was using cloud-based AI tools for coding assistance — paying for API credits, sending my code to third-party servers, and depending on a stable internet connection just to get a code suggestion.
Then one day I was exploring Docker Desktop and noticed two new things in the sidebar — Models and MCP Toolkit.
I had no idea what they were.
A few hours of tinkering later, I had a fully local AI coding assistant running on my laptop:
Free
Private (no code leaves my machine)
Works offline
GPU-accelerated (~273ms response time)
No GitHub Copilot subscription. No API costs.
This is exactly how I built it — including all the mistakes I made along the way.
Before We Start (Important Reality Check)
Let’s be honest:
This is NOT a perfect replacement for GitHub Copilot.
Cloud models (like GPT-4/5-level) are still:
- better at reasoning
- better at large codebases
- more consistent
But…
For most day-to-day coding tasks, a local setup like this is:
- fast enough
- smart enough
- and WAY more private
Think of this as a practical alternative, not a 1:1 replacement.
Why Run AI Locally?
| Problem with Cloud AI | Local AI Solution |
|---|---|
| Costly at scale ($10–30/million tokens) | Completely free |
| Needs internet | Works offline |
| Code sent to third-party servers | Stays on your machine |
| Network latency | GPU-accelerated (~273ms) |
My Setup
This guide is based on my machine — adjust based on your hardware.
| Component | Spec |
|---|---|
| Laptop | Lenovo IdeaPad Pro 5 |
| CPU | Intel Core Ultra 9 185H |
| GPU | 6GB NVIDIA |
| RAM | 32GB |
| OS | Windows 11 |
Minimum Setup Recommendation
| Hardware | What to Expect |
|---|---|
| No GPU | Works, but slow (~2–5s responses) |
| 8GB RAM | Very limited models only |
| 16GB RAM | Usable |
| 32GB + GPU | Ideal |
Step 1 — Install Docker Desktop
Install Docker Desktop with AI features enabled:
https://www.docker.com/products/docker-desktop
Make sure you see Models and MCP Toolkit in the left sidebar.
Step 2 — Understanding Model Selection
Before pulling any model, you need to understand three things. This took me a while to figure out — so I’ll keep it quick.
Parameters = Brain Size
7B → Good for most coding tasks ✅
30B → Needs 16GB+ VRAM
70B → High-end machines only
Quantization = Compression
Think of it like image compression — smaller file, slight quality trade-off.
F16 → Full precision, largest file
Q4_0 → 4x compressed, best balance ✅
Q2 → Smallest, noticeable quality loss
The Golden Rule — RAM vs VRAM
Fits in VRAM (6GB)? → GPU ⚡ ~273ms
Spills to RAM? → CPU 🐢 ~3000ms
Too big for RAM? → ❌ Won't run
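The golden rule is easy to sanity-check with quick arithmetic: a quantized model's size is roughly parameters times bits per weight, divided by 8. This is a rough sketch, assuming ~4.5 effective bits per weight for Q4_0 (an approximation that accounts for format overhead, not an official figure):

```python
# Rough VRAM fit check: size ≈ parameters * bits_per_weight / 8.
# 4.5 bits/weight for Q4_0 is an illustrative assumption, not an exact spec.

GIB = 1024 ** 3

def model_size_gib(params: float, bits_per_weight: float) -> float:
    """Approximate in-memory size of a quantized model, in GiB."""
    return params * bits_per_weight / 8 / GIB

def where_it_runs(size_gib: float, vram_gib: float, ram_gib: float) -> str:
    """Apply the golden rule: fits in VRAM -> GPU, fits in RAM -> CPU, else fails."""
    if size_gib <= vram_gib:
        return "GPU (fast)"
    if size_gib <= ram_gib:
        return "CPU (slow)"
    return "won't run"

size = model_size_gib(7.62e9, 4.5)  # ~4.0 GiB, close to the 4.12 GiB listed below
print(f"{size:.2f} GiB ->", where_it_runs(size, vram_gib=6, ram_gib=32))
# prints something like: 3.99 GiB -> GPU (fast)
```

Run the same arithmetic on a 30B model and you'll see why it spills out of a 6GB card straight onto the CPU.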
Step 3 — Pull the Right Model
My Pick: qwen2.5:7B-Q4_0
| Property | Value | Why |
|---|---|---|
| Parameters | 7.62B | Smart enough for coding |
| Quantization | Q4_0 | 4x compressed, great quality |
| Size | 4.12 GiB | Fits perfectly in 6GB VRAM |
Steps:
- Docker Desktop → Models
- Search `qwen2.5`
- Find `qwen2.5:7B-Q4_0` → click Pull
~4GB download. Grab a coffee.
Step 4 — Enable GPU-Accelerated Inference (CRITICAL)
Most guides miss this step entirely.
By default Docker Model Runner uses CPU only. One checkbox changes everything.
Go to: Docker Desktop → Settings → AI
Enable all three:
Enable Docker Model Runner
Enable host-side TCP support → Port: 12434
Enable GPU-backed inference
Click Apply.
GPU inference downloads additional components — takes a few minutes the first time.
Verify It’s Working
Open your browser:
http://localhost:12434/engines/v1/models
You should see:
```json
{
  "object": "list",
  "data": [
    {
      "id": "docker.io/ai/qwen2.5:7B-Q4_0",
      "object": "model",
      "owned_by": "docker"
    }
  ]
}
```
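You can verify this from a script as well as the browser. This sketch parses the JSON shape shown above using only the standard library; in a real check you would fetch the URL first (with curl or urllib):

```python
import json

# Sample payload: the /engines/v1/models response shown above.
sample = """
{
  "object": "list",
  "data": [
    {"id": "docker.io/ai/qwen2.5:7B-Q4_0", "object": "model", "owned_by": "docker"}
  ]
}
"""

def list_model_ids(payload: str) -> list[str]:
    """Return the model ids advertised by the Model Runner endpoint."""
    return [m["id"] for m in json.loads(payload)["data"]]

print(list_model_ids(sample))  # ['docker.io/ai/qwen2.5:7B-Q4_0']
```

If your pulled model's id shows up in that list, the TCP endpoint is live and ready for the next step.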
Check Response Speed
Go to Docker Desktop → Models → Requests tab after sending a prompt:
| Mode | Speed |
|---|---|
| GPU | ~273ms |
| CPU | ~3000ms |
That’s a 10x speedup from one checkbox.
Step 5 — Connect VS Code via Continue.dev
Docker Models exposes a local OpenAI-compatible API at:
http://localhost:12434/engines/v1
Key insight: any tool that supports OpenAI’s API works here — just change the URL from OpenAI’s server to localhost.
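To see what "OpenAI-compatible" means in practice, here is a minimal stdlib-only sketch that builds the same kind of chat-completions request Continue.dev sends. The model id and port match this guide's setup; `ask()` only works while Docker Model Runner is serving:

```python
import json
import urllib.request

BASE = "http://localhost:12434/engines/v1"

def build_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request aimed at localhost."""
    body = json.dumps({
        "model": "ai/qwen2.5:7B-Q4_0",
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASE}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer docker",  # placeholder; localhost needs no real key
        },
    )

def ask(prompt: str) -> str:
    """Send the request to the local model and return its reply text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Calling `ask("Reverse a string in Python")` returns the model's answer without a single byte leaving your machine. Point the same request at OpenAI's server with a real key and it would work identically: same protocol, different address.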
Install Continue.dev
- VS Code → Extensions (`Ctrl+Shift+X`)
- Search `Continue`
- Install "Continue – open-source AI code agent"
Configure It
Open `C:\Users\<yourname>\.continue\config.yaml` and paste:

```yaml
name: Local Config
version: 1.0.0
schema: v1
models:
  - name: Qwen2.5 Coder Local
    provider: openai
    model: ai/qwen2.5:7B-Q4_0
    apiBase: http://localhost:12434/engines/v1
    apiKey: docker
```
Why `provider: openai`? Docker’s API speaks the OpenAI protocol — same language, different address.

Why `apiKey: docker`? Just a placeholder — localhost needs no real auth.
Windows PowerShell Fix (if needed)
If you hit a script execution error:

```powershell
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
```
Test It!
Open the Continue panel in VS Code and ask it a question. If it responds — you’re done.
Real Comparison — Copilot vs Local AI
Prompt I tested:
```
Write a Django REST Framework viewset for a User model
with JWT authentication and permission classes
```
GitHub Copilot output:
Clean, complete, production-ready code with proper imports, docstrings, and edge case handling. Roughly 60 lines, zero follow-up needed.
Local Qwen 7B output:
```python
from rest_framework import viewsets, permissions
from rest_framework_simplejwt.authentication import JWTAuthentication

from .models import User
from .serializers import UserSerializer


class UserViewSet(viewsets.ModelViewSet):
    queryset = User.objects.all()
    serializer_class = UserSerializer
    authentication_classes = [JWTAuthentication]
    permission_classes = [permissions.IsAuthenticated]

    def get_queryset(self):
        # Users can only see their own data
        return User.objects.filter(id=self.request.user.id)
```
Solid, functional, correct — but less complete than Copilot. Needed a follow-up prompt for edge cases.
Verdict:
| Feature | Copilot | Local Qwen 7B |
|---|---|---|
| Speed | Fast | Fast (GPU) |
| Boilerplate | Excellent | Good |
| Reasoning | Strong | Moderate |
| Multi-file context | Better | Limited |
| Cost | $10–19/mo | FREE |
| Privacy | External servers | Your machine |
Architecture (Mental Model)
```
┌─────────────────────────────────────────────┐
│                YOUR MACHINE                 │
│                                             │
│  ┌─────────────────┐                        │
│  │  Docker Models  │ ← qwen2.5:7B-Q4_0      │
│  │  (AI Brain) 🧠  │   runs on your GPU     │
│  └────────┬────────┘                        │
│           │ exposes                         │
│           ▼                                 │
│  ┌─────────────────┐                        │
│  │ localhost:12434 │ ← OpenAI-compatible    │
│  │  REST API 🔌    │   just like -p 8080    │
│  └────────┬────────┘                        │
│           │ connects to                     │
│           ▼                                 │
│  ┌─────────────────┐                        │
│  │    VS Code      │ ← Continue.dev         │
│  │  (Your IDE) 💻  │   extension            │
│  └─────────────────┘                        │
└─────────────────────────────────────────────┘
```
No internet. No API costs. No data leaks.
Where This Falls Short
Be honest with yourself:
Not as smart as GPT-4-level models
Limited context window (struggles with large codebases)
Needs decent hardware for best results
Setup takes 30–60 minutes vs just paying for Copilot
When You Should NOT Use This
- Working on large enterprise codebases
- Need best-in-class reasoning (GPT-4 level)
- Want zero setup / plug-and-play
- Low-end hardware (<16GB RAM, no GPU)
Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| Connection error | TCP not enabled | Docker Desktop → Settings → AI → Enable host-side TCP |
| Slow responses (>2s) | GPU not enabled | Docker Desktop → Settings → AI → Enable GPU-backed inference |
| `npx` script error | PowerShell policy | Run `Set-ExecutionPolicy RemoteSigned` as Admin |
| Model not showing | Not pulled | Docker Desktop → Models → Pull `qwen2.5:7B-Q4_0` |
What’s Next?
This is just the foundation.
Docker’s MCP Toolkit can let your local AI actually act — read your codebase, modify files, understand requirements. That’s a full agent setup, and I’ll cover it in Part 2.
Final Thoughts
This setup won’t replace Copilot for everyone.
But if you care about privacy, cost, and full control over your tools — it’s absolutely worth the 30 minutes to set up.
If you’re running this setup (or planning to), I’d love to hear:
What hardware are you using? Let’s compare setups!
Check out my knowledge vault where I document everything I learn hands-on:
https://github.com/Riju007/dev-knowledge-vault
March 2026 | Docker Desktop AI features