This content originally appeared on DEV Community and was authored by jefferson otoni lima
I’m very excited to share more improvements implemented in the Quick Framework, developed by @jeffotoni. These updates focus on robustness, performance, and support for modern protocols.
Early mornings improving Quick
As we’re using Quick for an AI (Artificial Intelligence) communication project, several needs have emerged, leading to new implementations and improvements in Quick. Below, I’ll present some of the improvements made this week.
There’s nothing better than putting a framework to the test in practice, and it’s fascinating! I’m using it in AI projects, developing native servers to orchestrate flows with LLMs, creating custom connectors, and continuously building RAG solutions. It’s been a challenging yet very exciting experience.
Reflections from a curious developer
The path is arduous, long, but incredibly enjoyable. I’m driven by curiosity, and sometimes I can’t contain myself. I need to study again, revisit something that was right there with me all along, but only now makes sense, you know? It wasn’t the right moment before, perhaps, but it was there, so close and yet so distant. And this belated discovery intrigues me even more.
Despite years of practice and development, I often feel as if I’m seeing the world of technology through a crack, like someone peeking at the “whole” through a keyhole. Reality seems fragmented; we only access bits and pieces, flashes, never the complete whole, and that’s intriguing. And perhaps we’ll never truly see it.
That’s why I decided to create these posts, not only to show in practice what was done for the Quick framework, but also to share some sincere reflections from a developer constantly learning.
What was implemented?
1. Duplicate WriteHeader Protection
We implemented a custom wrapper (responseWriter) that prevents “superfluous response.WriteHeader” errors. Now, multiple calls to WriteHeader are handled silently, avoiding crashes in complex middleware chains.
2. Hijacker Support
The responseWriter now implements the http.Hijacker interface, allowing connection upgrades natively. This enables real-time bidirectional communication directly through Quick.
3. HTTP/2 Server Push
We added support for the http.Pusher interface with the c.Pusher() method, enabling HTTP/2 Server Push to improve performance by proactively sending resources to the client before they are even requested, reducing latency and round-trips.
Example:
q.Get("/", func(c *quick.Ctx) error {
	pusher, ok := c.Pusher()
	if ok {
		// Proactively push static assets over HTTP/2
		pusher.Push("/static/style.css", nil)
		pusher.Push("/static/app.js", nil)
	}
	return c.Status(200).SendString("<html>...</html>")
})
4. Simplified Server-Sent Events (SSE) for Streaming LLMs
We implemented the c.Flusher() method, which facilitates real-time data streaming, essential for modern applications with Large Language Models (LLMs). It allows you to progressively send tokens as they are generated, creating interactive experiences in both the browser and CLIs.
Real-world use cases:
Streaming responses from ChatGPT, Claude, and Gemini
Real-time deployment logs
Progressive dashboard updates
Server push notifications
Example 1:
q.Post("/ai/chat", func(c *quick.Ctx) error {
	c.Set("Content-Type", "text/event-stream")
	c.Set("Cache-Control", "no-cache")
	c.Set("Connection", "keep-alive")

	flusher, ok := c.Flusher()
	if !ok {
		return c.Status(500).SendString("Streaming not supported")
	}

	// Simulate streaming of tokens from the LLM
	tokens := []string{"Hello", " this", " is", " a", " streaming", " response", " from", " AI"}
	for _, token := range tokens {
		// Standard SSE format
		fmt.Fprintf(c.Response, "data: %s\n\n", token)
		flusher.Flush()                    // Sends the chunk to the client immediately
		time.Sleep(100 * time.Millisecond) // Simulates LLM latency
	}

	// Signals end of stream
	fmt.Fprintf(c.Response, "data: [DONE]\n\n")
	flusher.Flush()
	return nil
})
Example 2:
q.Get("/events", func(c *quick.Ctx) error {
	c.Set("Content-Type", "text/event-stream")
	c.Set("Cache-Control", "no-cache")

	flusher, ok := c.Flusher()
	if !ok {
		return c.Status(500).SendString("Streaming not supported")
	}

	for i := 0; i < 10; i++ {
		fmt.Fprintf(c.Response, "data: Message %d\n\n", i)
		flusher.Flush()
		time.Sleep(time.Second)
	}
	return nil
})
Client JavaScript (Browser):
// EventSource always issues GET requests, so it pairs with the GET /events route above
const eventSource = new EventSource('/events');

eventSource.onmessage = (event) => {
  if (event.data === '[DONE]') { // close if the server signals end of stream
    eventSource.close();
    return;
  }
  document.getElementById('response').innerText += event.data;
};
Client CLI (Go):
resp, err := http.Post("http://localhost:8080/ai/chat", "application/json", body)
if err != nil {
	log.Fatal(err)
}
defer resp.Body.Close()

reader := bufio.NewReader(resp.Body)
for {
	line, err := reader.ReadString('\n')
	if err != nil || strings.Contains(line, "[DONE]") {
		break
	}
	fmt.Print(strings.TrimPrefix(line, "data: "))
}
It works perfectly with HTTP/2 multiplexing, allowing multiple simultaneous streams on the same connection.
5. Pooling Optimization
The Reset() method has been optimized to reuse the existing wrapper instead of recreating it with each request. This reduces memory allocations in the hot path, improving throughput.
6. Memory Leak Prevention
We implemented full context cleanup in releaseCtx(), including Context and wroteHeader fields, ensuring that no residual state remains between requests.
Impact
Greater robustness in high-concurrency scenarios
Native HTTP/2 support
Server-Sent Events (SSE) made easy with c.Flusher()
Reduced allocations per request
Zero breaking changes and 100% backward compatibility
All changes maintain full compatibility with existing code.
SSE vs WebSocket
Aspect | SSE | WebSocket |
---|---|---|
Server CPU | Lower (plain HTTP) | Higher (frame handling) |
Server Memory | Lower | Higher |
Bandwidth | Efficient for one-way streams | Efficient for two-way traffic |
Latency | Low | Very low |
Implementation | Simple | More complex |
Debugging | Easy (plain text) | Harder (binary frames) |
Firewall/Proxy | Friendly (regular HTTP) | Sometimes blocked |
Bidirectional | No (server → client only) | Yes |
Protocol | HTTP/1.1 or HTTP/2 | WebSocket (RFC 6455) |
Parser | Plain text | Binary frames |
Handshake | Normal HTTP request | HTTP Upgrade |
Automatic Reconnect | Built-in (EventSource) | Manual |
Browser Support | Native (EventSource API) | Native (WebSocket API) |
Message Overhead | ~45 bytes | ~50+ bytes |
Ideal for | Notifications, feeds, logs | Chat, games, collaboration |
Scalability | Excellent with HTTP/2 multiplexing | Requires connection management |
CDN Compatible | Yes | Limited |
Backend Complexity | Low | Higher |
Performance Comparison – SSE Writing Methods
This benchmark was run to identify the most efficient method for writing SSE events to http.ResponseWriter. Tests were performed on an Apple M3 Max with Go 1.x, measuring nanoseconds per operation (ns/op), bytes allocated (B/op), and number of allocations (allocs/op).
Benchmark Results
The w.Write([]byte()) method performs best at 13.56 ns/op with zero allocations, approximately 4x faster than fmt.Fprint() and 9x faster than io.WriteString().
For large messages (>1 KB), it is recommended to use sync.Pool to reuse buffers, reducing allocations and pressure on the garbage collector.
Recommendations
Development/Debugging: Use fmt.Fprint() for simplicity.
Production (small messages): Use w.Write([]byte()) for maximum performance.
Production (large messages): Use sync.Pool with reused buffers.
High performance (>10,000 requests/s): Combine w.Write() with a buffer pool.
The full benchmark is available in /bench.
Method | Performance | Allocations | Complexity | Recommendation |
---|---|---|---|---|
fmt.Fprint() | Moderate | 3 allocations | Low | Development/debugging |
io.WriteString() | Slow | 1 allocation | Low | Not recommended (slowest tier) |
w.Write([]byte) | Fastest | 0 allocations | Low | Production, small messages |
strings.Builder | Slow | 1-2 allocations | Medium | Only when composing complex events |
Multiple Writes | Very fast | 0 allocations | Low | Production, fixed framing |
sync.Pool | Fast (best for large messages) | 0 allocations | Medium | Production, large messages |
Unsafe | Very fast | 0 allocations | High | Avoid; gains don't justify the risk |
Running the Benchmark
To reproduce the performance tests and validate the results on your machine, run:
go test -bench=. -benchtime=1s -benchmem ctx_bench_test.go
Benchmark Parameters
-bench=. – Run all benchmarks
-benchtime=1s – Run each benchmark for 1 second
-benchmem – Include memory allocation statistics
Benchmark Results
Test Environment:
OS: macOS (darwin)
Architecture: ARM64
CPU: Apple M3 Max
Go Version: 1.x
Method | ns/op | ops/sec | B/op | allocs/op |
---|---|---|---|---|
WriteBytes | 13.32 | 75.1M | 0 | 0 |
MultipleWrites | 20.68 | 48.4M | 0 | 0 |
Unsafe | 20.98 | 47.7M | 0 | 0 |
Pooled | 39.00 | 25.6M | 0 | 0 |
FmtFprint | 52.69 | 19.0M | 16 | 1 |
FmtFprintf | 62.61 | 16.0M | 16 | 1 |
IoWriteString | 111.6 | 9.0M | 1024 | 1 |
Optimized | 119.7 | 8.4M | 1024 | 1 |
StringsBuilder | 122.7 | 8.2M | 1032 | 2 |
Large messages (>1 KB):
Method | ns/op | Speedup | B/op | allocs/op |
---|---|---|---|---|
PooledLarge | 234.5 | Baseline | 0 | 0 |
WriteBytesLarge | 811.2 | 3.46x slower | 9472 | 1 |
Examples and Source Code
All examples can be accessed here -> Quick SSE
The benchmark source code is available here -> bench
Contributions
Quick is an open-source project in constant evolution. Feedback and contributions are always welcome!
GitHub: Quick
#golang #webframework #performance #opensource #go #quick