Day 6: When Protobuf Breaks Everything
The Plan: Add real-time updates and bootstrap AI anomaly detection.
The Reality: “Why are all my operations named ‘protobuf-fallback-trace’?!”
Welcome to Day 6 of building an AI-native observability platform in 30 days. Today was supposed to be about sexy features. Instead, it was about the unglamorous reality of systems engineering: making protobuf work correctly.
The Problem That Changed Everything
I started the day confident. The OpenTelemetry demo was running, traces were flowing, the UI was displaying data. Time to add real-time updates, right?
Then I looked closer at the trace details:
// What I expected:
{
  service: "CartService",
  operation: "AddItemToCart",
  duration: 125
}

// What I got:
{
  service: "CartService",
  operation: "protobuf-fallback-trace", // 😱
  duration: 50
}
Every. Single. Operation. Was named “protobuf-fallback-trace”.
The Investigation Begins
Discovery #1: Gzip Was Being Ignored
The OpenTelemetry demo sends protobuf data with gzip compression. My middleware had “clever” conditional logic:
// The broken approach
app.use('/v1*', (req, res, next) => {
  if (req.headers['content-type']?.includes('protobuf')) {
    // Special handling that SKIPPED gzip decompression 🤦
    express.raw({ type: 'application/x-protobuf' })(req, res, next)
  } else {
    express.json()(req, res, next)
  }
})
The fix was embarrassingly simple:
// The working approach
app.use('/v1*', express.raw({
  limit: '10mb',
  type: '*/*',
  inflate: true // THIS enables gzip decompression for ALL content types
}))
Lesson: Sometimes “clever” code is just complicated code. Unified handling often beats conditional logic.
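For context, here's roughly how the raw body gets consumed downstream. This is a sketch, not the project's exact handler: decodeOtlpProtobuf and storeTraces are stand-in names. The point is that with inflate enabled, req.body arrives as an already-decompressed Buffer no matter what Content-Encoding the demo sent:
// Hypothetical OTLP ingest route: express.raw has already buffered and
// gunzipped the request body into a Node Buffer by the time this runs
app.post('/v1/traces', (req, res) => {
  const body = req.body as Buffer

  // JSON exporters send application/json; the demo sends application/x-protobuf
  const isJson = req.headers['content-type']?.includes('json')
  const payload = isJson
    ? JSON.parse(body.toString('utf8'))
    : decodeOtlpProtobuf(body) // stand-in for the protobuf decode path covered below

  storeTraces(payload) // stand-in for the ClickHouse write
  res.status(200).end() // a spec-complete endpoint would return an ExportTraceServiceResponse
})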
Discovery #2: Protobufjs vs ES Modules
Next challenge: parsing the actual protobuf data. The protobufjs library is CommonJS, but my project uses ES modules. This led to hours of:
// Attempt 1: Named imports (doesn't work)
import { load } from 'protobufjs' // ❌ "Named export 'load' not found"

// Attempt 2: What actually works
import pkg from 'protobufjs'
const { load } = pkg // ✅
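If you'd rather skip the default-import dance entirely, Node's createRequire escape hatch also works for pulling a CommonJS package into an ES module (just a sketch; the project itself uses the default-import form above):
// Alternative: load the CommonJS module through createRequire
import { createRequire } from 'node:module'

const require = createRequire(import.meta.url)
const protobuf = require('protobufjs')

console.log(typeof protobuf.load) // 'function', so named access works again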
Discovery #3: Path Resolution Hell
Even with protobufjs loading, the OTLP protobuf definitions have imports that need custom resolution:
// The protobuf loader that finally worked
// (path is Node's path module; protoPath is the directory holding the OTLP .proto files)
const { Root } = pkg
this.root = new Root()
this.root.resolvePath = (origin: string, target: string) => {
  // Custom resolution for OTLP imports
  if (target.startsWith('opentelemetry/')) {
    return path.join(protoPath, target)
  }
  return path.resolve(path.dirname(origin), target)
}
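With the resolver in place, the decode step itself looks roughly like this. A sketch, with assumptions: body is the raw, already-gunzipped request Buffer, and the OTLP .proto files are vendored under protoPath; the fully qualified type name is the standard OTLP one:
// Load the collector service definition through the Root configured above
await this.root.load(
  path.join(protoPath, 'opentelemetry/proto/collector/trace/v1/trace_service.proto')
)

// Look up the request message type and decode the binary payload
const ExportTraceServiceRequest = this.root.lookupType(
  'opentelemetry.proto.collector.trace.v1.ExportTraceServiceRequest'
)
const decoded = ExportTraceServiceRequest.decode(body)

// Convert to a plain object; longs become strings so nanosecond timestamps survive
const traces = ExportTraceServiceRequest.toObject(decoded, { longs: String })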
The Nuclear Option: Enhanced Fallback Parsing
When the “proper” protobuf parsing kept failing, I built something unconventional – a raw protobuf parser that extracts data through pattern matching:
function parseOTLPFromRaw(buffer: Buffer): any {
  const data = buffer.toString('latin1')

  // Extract service names by pattern
  const serviceMatches = [...data.matchAll(
    /service\.name[\x00-\x20]*([a-zA-Z][a-zA-Z0-9\-_]+)/g
  )]
  const serviceName = serviceMatches[0]?.[1] ?? 'unknown-service'

  // Extract operation names: identifier-like printable strings in the buffer
  const operationMatches = [...data.matchAll(
    /([a-zA-Z][a-zA-Z0-9._\/\-]{3,})/g
  )]
  const operationCandidates = operationMatches
    .map(match => match[1])
    .filter(op =>
      op.length > 3 &&
      !op.match(/^[0-9a-f]+$/) && // Skip hex strings
      (op.includes('.') || op.includes('/') || op.includes('_'))
    )

  // Build spans from extracted data
  return {
    resourceSpans: [{
      resource: {
        attributes: [
          { key: 'service.name', value: { stringValue: serviceName } }
        ]
      },
      scopeSpans: [{
        spans: operationCandidates.map(op => ({
          name: op, // Real operation names!
          // ... rest of span data
        }))
      }]
    }]
  }
}
Is this elegant? No. Does it work? Absolutely.
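And to be clear, the fallback isn't a replacement for real decoding; it only runs when the real decoder throws. Roughly like this (decodeWithProtobufjs is a stand-in name for the lookupType/decode path above):
// Try the real protobufjs decode first; fall back to raw string extraction
// only when decoding fails
function parseTraces(body: Buffer): any {
  try {
    return decodeWithProtobufjs(body)
  } catch (err) {
    console.warn('protobuf decode failed, using raw fallback parser', err)
    return parseOTLPFromRaw(body)
  }
}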
The Results
After 8 hours of protobuf wrestling:
Before:
- All operations: “protobuf-fallback-trace”
- 1 fake span per trace
- No real telemetry data

After:
- Real operations: oteldemo.AdService, CartService.AddItem
- 10+ real spans per trace
- Authentic resource attributes and timing data
Key Learnings
1. Fallback Strategies Are Not Defeat
Building a fallback parser wasn’t giving up – it was ensuring the system works even when dependencies fail. In production, working beats perfect.
2. Debug at the Lowest Level
I spent hours assuming the protobuf data was corrupt. Finally logging the raw buffer bytes revealed it was fine – the decompression was being skipped.
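A quick sanity check like this would have caught it much sooner: gzip payloads always start with the magic bytes 0x1f 0x8b, so dumping the head of the buffer shows immediately whether decompression ever happened:
// If the first two bytes are 1f 8b, the body is still gzip-compressed
// and decompression was skipped somewhere upstream
const head = req.body.subarray(0, 8)
console.log('first bytes:', head.toString('hex'))
console.log('still gzipped?', req.body[0] === 0x1f && req.body[1] === 0x8b)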
3. Integration Points Are Where Systems Break
The individual components all worked:
- OpenTelemetry demo: sending valid data
- Express server: receiving requests
- ClickHouse: storing data
The failure was in the glue between them.
4. Real Data Reveals Real Problems
Mock data would never have exposed this issue. Testing with the actual OpenTelemetry demo forced me to handle real-world complexity.
The Bigger Picture
Today didn’t go according to plan, and that’s exactly what building production systems is like. The glossy demo videos don’t show the 8 hours spent debugging why “protobuf.load is not a function”.
But here’s what matters: the system now correctly processes thousands of real traces from a production-like demo application. Every service is visible, every operation is named correctly, and the data flowing through the pipeline is authentic.
What’s Next (Day 7)
Now that protobuf parsing actually works:
- Implement the real-time updates (for real this time)
- Add WebSocket support for live trace streaming
- Bootstrap the AI anomaly detection system
- Create service dependency visualization
Code Snippets That Saved the Day
For anyone fighting similar battles:
# Debug protobuf data in container
docker compose exec backend xxd -l 100 /tmp/trace.pb
# Test gzip decompression
curl -X POST http://localhost:4319/v1/traces \
-H "Content-Type: application/x-protobuf" \
-H "Content-Encoding: gzip" \
--data-binary @trace.pb.gz
# Check what protobufjs actually exports
node -e "console.log(Object.keys(require('protobufjs')))"
Conclusion
Day 6 was humbling. The plan was to build flashy features. Instead, I spent the day in the trenches making basic data ingestion work correctly.
But that’s real engineering. It’s not always about the elegant algorithm or the clever architecture. Sometimes it’s about making protobuf parsing work at 2 AM because your entire platform depends on it.
The platform is stronger because of today’s battles. And tomorrow, with real data flowing correctly, we can build the features that actually matter.
Are you fighting your own protobuf battles? Share your war stories in the comments. Sometimes knowing you’re not alone in the debugging trenches makes all the difference.
Progress: Day 6 of 30 | Protobuf: Finally Working | Sanity: Questionable
GitHub Repository | Follow the Journey