This content originally appeared on DEV Community and was authored by Pshtiwan Mahmood
Solving the Flutter did_send Crash: A Deep Dive into Isolate Race Conditions
TL;DR: A race condition in Flutter isolate communication was causing fatal did_send crashes in our production sports app. The fix required async coordination, proper cleanup sequencing, and a critical 100ms delay. Here’s how we solved it.
The Problem: A Production Nightmare
Imagine this scenario: Your Flutter app is running smoothly in production with thousands of daily users. Everything works perfectly until users start reporting random crashes when navigating to specific screens. The crash logs show a cryptic, terrifying error:
[FATAL:flutter/lib/ui/window/platform_message_response_dart_port.cc(53)]
Check failed: did_send.
This wasn’t just any crash. It was a fatal engine error that terminated the app outright: no graceful error handling, no recovery, no second chances.
The worst part? The error message gave us absolutely no context about what was causing it.
The App: Real-Time Sports Data at Scale
Our app is a comprehensive sports statistics platform serving thousands of users. It provides:
- Live match scores updated every 5 seconds
- Real-time competition standings
- Player and team statistics
- Multi-language support
The real-time functionality is the heart of our app, powered by Flutter isolates that run background timers, fetch data from APIs, and stream updates to the main UI thread.
The Architecture
// Two main services handling real-time data
class RealtimeDataService {
  // Handles general sports data (all games)
  // Updates every 5 seconds via isolate
}

class SingleRealtimeDataService {
  // Handles specific game data
  // Updates every 5 seconds via isolate
}
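The isolate-plus-timer design described above can be sketched in plain Dart. This is a minimal illustration rather than the app's actual service: pollingWorker and collectUpdates are hypothetical names, the 50 ms interval stands in for the 5-second poll, and an incrementing counter stands in for data fetched from an API.

```dart
import 'dart:async';
import 'dart:isolate';

/// Entry point for the background isolate (hypothetical name).
/// Sends an incrementing tick to the main isolate on a fixed interval;
/// a real service would fetch and forward API data here instead.
void pollingWorker(SendPort toMain) {
  var tick = 0;
  Timer.periodic(const Duration(milliseconds: 50), (_) {
    toMain.send(tick++);
  });
}

/// Spawns the worker, collects [count] updates, then shuts everything down.
Future<List<int>> collectUpdates(int count) async {
  final receiver = ReceivePort();
  final isolate = await Isolate.spawn(pollingWorker, receiver.sendPort);
  try {
    // A ReceivePort is a Stream, so the UI side can consume it directly.
    final updates = await receiver.take(count).toList();
    return updates.cast<int>();
  } finally {
    // Tear down in the same order the article's stop() uses.
    isolate.kill(priority: Isolate.immediate);
    receiver.close();
  }
}
```

Calling `await collectUpdates(3)` from an async function should yield the first three ticks in order. The periodic timer is what keeps the worker alive, which is also why it can still be active when the main isolate starts tearing things down.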
The Investigation: Following the Crash Trail
Step 1: Understanding the did_send Error
After diving deep into Flutter’s source code, I discovered that the did_send error occurs when:
- An isolate tries to send a message through a SendPort
- The receiving ReceivePort has already been closed/disposed
- Flutter engine’s safety check fails: did_send = false → FATAL CRASH
Step 2: Reproducing the Issue
The crash had a very specific pattern. It happened when users:
1. Opened a modal dialog
2. Clicked on a list item
3. Navigated to a detail screen
This navigation sequence triggered our router observer, which managed the real-time services based on the current route.
Step 3: Finding the Smoking Gun
Here’s the problematic code that was causing our crashes:
// 🔴 PROBLEMATIC CODE (Before fix)
void setRealtimeBaseOnRoute(Route route) {
  ScreenDetailData? screen = getScreen(route);
  if (screen?.name == TargetRoute.name) {
    singleRealtimeDataService.stop(); // Closes ReceivePort immediately
    realtimeDataService.stop(); // Closes ReceivePort immediately
    realtimeDataService.start(); // Starts new isolate immediately
  }
}
The Race Condition Explained:
- stop() kills the isolate and closes ReceivePort
- start() creates a new isolate immediately
- But the old isolate hadn’t finished cleanup yet!
- Old isolate timer fires → tries to send to CLOSED port → CRASH
The Solution: Proper Async Coordination
Fix #1: Make Operations Asynchronous
The key insight was that isolate lifecycle management needed to be properly coordinated with async/await:
// ✅ FIXED CODE (After solution)
import 'dart:developer' as developer;

import 'package:flutter/widgets.dart'; // Route

// Returns Future<void> rather than void so callers can await the transition.
Future<void> setRealtimeBaseOnRoute(Route route) async {
  try {
    ScreenDetailData? screen = getScreen(route);
    if (screen?.name == TargetRoute.name) {
      // Wait for complete cleanup before proceeding
      await singleRealtimeDataService.stop();
      await realtimeDataService.stop();
      // 🔧 CRITICAL: Wait for isolate cleanup
      await Future.delayed(const Duration(milliseconds: 100));
      await realtimeDataService.start(shouldStart: true);
    }
  } catch (e) {
    developer.log('Error in route management: $e');
    // Don't rethrow to prevent app crashes
  }
}
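One thing the async version does not settle on its own is what happens when two route changes arrive in quick succession. Assuming the router observer calls setRealtimeBaseOnRoute without awaiting it (the article does not say either way), two stop/delay/start sequences could interleave. A minimal sketch of one way to serialize them follows; TransitionQueue and simulateRapidNavigation are hypothetical names, and the delays only stand in for real cleanup work.

```dart
import 'dart:async';

/// Runs async actions strictly one after another, in submission order.
/// (Hypothetical helper, not part of Flutter or the original app.)
class TransitionQueue {
  Future<void> _tail = Future<void>.value();

  Future<void> run(Future<void> Function() action) {
    final result = _tail.then((_) => action());
    // Keep the chain usable even if an action fails; the caller still
    // receives the error through the returned future.
    _tail = result.then<void>((_) {}, onError: (Object _) {});
    return result;
  }
}

/// Simulates two route changes fired back to back without awaiting the first.
Future<List<String>> simulateRapidNavigation() async {
  final queue = TransitionQueue();
  final log = <String>[];

  Future<void> restart(String route, int cleanupMs) async {
    log.add('stop for $route');
    // Stand-in for awaiting stop() plus the safety delay.
    await Future<void>.delayed(Duration(milliseconds: cleanupMs));
    log.add('start for $route');
  }

  final first = queue.run(() => restart('A', 30));
  final second = queue.run(() => restart('B', 5));
  await Future.wait([first, second]);
  return log;
}
```

Without the queue, the shorter transition for route B would finish before route A's start step runs. With it, each stop/start pair completes before the next one begins.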
Fix #2: Enhanced Service Architecture
I also refactored the isolate services to be more robust:
import 'dart:developer' as developer;
import 'dart:isolate';

class RealtimeDataService {
  ReceivePort? _receiver;
  Isolate? _isolate;
  bool _isStarted = false;
  bool _isStarting = false; // 🔧 Prevents race conditions

  Future<void> stop() async {
    // Nothing to do if the service is neither running nor starting up
    if (!_isStarted && !_isStarting) return;
    try {
      _isStarting = false;
      // Kill isolate first
      _isolate?.kill(priority: Isolate.immediate);
      _isolate = null;
      // Close receive port
      _receiver?.close();
      _receiver = null;
      _isStarted = false;
    } catch (e) {
      developer.log('Error stopping service: $e');
    }
  }

  // ReceivePort is itself a Stream, so it can be returned directly
  Stream<dynamic> get dataStream {
    if (_receiver == null) return const Stream.empty();
    return _receiver!;
  }
}
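The class above shows stop() but not the matching start(). As a hedged sketch of how the _isStarting flag might guard the start path, the example below ignores overlapping start() calls and discards an isolate whose spawn finished after stop() was requested. GuardedService, tickWorker, spawnCount, and isRunning are hypothetical names introduced for illustration; the original service's start() may look quite different.

```dart
import 'dart:async';
import 'dart:isolate';

/// Hypothetical worker that emits a tick on a short interval.
void tickWorker(SendPort toMain) {
  Timer.periodic(const Duration(milliseconds: 20), (_) => toMain.send('tick'));
}

class GuardedService {
  ReceivePort? _receiver;
  Isolate? _isolate;
  bool _isStarted = false;
  bool _isStarting = false;

  /// Number of isolates actually adopted (for illustration only).
  int spawnCount = 0;

  bool get isRunning => _isStarted;

  Future<void> start() async {
    // Ignore the call if a start is already done or in flight.
    if (_isStarted || _isStarting) return;
    _isStarting = true;
    final receiver = ReceivePort();
    try {
      final isolate = await Isolate.spawn(tickWorker, receiver.sendPort);
      if (!_isStarting) {
        // stop() ran while the spawn was in flight: discard the new isolate.
        isolate.kill(priority: Isolate.immediate);
        receiver.close();
        return;
      }
      spawnCount++;
      _isolate = isolate;
      _receiver = receiver;
      _isStarted = true;
    } catch (_) {
      receiver.close();
      rethrow;
    } finally {
      _isStarting = false;
    }
  }

  Future<void> stop() async {
    if (!_isStarted && !_isStarting) return;
    _isStarting = false;
    _isolate?.kill(priority: Isolate.immediate);
    _isolate = null;
    _receiver?.close();
    _receiver = null;
    _isStarted = false;
  }
}
```

With this guard, `await Future.wait([service.start(), service.start()])` spawns a single isolate, because the second call returns early while the first is still in flight.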
Fix #3: The Magic 100ms Delay
You might wonder: “Why 100ms? Isn’t that arbitrary?”
Not at all! This delay is crucial because it gives the Flutter engine enough time to:
- Complete isolate termination
- Close all message ports
- Clean up platform channels
- Ensure no orphaned messages
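For the first item in that list, Dart also offers an explicit signal: Isolate.addOnExitListener posts a message once the isolate has terminated. The sketch below is an alternative to a fixed delay, not what the fix in this article used, and it only confirms Dart-level isolate termination; whether that also covers the engine-side platform channel cleanup is an assumption that would need testing. killAndAwaitExit, idleWorker, and confirmExitDemo are hypothetical names.

```dart
import 'dart:async';
import 'dart:isolate';

/// Hypothetical worker that keeps itself alive with a periodic timer.
void idleWorker(SendPort toMain) {
  Timer.periodic(const Duration(milliseconds: 10), (_) => toMain.send('tick'));
}

/// Kills [isolate] and waits until its exit is reported, or until [timeout].
/// Returns true if the exit was confirmed, false if the wait timed out.
Future<bool> killAndAwaitExit(
  Isolate isolate, {
  Duration timeout = const Duration(milliseconds: 500),
}) async {
  final exitPort = ReceivePort();
  // The response is posted to exitPort when the isolate terminates.
  isolate.addOnExitListener(exitPort.sendPort, response: 'exited');
  isolate.kill(priority: Isolate.immediate);
  try {
    await exitPort.first.timeout(timeout);
    return true;
  } on TimeoutException {
    return false;
  } finally {
    exitPort.close();
  }
}

/// Spawns a worker, kills it, and reports whether the exit was confirmed.
Future<bool> confirmExitDemo() async {
  final data = ReceivePort();
  final isolate = await Isolate.spawn(idleWorker, data.sendPort);
  final confirmed = await killAndAwaitExit(
    isolate,
    timeout: const Duration(seconds: 2),
  );
  data.close();
  return confirmed;
}
```

The timeout keeps the caller from hanging if the exit message never arrives, and a short fixed delay can still be added afterwards as an extra safety margin.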
The Technical Deep Dive: Timeline Analysis
Before the Fix (Race Condition):
⏰ Time 0ms: User navigates
⏰ Time 1ms: Navigation starts
⏰ Time 2ms: stop() called → ReceivePort closes
⏰ Time 3ms: start() called → New ReceivePort opens
⏰ Time 4ms: Old isolate timer fires → tries to send to CLOSED port
⏰ Time 5ms: 💥 CRASH: did_send check fails
After the Fix (Proper Sequencing):
⏰ Time 0ms: User navigates
⏰ Time 1ms: Navigation starts
⏰ Time 2ms: await stop() → ReceivePort closes + isolate cleanup
⏰ Time 50ms: Old isolate fully terminated
⏰ Time 100ms: Safety delay complete
⏰ Time 101ms: await start() → New ReceivePort opens
⏰ Time 102ms: ✅ SUCCESS: Clean transition
Lessons Learned
1. Isolate Communication is Fragile
Platform message channels in Flutter are low-level and unforgiving. Always ensure proper cleanup sequencing.
2. Async/Await Saves Lives
What seemed like a simple synchronous operation actually required careful async coordination.
3. Production Errors Need Deep Investigation
The did_send error gave no context about the root cause. Only systematic investigation revealed the race condition.
4. Error Handling is Critical
Always wrap isolate operations in try-catch blocks to prevent crashes from propagating.
The Results
After implementing these fixes:
- Zero crashes related to the did_send error
- Smooth navigation between all screens
- Robust error handling prevents future isolate issues
- Production stability with thousands of daily users
Key Takeaways for Flutter Developers
- Always use async/await when managing isolate lifecycles
- Add delays between stop/start operations to ensure cleanup
- Implement comprehensive error handling for isolate operations
- Test navigation flows thoroughly in production-like conditions
- Monitor crash logs for platform-level errors
Final Thoughts
Debugging production crashes can be incredibly challenging, especially when the error messages are cryptic. This experience taught me the importance of:
- Systematic investigation over quick fixes
- Understanding the underlying platform (Flutter engine internals)
- Proper async coordination in complex systems
- Comprehensive error handling to prevent cascading failures
The did_send crash was a reminder that even small race conditions can have catastrophic effects in production. But with the right approach, even the most mysterious bugs can be solved.
Discussion
Have you encountered similar isolate issues in your Flutter apps? What debugging strategies worked for you? Share your experiences in the comments below!
What would you like to see next?
- More Flutter debugging deep dives?
- Performance optimization techniques?
- Real-time app architecture patterns?
Let me know!
If this helped you solve a similar issue, please share it with your fellow Flutter developers!