Linux, macOS, and Windows broke my code, so I built a cross-platform test strategy



This content originally appeared on DEV Community and was authored by Asoseil

Cross-platform filesystem events sound simple in theory:

make a file change, receive a notification

In practice, three operating systems interpret, buffer, group, and deliver those notifications in completely different ways. The same action can trigger a single clean event on one platform, several intermediate ones on another, or a delayed burst somewhere else. Getting consistent behaviour across all of them becomes surprisingly fragile, especially when concurrency and real-world workloads enter the picture.

I ran into all of this while building FSWatcher, a file-watching component that needed to behave consistently on every platform. Under the hood it uses FSEvents on macOS, inotify on Linux, and ReadDirectoryChangesW on Windows, so it ended up exposing all the differences between the three systems at once. Once concurrency entered the picture, those differences became even more noticeable, and many assumptions I had initially made didn’t hold up.

Correctness on one platform didn’t imply correctness on another, and correctness under light load didn’t tell me much about behaviour under stress. The test strategy had to become more deliberate, more platform-aware, and more concurrency-resistant. The result is a pipeline that consistently exposes cross-platform issues early and keeps the watcher stable across all three environments.

Cross-platform behaviour differs more than you expect

All three operating systems surface events in their own way. macOS sometimes groups directory-level changes; Linux produces multiple events for what conceptually feels like “one change”; Windows often emits cleaner, consolidated notifications.

Any assumption about ordering, timing, or granularity breaks immediately.

For example, writing three files in sequence rarely produces events in that same order. Under heavy I/O, events can arrive seconds apart or all at once. This means tests can’t assert on sequences; they have to assert on eventual consistency and on the presence of the expected state transitions.
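
One way to check presence rather than order is a helper that drains events until every expected path has been seen or a deadline passes. The sketch below assumes events arrive as plain path strings on a channel, which is a simplification; `waitForPaths` and `remaining` are illustrative names, not part of FSWatcher.

```go
package main

import "time"

// waitForPaths reads from events until every path in want has been seen at
// least once, the channel closes, or the timeout expires. Order and
// duplicates are ignored on purpose. It returns the paths that never arrived.
func waitForPaths(events <-chan string, want []string, timeout time.Duration) []string {
	pending := make(map[string]struct{}, len(want))
	for _, p := range want {
		pending[p] = struct{}{}
	}
	deadline := time.After(timeout)
	for len(pending) > 0 {
		select {
		case p, ok := <-events:
			if !ok {
				return remaining(pending) // channel closed early
			}
			delete(pending, p) // no-op for unexpected or repeated paths
		case <-deadline:
			return remaining(pending)
		}
	}
	return nil
}

// remaining flattens the pending set into a slice for error reporting.
func remaining(pending map[string]struct{}) []string {
	out := make([]string, 0, len(pending))
	for p := range pending {
		out = append(out, p)
	}
	return out
}
```

A test would then fail only when the returned slice is non-empty, regardless of how the platform ordered or batched the events.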

FSWatcher also supports runtime path addition and removal, and this affects the testing model: the suite needs to confirm that paths added at runtime start emitting events immediately, and that removed paths truly stop. This pushes the tests into scenarios where the watcher has to respond to concurrent updates to its own internal structures.

CI setup that doesn’t hide platform behaviours

The current CI pipeline runs the full suite on macOS (Intel and ARM), Linux, and Windows. It’s intentionally configured not to stop early, because a Windows-only failure is still valuable even if Linux is green.

strategy:
  fail-fast: false
  matrix:
    include:
      - os: macos-26
      - os: macos-latest
      - os: ubuntu-latest
      - os: windows-latest

The Go race detector runs on macOS and Linux and is responsible for catching most of the logic bugs that would otherwise appear as rare, intermittent failures. It has already flagged unsynchronised map writes, channel close races, and struct fields being read while another goroutine was updating them. These are the sorts of concurrency defects that aren’t visible in code review and rarely reproduce manually.

Windows builds skip -race but still run the full suite, which is important because Windows has its own timing behaviours and handles resources differently than Unix-like systems.

A periodic CodeQL analysis also runs to flag suspicious patterns and unsafe syscall usage. Even though it isn’t concurrency-specific, it helps ensure that filesystem and OS interactions aren’t quietly failing.

Cross-platform testing strategy

The tests had to change significantly to become stable. The early versions relied on fixed sleeps and assumptions about order. Replacing those with event-driven constructs made the suite consistent across runners.

A readiness channel ensures that the watcher is fully started before operations begin:

select {
case <-readyChan:
case <-time.After(5 * time.Second):
    t.Fatal("Watcher is not ready")
}

Event collection runs concurrently in its own goroutine, which reflects how the watcher behaves in practice. The use of mutex-protected slices keeps the test deterministic without relying on sleep.
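
A minimal sketch of such a collector, assuming the watcher delivers events as strings on a channel (the real event type is likely richer). `eventCollector` and its methods are hypothetical names used for illustration.

```go
package main

import "sync"

// eventCollector gathers events in a background goroutine so the test body
// can keep performing filesystem operations while events arrive.
type eventCollector struct {
	mu     sync.Mutex
	events []string
	done   chan struct{}
}

// newEventCollector starts collecting immediately and stops when the
// events channel is closed.
func newEventCollector(events <-chan string) *eventCollector {
	c := &eventCollector{done: make(chan struct{})}
	go func() {
		defer close(c.done)
		for e := range events {
			c.mu.Lock()
			c.events = append(c.events, e)
			c.mu.Unlock()
		}
	}()
	return c
}

// Snapshot returns a copy of what has been collected so far, so callers
// never share the underlying slice with the collecting goroutine.
func (c *eventCollector) Snapshot() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	return append([]string(nil), c.events...)
}

// Wait blocks until the events channel has been closed and drained.
func (c *eventCollector) Wait() { <-c.done }
```

Waiting on `done` instead of sleeping gives the test a definite point at which the collected slice is complete.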

One of the most important pieces is the randomized workload generator:

expected := performOperations(t, tempDir, 1000)

performOperations runs a mix of file creation, modification, rename, directory creation, deletion, and nested deletion.

Because the operations are randomized, each run produces different event patterns. This forces the watcher to handle unpredictable sequences, which is essentially a built-in form of stress testing.
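
For illustration only, a reduced generator along these lines might look like the sketch below. It is not the actual performOperations: `randomOps` and `randomOpsInTempDir` are hypothetical, cover only create, modify, and delete, and take an explicit seed (an added assumption) so that a failing run can be replayed.

```go
package main

import (
	"fmt"
	"math/rand"
	"os"
	"path/filepath"
)

// randomOps applies n random create/modify/delete operations under dir and
// returns the set of paths it touched.
func randomOps(dir string, n int, seed int64) (map[string]bool, error) {
	rng := rand.New(rand.NewSource(seed))
	touched := make(map[string]bool)
	var live []string // files that currently exist

	for i := 0; i < n; i++ {
		op := rng.Intn(3)
		if len(live) == 0 {
			op = 0 // nothing to modify or delete yet, so create
		}
		switch op {
		case 0: // create a new file
			p := filepath.Join(dir, fmt.Sprintf("file-%d.txt", i))
			if err := os.WriteFile(p, []byte("new"), 0o644); err != nil {
				return nil, err
			}
			live = append(live, p)
			touched[p] = true
		case 1: // modify an existing file
			p := live[rng.Intn(len(live))]
			if err := os.WriteFile(p, []byte("changed"), 0o644); err != nil {
				return nil, err
			}
			touched[p] = true
		default: // delete an existing file
			j := rng.Intn(len(live))
			p := live[j]
			if err := os.Remove(p); err != nil {
				return nil, err
			}
			live = append(live[:j], live[j+1:]...)
			touched[p] = true
		}
	}
	return touched, nil
}

// randomOpsInTempDir runs the workload in a throwaway directory.
func randomOpsInTempDir(n int, seed int64) (map[string]bool, error) {
	dir, err := os.MkdirTemp("", "fswatch-ops-")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	return randomOps(dir, n, seed)
}
```

Logging the seed at the start of each run is a cheap way to keep the randomness while still being able to reproduce a failure.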

The correctness check doesn’t compare order. It compares the presence of expected paths:

if !receivedEvents[path] {
    missing = append(missing, path)
}

The tests also ensure that the watcher stops cleanly, channels close, and no goroutines outlive the test.
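
One simple way to check for leftover goroutines, sketched here under the assumption that nothing else in the test process starts long-lived goroutines in the meantime, is to record the goroutine count before starting the watcher and poll until it returns to that baseline. `leakCheck` is a hypothetical helper, not FSWatcher code.

```go
package main

import (
	"runtime"
	"time"
)

// leakCheck remembers how many goroutines existed before the code under
// test started.
type leakCheck struct{ baseline int }

// startLeakCheck captures the current goroutine count as the baseline.
func startLeakCheck() leakCheck {
	return leakCheck{baseline: runtime.NumGoroutine()}
}

// settled polls until the goroutine count is back at or below the baseline,
// and reports false if that has not happened when the timeout expires.
// The short poll interval bounds the wait without assuming a fixed delay.
func (l leakCheck) settled(timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for runtime.NumGoroutine() > l.baseline {
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(10 * time.Millisecond)
	}
	return true
}
```

In a test, `startLeakCheck()` would be called before creating the watcher and `settled` after stopping it, with a failure if it returns false.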

FSWatcher also exposes statistics (processed, dropped, lost events). The tests validate that backpressure logic is correct by filling channels intentionally and verifying that dropped events land in the right place and that lost events are accounted for.
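
The general shape of such a check can be sketched with a non-blocking send and atomic counters. This is an assumption about how backpressure might be modelled, not FSWatcher's actual implementation; `stats` and `offer` are hypothetical names, and only delivered and dropped events are counted here.

```go
package main

import "sync/atomic"

// stats holds counters that can be updated safely from several goroutines.
type stats struct {
	delivered atomic.Int64
	dropped   atomic.Int64
}

// offer tries to enqueue ev without blocking. When the channel buffer is
// full, the event is counted as dropped instead of stalling the producer.
func offer(ch chan<- string, ev string, s *stats) bool {
	select {
	case ch <- ev:
		s.delivered.Add(1)
		return true
	default:
		s.dropped.Add(1)
		return false
	}
}
```

Filling a channel of capacity 2 with five events and no consumer should leave the counters at 2 delivered and 3 dropped, which is the kind of exact accounting a backpressure test can assert on.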

Runtime path addition and removal are tested by creating temporary directories, watching multiple roots, and ensuring that removed paths do not emit events even if files are changed afterward.
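
To make the idea concrete, here is a small sketch of a concurrency-safe set of watch roots with a check a dispatcher could use to discard events for removed paths. The `watchSet` type is hypothetical; how FSWatcher tracks its roots internally is not described here.

```go
package main

import (
	"path/filepath"
	"strings"
	"sync"
)

// watchSet is a set of watched root directories guarded by an RWMutex so
// roots can be added or removed while events are being dispatched.
type watchSet struct {
	mu    sync.RWMutex
	roots map[string]struct{}
}

func newWatchSet() *watchSet {
	return &watchSet{roots: make(map[string]struct{})}
}

// Add starts tracking root.
func (w *watchSet) Add(root string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.roots[filepath.Clean(root)] = struct{}{}
}

// Remove stops tracking root.
func (w *watchSet) Remove(root string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	delete(w.roots, filepath.Clean(root))
}

// Covers reports whether path is a watched root or lies beneath one.
func (w *watchSet) Covers(path string) bool {
	w.mu.RLock()
	defer w.mu.RUnlock()
	for root := range w.roots {
		rel, err := filepath.Rel(root, path)
		if err != nil {
			continue
		}
		// A relative path that climbs out of root means path is outside it.
		if rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
			return true
		}
	}
	return false
}
```

A removal test then changes files under a removed root and asserts that no collected event satisfies `Covers` for that root.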

All of this results in a suite that goes beyond basic unit testing: it simulates bursty workloads, random event order, dynamic watch roots, and concurrent shutdown, exactly the kind of scenarios that real usage hits over time.

What the current tests cover effectively

Even without formal stress tools or property-based frameworks, the suite already exercises several advanced behaviours:

  • randomized file operations simulate realistic filesystem churn

  • concurrent event ingestion tests backpressure and channel behaviour

  • dynamic path addition/removal stresses internal synchronization

  • multi-platform CI exposes OS-specific timing and ordering issues

  • shutdown tests verify that goroutines and resources clean up reliably

  • dropped/lost event tests validate overload behaviour

  • watchers handle thousands of events generated through random workloads

This places FSWatcher’s test suite closer to a real integration environment than to a traditional set of unit tests.

Going further

Since many stress-like behaviours are already present thanks to the randomized workload generator, the next improvements should be focused on areas not yet covered by the existing suite.

A few upgrades would extend the reliability of the watcher in scenarios that aren’t currently exercised:

Long-running endurance tests

Running the watcher for hours rather than seconds helps expose:

  • slow memory leaks

  • handle/file-descriptor growth over time

  • goroutine buildup

  • timing drift under GC pressure

This could run weekly in CI or manually before releases.
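
As a rough sketch of what such a run could measure, the helper below samples goroutine count and live heap size so that the first and last samples can be compared. The names and the growth threshold are assumptions for illustration, and handle or file-descriptor counts are left out because reading them is platform-specific.

```go
package main

import "runtime"

// resourceSample is a point-in-time view of two leak indicators.
type resourceSample struct {
	goroutines int
	heapBytes  uint64
}

// takeSample forces a GC first so heapBytes reflects live objects rather
// than garbage that has not been collected yet.
func takeSample() resourceSample {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return resourceSample{goroutines: runtime.NumGoroutine(), heapBytes: m.HeapAlloc}
}

// grewBeyond reports whether last exceeds first by more than the given
// factor in either dimension, for example 2 for "more than doubled".
func grewBeyond(first, last resourceSample, factor float64) bool {
	return float64(last.goroutines) > float64(first.goroutines)*factor ||
		float64(last.heapBytes) > float64(first.heapBytes)*factor
}
```

An endurance test would take one sample after warm-up, keep the workload running for the chosen duration, and fail if a later sample has grown beyond the chosen factor.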

Fault injection with failpoints

Injecting controlled failures into:

  • syscall returns

  • event dispatch paths

  • directory iteration

  • queue overflows

would reveal whether FSWatcher recovers correctly from partial failures. Randomising delays inside goroutines would also help expose race windows that normal tests rarely hit.
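
Dedicated failpoint libraries exist, but the idea can be sketched with a plain function seam: the code under test receives the operation as a function, and the test wraps it with one that fails on demand. Everything below (`lister`, `failEvery`, `scanWithRetry`) is hypothetical and stands in for whatever directory iteration and recovery logic the watcher really has.

```go
package main

import "errors"

// errInjected marks failures that were produced by the test on purpose.
var errInjected = errors.New("injected failure")

// lister is the seam: production code would pass a real directory listing
// function, tests pass a wrapped one.
type lister func(dir string) ([]string, error)

// failEvery wraps next so that every nth call returns errInjected.
// The call counter is not synchronized, so the wrapper is meant for use
// from a single goroutine.
func failEvery(n int, next lister) lister {
	calls := 0
	return func(dir string) ([]string, error) {
		calls++
		if calls%n == 0 {
			return nil, errInjected
		}
		return next(dir)
	}
}

// scanWithRetry is a stand-in for code under test that should recover from
// transient listing failures by retrying up to attempts times.
func scanWithRetry(list lister, dir string, attempts int) ([]string, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		names, err := list(dir)
		if err == nil {
			return names, nil
		}
		lastErr = err
	}
	return nil, lastErr
}
```

With `failEvery(1, ...)` every call fails and the test can assert that the error surfaces; with a larger n it can assert that a single failure is absorbed.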

These two additions would meaningfully extend the testing scope without overlapping what the current suite already achieves.

Testing concurrent, cross-platform code requires a strategy that embraces inconsistency rather than fighting it. Different event models, different timing guarantees, and different behaviours under load mean the tests must be adaptive rather than prescriptive.

The combination of multi-platform CI, race detection, randomized workloads, event-driven assertions, and strict cleanup has made FSWatcher stable across all three operating systems. With endurance testing and fault injection added on top, the watcher would be able to withstand even more demanding production scenarios.

