Span<T> and Memory<T> in C#: Complete Guide to Zero-Allocation Performance in .NET




Span<T> and Memory<T>: The memory management revolution in .NET

In the world of high-performance development, every memory allocation comes at a cost. In the past, to achieve optimal performance, .NET developers often had to resort to unsafe code or complex object pooling strategies. The introduction of Span<T> and Memory<T> in .NET Core radically changed this landscape, providing type-safe tools to manage memory as efficiently as native code.


These types aren’t just wrappers around existing arrays; they represent a fundamentally new abstraction that lets you create “views” over contiguous blocks of memory without additional allocations. The result is code that maintains the safety and readability of .NET while eliminating the performance penalties typical of intensive garbage collection.
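
To make the idea of a “view” concrete, here is a minimal sketch (the array and values are illustrative): a Span<int> created over a slice of an existing array copies nothing, so writing through the view mutates the original storage.

int[] numbers = { 1, 2, 3, 4, 5 };
Span<int> view = numbers.AsSpan(1, 3); // a view over { 2, 3, 4 } - no copying
view[0] = 42;                          // writes through: numbers is now { 1, 42, 3, 4, 5 }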

Tech Fundamentals

Ref Struct vs Reference Types: The architecture of Span<T>

The decision to implement Span<T> as a ref struct wasn’t arbitrary; it stems from specific architectural considerations that determine both its advantages and its limitations. A ref struct is a type that can live exclusively on the stack: it can’t be boxed, can’t be a field of a class, and can’t be captured by lambdas or asynchronous methods.

public readonly ref struct Span<T>
{
    private readonly ref T _pointer;
    private readonly int _length;
}

This minimal structure contains only a reference to the first element and the length of the sequence. There are no managed heap pointers, additional metadata, or object headers. The compiler and runtime ensure that this structure can never outlive the stack frame that created it, thus eliminating any possibility of dangling references.

The consequence of this architecture is twofold: on the one hand, we get native performance with zero allocation overhead, and on the other, we accept limitations that prevent some common patterns in traditional .NET. For example, we can’t store a Span<T> in a list or pass it to an asynchronous method.

// Warning! This code does not compile.
public class Container
{
    private Span<int> _data; // Error: a ref struct cannot be a class field
}

// Warning! This doesn't either.
public async Task ProcessAsync()
{
    Span<int> span = stackalloc int[100]; // Error: Span<T> locals are not allowed in async methods
    await SomeAsyncOperation();
}

This is where Memory<T> comes in. Unlike Span<T>, Memory<T> is an ordinary struct that can live on the heap; internally it maintains a safe reference to data, whether that data sits in an array, in native memory, or in another backing store. Memory<T> serves as a “handle” that can be passed across asynchronous boundaries and then converted to Span<T> when the data actually needs to be accessed.
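
Here is a minimal sketch of that pattern (the Stream source and the XOR transform are illustrative): the Memory<byte> handle crosses the await, and a Span<byte> is materialized only in the synchronous helper that actually touches the data.

public async Task FillAndProcessAsync(Stream stream)
{
    Memory<byte> buffer = new byte[4096]; // a heap-friendly handle to the data

    // Memory<T> can safely cross the await boundary
    int bytesRead = await stream.ReadAsync(buffer);

    // Convert to Span<T> only at the last moment, in synchronous code
    Process(buffer.Span.Slice(0, bytesRead));
}

private static void Process(Span<byte> data)
{
    for (int i = 0; i < data.Length; i++)
        data[i] ^= 0xFF; // illustrative transformation
}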

Memory Layout and Contiguous Data: The foundation of performance

The power of Span<T> comes from its ability to unify access to different types of contiguous memory under a common interface. Whether the data resides in a heap-allocated array, in stack-allocated memory from stackalloc, in native memory allocated with Marshal.AllocHGlobal, or even in memory-mapped files, Span<T> exposes the same uniform API.

This unification has profound architectural implications. Algorithms written to operate on Span<T> can work indifferently on data from completely different sources, without code modifications and without performance penalties. The compiler can optimize element accesses as if they were direct native memory accesses.

public void ProcessData(Span<byte> data)
{
    // Same code, regardless of data source
    for (int i = 0; i < data.Length; i++)
    {
        data[i] = (byte)(data[i] * 2);
    }
}

// Works with arrays
var array = new byte[1000];
ProcessData(array);

// Works with stack memory
Span<byte> stackData = stackalloc byte[1000];
ProcessData(stackData);

// Works with native memory
unsafe
{
    byte* nativePtr = (byte*)Marshal.AllocHGlobal(1000);
    var nativeSpan = new Span<byte>(nativePtr, 1000);
    ProcessData(nativeSpan);
    Marshal.FreeHGlobal((IntPtr)nativePtr);
}

Data contiguity is crucial for modern performance. Today’s processors are optimized for sequential accesses that leverage L1/L2/L3 cache, and Span<T> ensures that data is always physically contiguous in memory. This is a significant advantage over structures like jagged arrays or reference lists, where accesses can cause frequent cache misses.

Zero-Copy Operations: Eliminating unnecessary allocations

The zero-copy concept is at the heart of Span<T>’s efficiency. In many traditional operations, data manipulation requires the creation of intermediate copies, which not only consume additional memory but also put pressure on the garbage collector.

Consider the operation of splitting a string. The traditional approach with String.Split() creates an array of strings, where each string is a new allocation containing a copy of the original characters. With Span<T>, however, we can create “views” of the original characters without ever copying them.

The slice operation is crucial in this context. When we call span.Slice(start, length), we’re not copying data but creating a new Span<T> that points to a portion of the original memory. This operation costs O(1) and requires zero allocations, regardless of the size of the data involved.

public ReadOnlySpan<char> FindWord(ReadOnlySpan<char> text, int wordIndex)
{
    int currentWord = 0;
    int start = 0;

    for (int i = 0; i <= text.Length; i++)
    {
        if (i == text.Length || char.IsWhiteSpace(text[i]))
        {
            if (currentWord == wordIndex)
            {
                // Zero allocations - just a view of the original data
                return text.Slice(start, i - start);
            }
            currentWord++;
            start = i + 1;
        }
    }

    return ReadOnlySpan<char>.Empty;
}

This ability to create views at no additional cost opens the door to entirely new algorithmic patterns, where we can decompose complex problems into subproblems that operate on portions of data without ever losing the efficiency of direct access.

Practical Use Cases

High Performance String Processing

String processing is probably the area where Span<T> shows the most obvious benefits. Traditional operations like parsing, tokenization, and formatting often require multiple temporary allocations, which can significantly impact performance in high-throughput scenarios.

Parsing structured formats is a prime example. Consider parsing a timestamp in ISO 8601 format. The traditional approach would require creating substrings for each component (year, month, day, etc.), whereas with Span<T> we can operate directly on the original characters.

public bool TryParseIsoDateTime(ReadOnlySpan<char> input, out DateTime result)
{
    result = default;

    // Length validation without substrings
    if (input.Length != 19 || input[4] != '-' || input[7] != '-' ||
        input[10] != 'T' || input[13] != ':' || input[16] != ':')
        return false;

    // Parse each component directly from the original span
    if (!int.TryParse(input.Slice(0, 4), out int year) ||
        !int.TryParse(input.Slice(5, 2), out int month) ||
        !int.TryParse(input.Slice(8, 2), out int day) ||
        !int.TryParse(input.Slice(11, 2), out int hour) ||
        !int.TryParse(input.Slice(14, 2), out int minute) ||
        !int.TryParse(input.Slice(17, 2), out int second))
        return false;

    try
    {
        result = new DateTime(year, month, day, hour, minute, second);
        return true;
    }
    catch
    {
        return false;
    }
}

This approach completely eliminates intermediate allocations while maintaining clear and readable logic. In scenarios where we need to parse thousands of timestamps per second, the difference in allocations and garbage collector performance is dramatic.

The pattern naturally extends to the processing of more complex formats like CSV, JSON, or network protocols. The ability to navigate data with efficient slicing operations allows for the implementation of parsers that maintain native performance while remaining within safe code.
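
As a sketch of that idea (the CountFields helper is hypothetical, not from a real parser), a CSV field counter can walk a line entirely through IndexOf and Slice, advancing a view over the original characters without a single allocation:

public static int CountFields(ReadOnlySpan<char> csvLine)
{
    int count = 0;
    while (true)
    {
        count++; // each iteration consumes one field
        int comma = csvLine.IndexOf(',');
        if (comma < 0) break;               // last field reached
        csvLine = csvLine.Slice(comma + 1); // advance the view - zero copies
    }
    return count;
}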

Buffer Management in Network-Intensive Applications

Applications that handle heavy network traffic, such as web servers or message brokers, spend a significant portion of their time managing buffers. Traditionally, this meant allocating arrays of bytes for each connection or request, creating constant pressure on the garbage collector and potential performance bottlenecks.

Span<T> allows for much more sophisticated buffer management. We can allocate large blocks of memory and then distribute views of portions of those blocks to different operations, eliminating the need for per-request allocations.

A concrete example is handling HTTP requests. Instead of allocating a new buffer for each request, we can use a large buffer pool and distribute spans representing the portions needed for each request’s header, body, and metadata.

public class HttpRequestProcessor
{
    private readonly ArrayPool<byte> _bufferPool = ArrayPool<byte>.Shared;

    public async Task<HttpRequest> ProcessRequestAsync(Stream networkStream)
    {
        byte[] buffer = _bufferPool.Rent(8192);
        try
        {
            int totalBytesRead = 0;
            int bytesRead;

            // Fill the buffer incrementally (a real server would grow the buffer if it fills up)
            do
            {
                bytesRead = await networkStream.ReadAsync(
                    buffer.AsMemory(totalBytesRead, buffer.Length - totalBytesRead));
                totalBytesRead += bytesRead;
            } while (bytesRead > 0 && !IsRequestComplete(buffer.AsSpan(0, totalBytesRead)));

            // Parse without further allocations
            return ParseHttpRequest(buffer.AsSpan(0, totalBytesRead));
        }
        finally
        {
            _bufferPool.Return(buffer);
        }
    }

    private HttpRequest ParseHttpRequest(ReadOnlySpan<byte> requestData)
    {
        // Find the separation between headers and body
        int headerSeparator = requestData.IndexOf("\r\n\r\n"u8);
        if (headerSeparator == -1) throw new InvalidOperationException("Invalid HTTP request");

        var headersSpan = requestData.Slice(0, headerSeparator);
        var bodySpan = requestData.Slice(headerSeparator + 4);

        // Parse headers by operating directly on the original bytes
        return new HttpRequest
        {
            Headers = ParseHeaders(headersSpan),
            Body = bodySpan.ToArray() // Only here we allocate, if necessary
        };
    }

    // HttpRequest, IsRequestComplete, and ParseHeaders are illustrative helpers omitted for brevity.
}

This pattern dramatically reduces per-request allocations and allows for significantly higher throughput in heavily loaded applications. The key is to separate memory management (via pools) from processing logic (via span views).

High-Performance Algorithms with Memory Views

Algorithms that operate on large datasets benefit greatly from Span<T>’s ability to provide efficient views of portions of data. This is particularly evident in divide-and-conquer algorithms, where the recursive decomposition of the problem can be implemented via slicing operations with no overhead.

Consider a sorting algorithm like QuickSort. The traditional version would require the creation of new arrays for the recursive partitions, or complex index calculations to operate in-place. With Span<T>, we can implement a clean version that operates directly on the original data views.

public static void QuickSort<T>(Span<T> data) where T : IComparable<T>
{
    if (data.Length <= 1) return;

    // Partition data in-place
    int pivotIndex = Partition(data);

    // Recursion on views of the original data - zero allocations
    QuickSort(data.Slice(0, pivotIndex));
    QuickSort(data.Slice(pivotIndex + 1));
}

private static int Partition<T>(Span<T> data) where T : IComparable<T>
{
    var pivot = data[data.Length - 1];
    int i = -1;

    for (int j = 0; j < data.Length - 1; j++)
    {
        if (data[j].CompareTo(pivot) <= 0)
        {
            i++;
            (data[i], data[j]) = (data[j], data[i]);
        }
    }

    (data[i + 1], data[data.Length - 1]) = (data[data.Length - 1], data[i + 1]);
    return i + 1;
}

The beauty of this implementation lies in its simplicity and efficiency. Each recursive call operates on a logical view of the original data without ever copying elements or allocating auxiliary structures. The result is an algorithm that retains the conceptual clarity of the classic recursive version but with performance comparable to optimized implementations in systems languages.

This pattern naturally extends to other algorithms such as merge sort, binary search on ranges, and windowing operations on time series data, paving the way for implementations that are both efficient and maintainable.
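
As an illustration, here is binary search written in the same slicing style as the QuickSort above. Note that .NET already ships this functionality as MemoryExtensions.BinarySearch; the version below is only a sketch of the pattern:

public static int BinarySearch<T>(ReadOnlySpan<T> sorted, T value) where T : IComparable<T>
{
    int offset = 0; // position of the current view within the original span

    while (!sorted.IsEmpty)
    {
        int mid = sorted.Length / 2;
        int comparison = sorted[mid].CompareTo(value);

        if (comparison == 0)
            return offset + mid;

        if (comparison < 0)
        {
            offset += mid + 1;
            sorted = sorted.Slice(mid + 1); // search the upper half - no copies
        }
        else
        {
            sorted = sorted.Slice(0, mid);  // search the lower half
        }
    }

    return -1; // not found
}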

Performance Deep Dive

The Real Impact on Performance

The theoretical benefits of Span<T> translate into measurable and often dramatic gains in real-world performance. To understand the extent of these improvements, it’s essential to analyze not only execution speed but also the impact on memory and the garbage collector.

Let’s take a concrete example: processing a CSV file with 100,000 rows, each containing 10 fields. The traditional approach with String.Split() would create approximately one million temporary strings, each requiring a heap allocation and subsequent garbage collection. With Span<T>, we can process the entire file without intermediate allocations, operating directly on the characters in the original buffer. The benchmark below uses a scaled-down dataset (10,000 rows of five fields each), but the pattern is identical.

[MemoryDiagnoser]
[SimpleJob(RuntimeMoniker.Net80)]
public class CsvProcessingBenchmark
{
    private readonly string _csvData;

    public CsvProcessingBenchmark()
    {
        // Generate a CSV with 10,000 rows of test data
        var lines = Enumerable.Range(1, 10000)
            .Select(i => $"{i},Product{i},Category{i % 10},{100 + i},{DateTime.Now:yyyy-MM-dd}")
            .ToArray();
        _csvData = string.Join("\n", lines);
    }

    [Benchmark(Baseline = true)]
    public int ProcessTraditional()
    {
        var lines = _csvData.Split('\n');
        int totalFields = 0;

        foreach (var line in lines)
        {
            var fields = line.Split(',');
            totalFields += fields.Length;

            // Simulates processing of each field
            foreach (var field in fields)
            {
                _ = field.Length;
            }
        }
        return totalFields;
    }

    [Benchmark]
    public int ProcessWithSpan()
    {
        var dataSpan = _csvData.AsSpan();
        int totalFields = 0;
        int lineStart = 0;

        for (int i = 0; i <= dataSpan.Length; i++)
        {
            if (i == dataSpan.Length || dataSpan[i] == '\n')
            {
                var line = dataSpan.Slice(lineStart, i - lineStart);
                totalFields += ProcessLineSpan(line);
                lineStart = i + 1;
            }
        }
        return totalFields;
    }

    private int ProcessLineSpan(ReadOnlySpan<char> line)
    {
        int fieldCount = 0;
        int fieldStart = 0;

        for (int i = 0; i <= line.Length; i++)
        {
            if (i == line.Length || line[i] == ',')
            {
                var field = line.Slice(fieldStart, i - fieldStart);
                _ = field.Length; // Simulate processing
                fieldCount++;
                fieldStart = i + 1;
            }
        }
        return fieldCount;
    }
}

Typical results from this benchmark show not only a significant improvement in speed (often 3–5x faster) but, more importantly, a drastic reduction in allocations: from megabytes of temporary strings per invocation to zero allocations for the entire operation.

Garbage Collection: The Hidden Cost of Allocations

The most significant impact of Span<T> isn’t always visible in microbenchmarks, but it emerges in long-running applications where reducing pressure on the garbage collector translates into more predictable latencies and sustained throughput over time.

The .NET garbage collector, while efficient, has a cost that increases nonlinearly with the number of objects allocated. Each allocation requires tracking metadata, and each collection cycle must visit all reachable objects. In applications that process large volumes of data, reducing temporary allocations can have transformative effects on overall performance.

public class GCPressureAnalysis
{
    public void AnalyzeGCImpact()
    {
        const int iterations = 100000;
        const string testData = "apple,banana,cherry,date,elderberry,fig,grape";

        // Measuring the traditional approach
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        var gen0Before = GC.CollectionCount(0);
        var gen1Before = GC.CollectionCount(1);
        var memoryBefore = GC.GetTotalMemory(false);
        var stopwatch = Stopwatch.StartNew();

        for (int i = 0; i < iterations; i++)
        {
            var parts = testData.Split(',');
            var processed = string.Join("|", parts);
        }

        stopwatch.Stop();
        var traditionalTime = stopwatch.ElapsedMilliseconds;
        var gen0After = GC.CollectionCount(0);
        var gen1After = GC.CollectionCount(1);
        var memoryAfter = GC.GetTotalMemory(true);

        Console.WriteLine($"Traditional: {traditionalTime}ms, " +
            $"Gen0: {gen0After - gen0Before}, " +
            $"Gen1: {gen1After - gen1Before}, " +
            $"Memory: {memoryAfter - memoryBefore:N0} bytes");

        // Measuring with Span<T>
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        gen0Before = GC.CollectionCount(0);
        gen1Before = GC.CollectionCount(1);
        memoryBefore = GC.GetTotalMemory(false);

        var buffer = new char[1000];
        stopwatch.Restart();

        for (int i = 0; i < iterations; i++)
        {
            ProcessWithSpan(testData.AsSpan(), buffer.AsSpan());
        }

        stopwatch.Stop();
        var spanTime = stopwatch.ElapsedMilliseconds;
        gen0After = GC.CollectionCount(0);
        gen1After = GC.CollectionCount(1);
        memoryAfter = GC.GetTotalMemory(true);

        Console.WriteLine($"Span: {spanTime}ms, " +
            $"Gen0: {gen0After - gen0Before}, " +
            $"Gen1: {gen1After - gen1Before}, " +
            $"Memory: {memoryAfter - memoryBefore:N0} bytes");
    }

    private static void ProcessWithSpan(ReadOnlySpan<char> input, Span<char> output)
    {
        // Equivalent of Split + Join, but writing into the caller's buffer - zero allocations
        for (int i = 0; i < input.Length; i++)
        {
            output[i] = input[i] == ',' ? '|' : input[i];
        }
    }
}

In real-world testing, it is common to see reductions of 90% or more in garbage collections, resulting in better tail latencies and more predictable performance.

CPU Cache Efficiency and Memory Access Patterns

An often overlooked aspect of modern performance is the efficiency of CPU cache access. Contemporary processors have complex cache hierarchies (L1, L2, L3) and sophisticated prefetching systems that work optimally with sequential and predictable access patterns.

Span<T> excels in this respect because it ensures that data is always physically contiguous in memory. When we iterate through a span, we generate a perfectly sequential access pattern that maximizes the effectiveness of the cache and hardware prefetching.

Contrast this with fragmented data structures like jagged arrays or object lists, where any access can potentially cause a cache miss. The performance difference can be an order of magnitude, especially when processing large datasets.

public class CacheEfficiencyBenchmark
{
    private int[][] _jaggedArray;
    private int[] _contiguousArray;

    [GlobalSetup]
    public void Setup()
    {
        // Allocate outside the measured methods so only the access patterns are timed
        _jaggedArray = new int[1000][];
        for (int i = 0; i < 1000; i++)
        {
            _jaggedArray[i] = new int[1000];
            for (int j = 0; j < 1000; j++)
                _jaggedArray[i][j] = j;
        }

        _contiguousArray = new int[1_000_000];
        for (int i = 0; i < _contiguousArray.Length; i++)
            _contiguousArray[i] = i;
    }

    [Benchmark]
    public long SumJaggedArray()
    {
        // Fragmented data - each row is a separate heap object, so accesses can miss the cache
        long sum = 0;
        for (int i = 0; i < _jaggedArray.Length; i++)
        {
            for (int j = 0; j < _jaggedArray[i].Length; j++)
            {
                sum += _jaggedArray[i][j];
            }
        }
        return sum;
    }

    [Benchmark]
    public long SumContiguousSpan()
    {
        // Contiguous data - perfectly sequential, cache-friendly access
        var span = _contiguousArray.AsSpan();
        long sum = 0;

        for (int i = 0; i < span.Length; i++)
        {
            sum += span[i];
        }
        return sum;
    }
}

The contiguous span version often outperforms the jagged array version by 2–3x, purely due to improved data locality and cache efficiency.

Final Considerations

Span<T> and Memory<T> represent much more than simple optimizations — they are tools that allow you to write .NET code with performance comparable to systems languages, while maintaining the safety and expressiveness that characterize the platform.

Adopting them requires a shift in mindset: we must start thinking in terms of views of data rather than copies of data, and consider memory locality and contiguity as primary design factors. When we master these concepts, we open the door to entirely new programming patterns that were previously only possible in low-level languages.

The next natural step is to explore how these types integrate into the broader .NET ecosystem — from System.Text.Json, which uses them for zero-allocation parsing, to ASP.NET Core, which leverages them for efficient request and response handling, to Entity Framework, which incorporates them for more performant query results. But that’s a topic for the next article in the series.

Thanks for reading.
If you found this article helpful, follow me for the next ones or just clap.



