Building a Redis Clone in Zig—Part 2



This content originally appeared on DEV Community and was authored by Charles Fonseca

In the previous article, we covered the basics of building a Redis clone in Zig, focusing on the data structures and the core functionality. In this article, we will dive deeper into optimizing our in-memory store by implementing string interning and customizing our hash map for better performance.

Hash Table Context

As a reminder, this is how OptimizedHashMap was defined in the previous article:

const OptimizedHashMap = std.ArrayHashMapUnmanaged([]const u8, ZedisObject, StringContext, true);

Let’s focus on the StringContext part. The default context for string keys uses the standard library’s std.hash.Wyhash, a great general-purpose hash function, and for equality it uses std.mem.eql, a byte-wise comparison. We can customize both by providing our own context struct.
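For reference, the default string context is roughly equivalent to the following sketch (not the exact stdlib source; ArrayHashMap contexts return a u32 hash, and eql additionally receives the stored element's index):

```zig
const std = @import("std");

// Roughly what the default string context does (sketch, not stdlib source).
const DefaultStringContext = struct {
    pub fn hash(self: @This(), key: []const u8) u32 {
        _ = self;
        // Wyhash produces a u64; ArrayHashMap contexts return u32.
        return @truncate(std.hash.Wyhash.hash(0, key));
    }

    pub fn eql(self: @This(), a: []const u8, b: []const u8, b_index: usize) bool {
        _ = self;
        _ = b_index; // index of the stored element, unused for plain strings
        return std.mem.eql(u8, a, b);
    }
};
```

Providing our own struct with the same two methods is all it takes to swap in a different hash function or equality check.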

Benchmarking

During development, I found that the default hash function and equality check were not optimal for our use case. Specifically, we want to optimize for short strings, since Redis often deals with short keys and values. I assumed SIMD (Single Instruction, Multiple Data) would be the best way to speed up equality checks, and the answer turned out to be a resounding "it depends".

Equality Benchmark

The obvious candidate for faster equality checks is SIMD instructions. Once again, Karl Seguin has a great article on the topic: Using SIMD to Tell if the High Bit is Set. We won’t be checking the high bit, but the same principles apply.

Our baseline will be the standard library’s std.mem.eql function. During the benchmark, I found that it’s pretty hard to beat the standard library: my original SIMD implementation was half the speed of std.mem.eql for almost all string lengths.

You can see the benchmark code and results in this GitHub Gist. Long story short, for strings shorter than 45 bytes, std.mem.eql is the fastest option. For strings between 45 and 128 bytes, a 16-byte SIMD comparison using XOR is the fastest, reaching an average of 850 million operations per second on my M4 MacBook Pro.

Here’s the final implementation of the SIMD equality function:

pub fn simdStringEql(a: []const u8, b: []const u8) bool {
    // Fast path
    if (a.len != b.len) return false;
    if (a.len == 0) return true;

    // Check for pointer equality first, this is a fast path for interned strings
    if (a.ptr == b.ptr) return true;

    // For strings < 45 bytes, std.mem.eql is faster due to:
    // - Highly optimized LLVM intrinsics
    // - Lower overhead for small comparisons
    // - Better branch prediction
    // Most Redis keys fall into this category
    if (a.len < 45) {
        return std.mem.eql(u8, a, b);
    }

    // For longer strings (≥45 bytes), use explicit SIMD vectorization
    // Fixed 16-byte vectors for consistent performance across platforms
    const vec_len = 16;
    const Vec = @Vector(vec_len, u8);
    var i: usize = 0;

    // Process 16-byte chunks with SIMD using XOR technique
    // XOR is slightly more efficient than equality comparison + reduce
    while (i + vec_len <= a.len) : (i += vec_len) {
        const va: Vec = a[i..][0..vec_len].*;
        const vb: Vec = b[i..][0..vec_len].*;
        const xor_result = va ^ vb;

        // If XOR result is all zeros, the vectors are equal
        // Check if any byte is non-zero (which means difference)
        if (@reduce(.Or, xor_result != @as(Vec, @splat(0)))) {
            return false;
        }
    }

    // Handle remaining bytes efficiently with std.mem.eql
    // This is faster than a scalar loop for the tail
    return std.mem.eql(u8, a[i..], b[i..]);
}
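As a quick sanity check, a Zig test can confirm that the SIMD path agrees with std.mem.eql on strings long enough to take the vectorized branch. The buffer copy below exists to sidestep the pointer-equality fast path, so the comparison actually runs the SIMD loop:

```zig
test "simdStringEql agrees with std.mem.eql" {
    const a = "a string that is definitely longer than forty-five bytes, yes";
    // Copy into a buffer so the pointer-equality fast path is not taken.
    var buf: [a.len]u8 = undefined;
    @memcpy(&buf, a);
    try std.testing.expect(simdStringEql(a, &buf));
    buf[50] ^= 1; // flip one bit in a later chunk -> strings now differ
    try std.testing.expect(!simdStringEql(a, &buf));
    try std.testing.expect(!std.mem.eql(u8, a, &buf));
}
```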

Hash Function Benchmark

I also wanted to benchmark different hash functions to see which one performs best for our use case. The standard library provides several hash functions, including Wyhash, CityHash64, Murmur3, and others. I added a few more to the benchmark, including Murmur2, XxHash (32 and 64), Adler32, and Blake3 (cryptographic). I implemented a simple FNV-1a hash function for comparison as well. You can see the benchmark code and results in this GitHub Gist. Long story short, CityHash64 outperforms Wyhash slightly, reaching an average of 490 million operations per second versus Wyhash’s 460 million on my laptop.
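Combining the two benchmark winners, the custom context can look something like the sketch below. It assumes the simdStringEql function from earlier and std.hash.CityHash64; note that ArrayHashMap contexts return a 32-bit hash, so the 64-bit CityHash64 result is truncated:

```zig
const std = @import("std");

// Custom context: CityHash64 for hashing, SIMD-assisted equality.
// A sketch based on the benchmark results above.
const StringContext = struct {
    pub fn hash(self: @This(), key: []const u8) u32 {
        _ = self;
        // ArrayHashMap contexts return u32; truncate the 64-bit hash.
        return @truncate(std.hash.CityHash64.hash(key));
    }

    pub fn eql(self: @This(), a: []const u8, b: []const u8, b_index: usize) bool {
        _ = self;
        _ = b_index; // index of the stored element, unused here
        return simdStringEql(a, b);
    }
};
```

This is the struct that gets passed as the third parameter of std.ArrayHashMapUnmanaged in the OptimizedHashMap definition.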

Intern Strings

The idea is to store only one copy of each unique string value, with all references to that string pointing to the same memory location. This is a common technique used in many languages and systems to reduce memory usage and speed up string comparisons. That’s why the equality function above first checks whether the pointers are the same: if (a.ptr == b.ptr) return true;. If they are, we can return true immediately without further comparison, giving us O(1) complexity for almost all string comparisons.

pub const Store = struct {
    base_allocator: std.mem.Allocator,
    allocator: std.mem.Allocator,
    map: OptimizedHashMap,

    ...

    // Map of interned strings
    interned_strings: std.StringHashMapUnmanaged([]const u8),

    fn internString(self: *Store, str: []const u8) ![]const u8 {
        assert(str.len > 0);
        const gop = try self.interned_strings.getOrPut(self.base_allocator, str);
        if (!gop.found_existing) {
            const owned_str = try self.dupeString(str);
            gop.key_ptr.* = owned_str;
            gop.value_ptr.* = owned_str;
        }
        return gop.value_ptr.*;
    }

    pub fn set(self: *Store, key: []const u8, object: ZedisObject) !void {
        assert(key.len > 0);
        // Check if we need to resize before adding new entries
        try self.maybeResize();

        const interned_key = try self.internString(key);
        const gop = try self.map.getOrPut(self.base_allocator, interned_key);

        // Free old value if key existed
        if (gop.found_existing) {
            switch (gop.value_ptr.value) {
                .string => |str| {
                    if (str.len > 0) self.smartFree(str);
                },
                .int => {},
                .short_string => {},
            }
        }
        // Key is already interned, no need to duplicate again
        gop.key_ptr.* = interned_key;

        // Build the new object with allocated values
        var new_object = object;
        switch (object.value) {
            .string => |str| {
                const value_copy = try self.dupeString(str);
                new_object.value = .{ .string = value_copy };
            },
            .int => {}, // No allocation needed
            .short_string => {}, // No allocation needed - stored inline
        }

        // Update the entry
        gop.value_ptr.* = new_object;
    }
};

By doing this, we ensure that all identical strings share the same memory, which saves space and speeds up comparisons. To summarize: the Store struct now includes an interned_strings map to track all unique strings, the ZedisObject union gained a short_string variant to store small strings inline, and the hash map context now uses our optimized hash function and equality check. This is a great foundation for a high-performance in-memory cache.
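For context, the ZedisObject union with the inline short_string variant might be shaped like this. The field names and the 23-byte capacity are assumptions for illustration, not the repository's actual definition:

```zig
pub const ZedisObject = struct {
    value: union(enum) {
        // Heap-allocated string; the slice points at interned memory.
        string: []const u8,
        // Integers need no allocation at all.
        int: i64,
        // Small strings stored inline in the object itself.
        short_string: ShortString,
    },
};

pub const ShortString = struct {
    len: u8,
    data: [23]u8, // capacity chosen to fit the union's footprint (assumption)

    pub fn slice(self: *const ShortString) []const u8 {
        return self.data[0..self.len];
    }
};
```

Because short strings live inside the object, the set function skips allocation entirely for the int and short_string variants, as shown above.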

Know More

Zedis GitHub Repository

