Cache



This content originally appeared on DEV Community and was authored by Igor Grieder

Caching is a well-known technique in programming, and we see it everywhere: from CPUs, which keep data in memory that is faster than RAM, to CDNs, which deliver content to users more quickly. It’s usually applied as an optimization in systems that need lower latency on read operations. Understanding what it is and how to implement it in applications is therefore useful knowledge to have.

Caching in Web Development

In web development, the cache is a layer that usually sits between the application and the persistent data store. Its purpose is to make access to frequently used data cheaper and quicker by keeping that data available in the cache for read operations. In this article we’ll cover read and write implementations for caching in our backend, but first I’ll explain some important topics.

Consistency

Consistency between the cache and the database layers is something we always need to be aware of. We can try to achieve it by relying on a short TTL (Time-to-Live) or on strategies that invalidate the cache on database updates, so that we avoid serving stale data. Overall, it’s important to decide how consistency is going to be reached between these components of the system, and we’ll cover some strategies for that.
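
As a rough illustration of the short-TTL approach, here is a minimal in-memory sketch. The names `TTLCache`, `fakeClock` and `productTTL` are hypothetical and exist only for this example; a real system would more likely rely on the expiration built into its cache server. Every entry stores its expiry time, and a read past that time is treated as a miss.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// productTTL is an arbitrary example value, not a recommendation.
const productTTL = 30 * time.Second

// fakeClock is a manually advanced clock, so expiry can be shown without sleeping.
type fakeClock struct{ t time.Time }

func (f *fakeClock) Now() time.Time          { return f.t }
func (f *fakeClock) Advance(d time.Duration) { f.t = f.t.Add(d) }

// entry pairs a cached value with the moment it stops being valid.
type entry struct {
	value     string
	expiresAt time.Time
}

// TTLCache is a hypothetical in-memory cache used only for illustration.
type TTLCache struct {
	mu    sync.Mutex
	items map[string]entry
	now   func() time.Time // injectable clock
}

func NewTTLCache(now func() time.Time) *TTLCache {
	return &TTLCache{items: make(map[string]entry), now: now}
}

// Set stores value under key for the duration ttl.
func (c *TTLCache) Set(key, value string, ttl time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = entry{value: value, expiresAt: c.now().Add(ttl)}
}

// Get returns the value and true on a hit. Expired entries are removed
// and reported as a miss, which bounds how long stale data can be served.
func (c *TTLCache) Get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.items[key]
	if !ok {
		return "", false
	}
	if !c.now().Before(e.expiresAt) {
		delete(c.items, key)
		return "", false
	}
	return e.value, true
}

func main() {
	clock := &fakeClock{t: time.Now()}
	cache := NewTTLCache(clock.Now)

	cache.Set("product:1", `{"name":"Mug"}`, productTTL)
	_, hit := cache.Get("product:1")
	fmt.Println("within TTL, hit:", hit)

	clock.Advance(productTTL) // simulate the TTL elapsing
	_, hit = cache.Get("product:1")
	fmt.Println("after TTL, hit:", hit)
}
```

The shorter the TTL, the smaller the window in which stale data can be read, at the cost of more misses and more load on the database.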

Invalidation

To ensure consistency, whenever the database layer is updated we need to invalidate the cache keys affected by the operation. We can choose either to delete or to update those keys in the cache layer to avoid keeping stale data. The decision can depend on which caching approach is being used.
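
The two options, delete or update, can be compared in a small sketch. Everything here is an assumption made for illustration: the `Store` type uses two plain maps in place of a real database and cache, and the method names are invented.

```go
package main

import "fmt"

// Store is a hypothetical stand-in: two maps play the roles of the
// database and the cache.
type Store struct {
	db    map[string]string
	cache map[string]string
}

func NewStore() *Store {
	return &Store{db: map[string]string{}, cache: map[string]string{}}
}

// UpdateAndDelete writes to the database, then removes the cache key.
// The next read misses and reloads the fresh value.
func (s *Store) UpdateAndDelete(key, value string) {
	s.db[key] = value
	delete(s.cache, key)
}

// UpdateAndRefresh writes to the database, then overwrites the cache key,
// so the next read is a hit with the new value.
func (s *Store) UpdateAndRefresh(key, value string) {
	s.db[key] = value
	s.cache[key] = value
}

// Read looks in the cache first and repopulates it from the database on a miss.
func (s *Store) Read(key string) (value string, hit bool) {
	if v, ok := s.cache[key]; ok {
		return v, true
	}
	v, ok := s.db[key]
	if ok {
		s.cache[key] = v
	}
	return v, false
}

func main() {
	s := NewStore()

	s.UpdateAndRefresh("product:1", "Mug")
	fmt.Println(s.Read("product:1")) // Mug true

	s.UpdateAndDelete("product:1", "Large Mug")
	fmt.Println(s.Read("product:1")) // Large Mug false
	fmt.Println(s.Read("product:1")) // Large Mug true
}
```

Deleting is the simpler choice and pairs naturally with lazy loading, while updating keeps the key warm at the cost of a second write on every change.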

Cache hit and Cache miss

Cache hit and cache miss are two terms I’ll be using quite a lot in this article, and their meaning is pretty self-explanatory:

  • Cache hit: when we look into the cache and find the data we want to access, we have a hit.
  • Cache miss: on the other hand, if we cannot find the data in the cache, we have a miss.

Cache-Aside pattern

The Cache-Aside pattern, also known as lazy loading, is simple to implement and is usually the most common approach for cached reads. The idea is that the application checks the cache first and, on a cache miss, reads the data from the database and repopulates the cache with it, so that the cache ends up holding the most recently written data.

Overall, this pattern makes the first read slower, since it needs to fetch from the database and then populate the cache, but subsequent reads will be much faster. Given that, when we change something in our database, we need to invalidate the cache keys that may hold the affected data.

Cache aside pattern

package handlers

import (
    "context"
    "database/sql"
    "encoding/json"
    "log"
    "net/http"
    "time"

    "github.com/redis/go-redis/v9"
)

type Product struct {
    ID    string  `json:"id"`
    Name  string  `json:"name"`
    Price float64 `json:"price"`
}

type handler struct {
    redis *redis.Client
    pg    *sql.DB
}

func NewHandler(redis *redis.Client, pg *sql.DB) *handler {
    return &handler{redis: redis, pg: pg}
}

func (h *handler) GetProductCacheAside(w http.ResponseWriter, r *http.Request) {
    // Assume we get the product ID from the query params
    id := r.URL.Query().Get("id")
    if id == "" {
        http.Error(w, "Product ID is required", http.StatusBadRequest)
        return
    }

    // Look for data in the cache
    cacheKey := "product:" + id
    cachedProduct, err := h.redis.Get(context.Background(), cacheKey).Bytes()

    // Cache hit
    if err == nil {
        log.Println("Cache HIT")
        w.Header().Set("Content-Type", "application/json")
        w.Write(cachedProduct)
        return
    }

    // Cache Miss
    log.Println("Cache MISS")
    var product Product

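    // Query the database
    err = h.pg.QueryRow("SELECT id, name, price FROM products WHERE id = $1", id).Scan(&product.ID, &product.Name, &product.Price)
    if err == sql.ErrNoRows {
        http.Error(w, "Product not found", http.StatusNotFound)
        return
    }
    if err != nil {
        // Any other error is a database failure, not a missing product
        http.Error(w, "Failed to query database", http.StatusInternalServerError)
        return
    }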

    // Populate the cache for next time
    productJSON, err := json.Marshal(product)
    if err != nil {
        http.Error(w, "Could not process product data", http.StatusInternalServerError)
        return
    }

    // This is a read path: nothing was written to the database, so a failed
    // cache write should not fail the request. Log it and serve the data
    // that was just loaded; the next read will simply miss again.
    if err := h.redis.Set(context.Background(), cacheKey, productJSON, 5*time.Minute).Err(); err != nil {
        log.Printf("WARN: could not populate cache for %s: %v", cacheKey, err)
    }

    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(product)
}

Write-Through pattern

The Write-Through pattern is a way of dealing with cache writes. The idea is to maintain strong consistency between the two data stores by writing to both as part of the same operation, which also increases the number of cache hits. This option is usually chosen when we have an environment that needs consistently low-latency reads.

However, Write-Through requires a more robust system to handle those dual writes. It is recommended to have a strong fault-tolerance approach for them, since we need to guarantee that a write only counts as successful if it completes correctly in both stores.

Both write strategies I’ll discuss can be used alongside the Cache-Aside pattern and usually they are.

Write-Through

package handlers

import (
    "context"
    "database/sql"
    "encoding/json"
    "log"
    "net/http"
    "time"

    "github.com/redis/go-redis/v9"
)

type Product struct {
    ID    string  `json:"id"`
    Name  string  `json:"name"`
    Price float64 `json:"price"`
}

type handler struct {
    redis *redis.Client
    pg    *sql.DB
}

func NewHandler(redis *redis.Client, pg *sql.DB) *handler {
    return &handler{redis: redis, pg: pg}
}

func (h *handler) UpdateProductWriteThrough(w http.ResponseWriter, r *http.Request) {
    var product Product
    if err := json.NewDecoder(r.Body).Decode(&product); err != nil {
        http.Error(w, "Invalid request body", http.StatusBadRequest)
        return
    }
    cacheKey := "product:" + product.ID

    // Write to the database first
    _, err := h.pg.Exec("UPDATE products SET name = $1, price = $2 WHERE id = $3", product.Name, product.Price, product.ID)
    if err != nil {
        http.Error(w, "Failed to update database", http.StatusInternalServerError)
        return
    }

    // Write the same data to the cache
    productJSON, err := json.Marshal(product)
    if err != nil {
        // The DB write has already succeeded at this point; marshaling
        // before the DB write would avoid this partial-failure path.

        http.Error(w, "Could not process product data", http.StatusInternalServerError)
        return
    }

    err = h.redis.Set(context.Background(), cacheKey, productJSON, 5*time.Minute).Err()
    if err != nil {
        // In a real production environment this dual write needs a more
        // robust mechanism (for example retries, or deleting the key) for
        // when the cache write fails after the DB write succeeds.

        log.Printf("CRITICAL: DB updated but cache failed: %v", err)
        http.Error(w, "Failed to update cache", http.StatusInternalServerError)
        return
    }

    log.Println("Write-Through successful")
    w.WriteHeader(http.StatusOK)
}
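
One possible way to soften the dual-write failure case is to fall back to invalidation: if the cache write fails after the database write succeeded, delete the key so that readers miss and reload from the database. The sketch below assumes hypothetical `DB` and `Cache` interfaces and in-memory fakes (`memDB`, `memCache`); it is not the API of any real driver, and it is only one of several options (retries or a transactional outbox are others).

```go
package main

import (
	"errors"
	"fmt"
)

// DB and Cache are hypothetical minimal interfaces for this sketch.
type DB interface {
	Update(id, value string) error
}

type Cache interface {
	Set(key, value string) error
	Del(key string) error
}

// WriteThrough updates the database first and then the cache. If the cache
// write fails, it deletes the key instead, so the old value cannot be served.
func WriteThrough(db DB, cache Cache, id, value string) error {
	if err := db.Update(id, value); err != nil {
		return fmt.Errorf("database update: %w", err)
	}
	key := "product:" + id
	if err := cache.Set(key, value); err != nil {
		if delErr := cache.Del(key); delErr != nil {
			// Both cache operations failed; only the TTL can clean up now.
			return fmt.Errorf("cache set failed (%v), invalidation failed: %w", err, delErr)
		}
	}
	return nil
}

// memDB and memCache are in-memory fakes used to exercise the function.
type memDB struct{ rows map[string]string }

func (m *memDB) Update(id, value string) error {
	m.rows[id] = value
	return nil
}

type memCache struct {
	items   map[string]string
	failSet bool // simulates a cache outage on Set
}

func (c *memCache) Set(key, value string) error {
	if c.failSet {
		return errors.New("cache unavailable")
	}
	c.items[key] = value
	return nil
}

func (c *memCache) Del(key string) error {
	delete(c.items, key)
	return nil
}

func main() {
	db := &memDB{rows: map[string]string{}}
	cache := &memCache{items: map[string]string{"product:1": "Mug"}}

	cache.failSet = true // simulate a cache outage during the write
	err := WriteThrough(db, cache, "1", "Large Mug")
	_, stillCached := cache.items["product:1"]
	fmt.Println(err, db.rows["1"], stillCached)
}
```

With this fallback the write degrades to Cache-Aside behaviour for that key: the database holds the new value and the next read repopulates the cache.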

Write-Behind pattern

The goal of the Write-Behind (or Lazy Write) pattern is to reduce the extra latency that write operations incur when we need to update state in two different stores. The strategy is simple: write only to the cache and confirm the operation immediately. The database is then updated asynchronously in the background.

This makes write operations feel instantaneous to the user. However, this speed comes with a trade-off: we give up immediate consistency in exchange for performance and get eventual consistency instead, and a write can be lost if the cache fails before the database is updated.

Write-Behind

package handlers

import (
    "context"
    "database/sql"
    "encoding/json"
    "log"
    "net/http"
    "time"

    "github.com/redis/go-redis/v9"
)

type Product struct {
    ID    string  `json:"id"`
    Name  string  `json:"name"`
    Price float64 `json:"price"`
}

type handler struct {
    redis *redis.Client
    pg    *sql.DB
}

func NewHandler(redis *redis.Client, pg *sql.DB) *handler {
    return &handler{redis: redis, pg: pg}
}

func (h *handler) UpdateProductWriteBehind(w http.ResponseWriter, r *http.Request) {
    var product Product
    if err := json.NewDecoder(r.Body).Decode(&product); err != nil {
        http.Error(w, "Invalid request body", http.StatusBadRequest)
        return
    }
    cacheKey := "product:" + product.ID

    // Write data to the cache immediately
    productJSON, err := json.Marshal(product)
    if err != nil {
        http.Error(w, "Could not process product data", http.StatusInternalServerError)
        return
    }

    err = h.redis.Set(context.Background(), cacheKey, productJSON, 5*time.Minute).Err()
    if err != nil {
        http.Error(w, "Failed to write to cache", http.StatusInternalServerError)
        return
    }

    // Return success to the user immediately
    log.Println("Write-Behind: Cache updated. Acknowledged to client.")
    w.WriteHeader(http.StatusAccepted) // 202 Accepted is a good status code here

    // Asynchronously write to the database after a delay
    go func() {
        // In a real production application this needs to be a more robust
        // queueing system instead of just a simple timed goroutine.
        time.Sleep(10 * time.Second)

        _, err := h.pg.Exec("UPDATE products SET name = $1, price = $2 WHERE id = $3", product.Name, product.Price, product.ID)
        if err != nil {
            log.Printf("ERROR: Failed write-behind to database: %v", err)
            // Need an error handling/retry mechanism here to ensure consistency
        } else {
            log.Println("Write-Behind: Database updated successfully.")
        }
    }()
}
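
As a first step towards the more robust queueing mentioned in the comments above, the timed goroutine could be replaced by a buffered channel drained by a single background worker with a bounded number of retries. This is only a sketch under assumptions: `Writer`, `update`, `save` and `errTransient` are hypothetical names, the queue lives in process memory (so pending writes are still lost on a crash), and retries happen immediately without backoff.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errTransient is a sample error used by the demo below.
var errTransient = errors.New("transient failure")

// update is one pending database write.
type update struct {
	ID, Name string
}

// Writer drains a buffered channel in one background goroutine, so writes
// are applied in order and the amount of pending work is bounded.
type Writer struct {
	queue       chan update
	wg          sync.WaitGroup
	save        func(update) error // hypothetical persistence function
	maxAttempts int
}

func NewWriter(save func(update) error, buffer, maxAttempts int) *Writer {
	w := &Writer{queue: make(chan update, buffer), save: save, maxAttempts: maxAttempts}
	w.wg.Add(1)
	go w.run()
	return w
}

func (w *Writer) run() {
	defer w.wg.Done()
	for u := range w.queue {
		var err error
		for attempt := 1; attempt <= w.maxAttempts; attempt++ {
			if err = w.save(u); err == nil {
				break
			}
		}
		if err != nil {
			// A real system would park the update somewhere durable here.
			fmt.Printf("giving up on %s: %v\n", u.ID, err)
		}
	}
}

// Enqueue reports false when the buffer is full, so the caller can reject
// the request instead of silently dropping the write.
// It must not be called after Close.
func (w *Writer) Enqueue(u update) bool {
	select {
	case w.queue <- u:
		return true
	default:
		return false
	}
}

// Close stops accepting updates and waits until the queue is drained.
func (w *Writer) Close() {
	close(w.queue)
	w.wg.Wait()
}

func main() {
	db := map[string]string{}
	calls := 0
	save := func(u update) error {
		calls++
		if calls == 1 {
			return errTransient // first attempt fails, the retry succeeds
		}
		db[u.ID] = u.Name
		return nil
	}

	w := NewWriter(save, 16, 3)
	w.Enqueue(update{ID: "1", Name: "Mug"})
	w.Close()
	fmt.Println(db["1"], calls)
}
```

In a handler, `Enqueue` would be called right after the cache write, and a `false` result could be turned into an error response so the client knows the write was not accepted.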

Conclusion

To sum up, it’s a good practice to use the Cache-Aside pattern to reduce latency overall in reads. You then choose a write strategy to proactively keep the cache data fresh when your database changes. A fresher, more consistent cache naturally leads to more occurrences of cache hits for your Cache-Aside pattern.

Always remember that the database is your system’s true source of truth. While the cache is fast, it’s also volatile since it’s an in-memory storage and data can be lost. The database provides durability, ensuring your data is safely written to disk.


