Rate Limiting



This content originally appeared on DEV Community and was authored by CLIFF OYOH OMOLLO

If you’ve ever built an API for public use, it’s quite likely that you’ll want to implement some form of rate limiting to prevent clients from making too many requests too quickly and putting excessive strain on your server.
In this guide I am going to demonstrate how to implement this by creating a middleware that will intercept requests coming into the server.
Essentially, we want this middleware to check how many requests have been received in the last ‘N’ seconds and — if there have been too many — then it should send the client a 429 Too Many Requests response. We’ll position this middleware before our main application handlers, so that it carries out this check before we do any expensive processing like decoding a JSON request body or querying our database.
By the end of this guide you will have a clear understanding of the principles behind token-bucket rate-limiter algorithms and how we can apply them in the context of an API or web application.
You will also learn how to create middleware to rate-limit requests to your API endpoints, first by building a single global rate limiter, then extending it to support per-client limiting based on IP address.

Global Rate Limiting
This simply means creating one single token bucket for the entire application. Every single request from every single user spends a token from this shared bucket.
It’s incredibly easy to implement and provides a hard cap on your server’s total load, protecting it from a massive, distributed spike in traffic.
However that also means that one misbehaving client can exhaust all the tokens in the global bucket, effectively locking out every other legitimate user. This is often not the desired behavior.
We are not going to implement the rate-limiting logic ourselves from scratch, but will instead leverage the golang.org/x/time/rate package. This package provides a tried-and-tested implementation of a token bucket rate limiter, so we can trust it to do its job well.

First, we’ll need to download the latest version of this package by running the following command in our terminal:

$ go get golang.org/x/time/rate@latest

How token-bucket logic works
According to its description from the official x/time/rate documentation:
A Limiter controls how frequently events are allowed to happen. It implements a “token bucket” of size b, initially full and refilled at rate r tokens per second.

Putting that into the context of what we are trying to build translates to this:

  • We will have a bucket that starts with b tokens in it.
  • Each time we receive an HTTP request, we will remove one token from the bucket.
  • Every 1/r seconds, a token is added back to the bucket — up to a maximum of b total tokens.
  • If we receive an HTTP request and the bucket is empty, we should return a 429 Too Many Requests response.

In practice this means that our application would allow a maximum ‘burst’ of b HTTP requests in quick succession, but over time it would allow an average of r requests per second.

The function we use to create a token bucket rate limiter from x/time/rate is NewLimiter(), which has the following signature:

func NewLimiter(r Limit, b int) *Limiter

So if we want to create a rate limiter which allows an average of 2 requests per second, with a maximum of 5 requests in a single ‘burst’, we could do so with the following code:

// Allow 2 requests per second, with a maximum of 5 requests in a burst.
limiter := rate.NewLimiter(2, 5)
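
If you want to see the bucket draining and refilling for yourself, here’s a small standalone sketch (not part of our application) that calls Allow() in a loop using those same numbers:

package main

import (
    "fmt"
    "time"

    "golang.org/x/time/rate"
)

func main() {
    // 2 tokens per second, bucket size 5 (the same values as above).
    limiter := rate.NewLimiter(2, 5)

    // The bucket starts full, so the first 5 calls succeed and the rest fail.
    for i := 1; i <= 7; i++ {
        fmt.Printf("request %d allowed: %v\n", i, limiter.Allow())
    }

    // After one second, roughly 2 tokens have been refilled, so the first
    // two of these calls should succeed and the third should fail.
    time.Sleep(time.Second)
    fmt.Println("after 1s:", limiter.Allow(), limiter.Allow(), limiter.Allow())
}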

Enforcing a global rate limit
With that high-level description of how the rate-limiter algorithm works, let’s now dive into the code to see how it works in practice.
One of the great things about the middleware pattern we’re using is that it allows us to define initialization logic that runs only once, at the moment we wrap a handler with the middleware — not on every request. This is possible because the middleware returns a closure that retains access to the variables in its surrounding scope, such as the rate limiter instance.

Here’s a simplified example of such middleware:

func (app *application) exampleMiddleware(next http.Handler) http.Handler {
    // Any code here will run only once, when we wrap something with the middleware.
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // Any code here will run for every request that the middleware handles.
        next.ServeHTTP(w, r)
    })
}

In our case, we’ll create a new rateLimit() middleware function that initializes a rate limiter once during setup and then reuses it to control the rate of every incoming request that the middleware handles.
Inside the function:

  • Initialize a new rate limiter which allows an average of 2 requests per second, with a maximum of 4 requests in a single ‘burst’.
  • The function we are returning is a closure, which ‘closes over’ the limiter variable.
  • We call the Allow() method to see if the request is permitted; if it’s not, we return a 429 Too Many Requests response.

func rateLimit(next http.Handler) http.Handler {
    // Initialize the limiter once, when the middleware is set up:
    // 2 requests per second on average, with a maximum burst of 4.
    limiter := rate.NewLimiter(2, 4)

    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // Try to consume a token; if the bucket is empty, reject the request.
        if !limiter.Allow() {
            http.Error(w, http.StatusText(http.StatusTooManyRequests), http.StatusTooManyRequests)
            return
        }
        next.ServeHTTP(w, r)
    })
}

In this code, whenever we call the Allow() method on the rate limiter, exactly one token is consumed from the bucket. If there are no tokens left in the bucket, Allow() returns false, and that acts as the trigger for us to send the client a 429 Too Many Requests response.
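
As a rough sketch of how this middleware might be wired into a server (it assumes the rateLimit() function above lives in the same package; the port and healthcheck route are just placeholders), you can wrap your whole router with it:

func main() {
    mux := http.NewServeMux()
    mux.HandleFunc("/v1/healthcheck", func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintln(w, "status: available")
    })

    // Wrap the entire mux so every route shares the same global limiter.
    log.Fatal(http.ListenAndServe(":4000", rateLimit(mux)))
}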
Made for Concurrency
Internally, the limiter uses a mutex (mutual exclusion lock).
A mutex ensures that only one goroutine (thread) can access or modify the limiter’s internal data at a time.

This prevents race conditions, where multiple requests could otherwise access and modify the shared token bucket at the same time, leading to unpredictable behavior.

This makes the rate limiter thread-safe and reliable even when our server is handling thousands of simultaneous requests.
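
As a quick illustration (again, a standalone sketch rather than part of our API), you can hammer a single limiter from many goroutines and the count stays consistent:

package main

import (
    "fmt"
    "sync"
    "sync/atomic"

    "golang.org/x/time/rate"
)

func main() {
    // One shared limiter: 2 tokens per second, burst of 4.
    limiter := rate.NewLimiter(2, 4)

    var wg sync.WaitGroup
    var allowed int64

    // Fire 20 "requests" at the limiter concurrently.
    for i := 0; i < 20; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            if limiter.Allow() {
                atomic.AddInt64(&allowed, 1)
            }
        }()
    }
    wg.Wait()

    // Only roughly the burst size (4) is allowed, no matter how the
    // goroutines interleave; the internal mutex keeps the count consistent.
    fmt.Println("allowed:", allowed)
}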

IP-based Rate Limiting
Using a global rate limiter can be useful when you want to enforce a strict limit on the total
rate of requests coming to your API, and you don’t care where the requests are coming from. But it’s generally more common to want an individual rate limiter for each client, so that one bad client making too many requests doesn’t affect all the others.
A conceptually straightforward way to implement this logic is to create an in-memory map of rate limiters, using the IP address for each client as the map key.
Each time a new client makes a request to our API, we will initialize a new rate limiter and add it to the map. For any subsequent requests, we will retrieve the client’s rate limiter from the map and check whether the request is permitted by calling its Allow() method, just like we did earlier on.
However, it’s important to note that maps are not safe for concurrent use. This is a big issue for our case because our rateLimit() middleware may be running in multiple goroutines at the same time: by default, Go's http.Server handles each HTTP request in its own goroutine.
From the Go blog:
Maps are not safe for concurrent use: it’s not defined what happens when you read and write to them simultaneously. If you need to read from and write to a map from concurrently executing goroutines, the accesses must be mediated by some kind of synchronization mechanism.

IP addresses
In a production environment, it’s common for requests to pass through proxy servers before reaching your application. Because of this, the IP address you get from r.RemoteAddr may not belong to the actual client — instead, it might be the address of the last proxy.

Fortunately, most proxies add headers like X-Forwarded-For or X-Real-IP to the request, which include the original client’s IP address. By checking these headers first, you can more accurately determine where the request came from.

Rather than manually writing logic to handle this, you can use a small Go package called realip. It automatically checks for those headers and, if they’re missing, falls back to r.RemoteAddr.

To install the package, run:

$ go get github.com/tomasen/realip@latest
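
Before wiring it into our middleware, here’s a tiny throwaway handler showing what the package returns compared to r.RemoteAddr (the /whoami route is purely for illustration):

package main

import (
    "fmt"
    "net/http"

    "github.com/tomasen/realip"
)

func main() {
    http.HandleFunc("/whoami", func(w http.ResponseWriter, r *http.Request) {
        // realip checks the X-Real-IP and X-Forwarded-For headers first,
        // and falls back to r.RemoteAddr when neither is set.
        ip := realip.FromRequest(r)
        fmt.Fprintf(w, "RemoteAddr: %s\nclient IP: %s\n", r.RemoteAddr, ip)
    })

    http.ListenAndServe(":4000", nil)
}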

Let’s move on and update our middleware to implement these changes. This time we’ll write it as a configurable rateLimitMiddleware() factory that takes the rate, the burst size, and an optional handler to call when the limit is exceeded:

func rateLimitMiddleware(limit rate.Limit, burst int, exceededHandler http.HandlerFunc) func(http.Handler) http.Handler {
    // Shared state: a mutex and a map to track rate limiters by client IP.
    var (
        mu      sync.Mutex
        clients = make(map[string]*rate.Limiter)
    )

    return func(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            // Retrieve the real client IP from the headers, or fall back to RemoteAddr.
            ip := realip.FromRequest(r)

            // Safely access or create a rate limiter for the IP address.
            mu.Lock()
            limiter, exists := clients[ip]
            if !exists {
                limiter = rate.NewLimiter(limit, burst)
                clients[ip] = limiter
            }

            // If the request is not allowed, unlock and handle the exceeded limit.
            if !limiter.Allow() {
                mu.Unlock()
                if exceededHandler != nil {
                    exceededHandler(w, r)
                    return
                }
                http.Error(w, http.StatusText(http.StatusTooManyRequests), http.StatusTooManyRequests)
                return
            }
            mu.Unlock()

            // Proceed to the next handler in the chain.
            next.ServeHTTP(w, r)
        })
    }
}
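
Wiring the factory up might look something like this (the route and port are placeholders, and passing nil for the exceeded handler keeps the default 429 response):

func main() {
    // 2 requests/second per IP, with a burst of 4.
    limitByIP := rateLimitMiddleware(2, 4, nil)

    mux := http.NewServeMux()
    mux.HandleFunc("/v1/healthcheck", func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintln(w, "status: available")
    })

    log.Fatal(http.ListenAndServe(":4000", limitByIP(mux)))
}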

Deleting old limiters
The code we wrote earlier works perfectly for rate limiting users based on their IP addresses. However, there’s a small issue we need to fix.

Right now, we’re using a map[string]*rate.Limiter to store each user’s rate limiter, based on their IP. But the problem is:

This map will grow endlessly as more and more users visit the site.

Every new user adds a new key to the clients map, and we never delete them. That means over time — especially on a high-traffic server — this could consume a lot of memory and eventually cause performance issues.

To make this work, we’ll need to create a custom client struct which holds both the rate limiter and last seen time for each client, and launch the background cleanup goroutine when initializing the middleware.

Let’s go:

func rateLimit(next http.Handler) http.Handler {
    // Define a struct to store information for each client, including
    // their rate limiter and the last time we saw a request from them.
    type client struct {
        limiter  *rate.Limiter
        lastSeen time.Time
    }

    var (
        // Mutex to safely access the clients map across goroutines.
        mu sync.Mutex

        // Map to store client IP addresses and their associated rate limiter data.
        clients = make(map[string]*client)
    )

    // Start a background goroutine to clean up clients we haven’t seen recently.
    go func() {
        for {
            // Wait for 1 minute between cleanup runs.
            time.Sleep(time.Minute)

            mu.Lock() // Lock while cleaning to avoid race conditions.

            for ip, c := range clients {
                // Remove clients not seen in the last 3 minutes.
                if time.Since(c.lastSeen) > 3*time.Minute {
                    delete(clients, ip)
                }
            }

            mu.Unlock() // Unlock after cleanup.
        }
    }()

    // Middleware handler function.
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // Extract the real IP address of the client.
        ip := realip.FromRequest(r)

        mu.Lock()

        // If the client doesn't exist in our map, create a new rate limiter for them.
        if _, found := clients[ip]; !found {
            clients[ip] = &client{
                limiter: rate.NewLimiter(2, 4), // Allow 2 requests/sec with a burst of 4.
            }
        }

        // Update the lastSeen time to now.
        clients[ip].lastSeen = time.Now()

        // Check if the client is allowed to proceed.
        if !clients[ip].limiter.Allow() {
            mu.Unlock()
            // Respond with a 429 Too Many Requests error.
            http.Error(w, http.StatusText(http.StatusTooManyRequests), http.StatusTooManyRequests)
            return
        }

        mu.Unlock()

        // Call the next handler in the chain if allowed.
        next.ServeHTTP(w, r)
    })
}
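
If you want to convince yourself that the burst behaves as expected, here’s a quick sketch using net/http/httptest; it assumes the rateLimit() function above lives in the same package:

func main() {
    // A trivial handler wrapped with the rate-limiting middleware.
    ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
    })
    handler := rateLimit(ok)

    // Fire 6 requests in quick succession from the same (fake) client.
    for i := 1; i <= 6; i++ {
        req := httptest.NewRequest(http.MethodGet, "/", nil)
        rec := httptest.NewRecorder()
        handler.ServeHTTP(rec, req)
        fmt.Printf("request %d -> %d\n", i, rec.Code)
    }
    // Expected output: the first 4 requests return 200, the rest return 429.
}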

Distributed Applications
In conclusion, I would like to reiterate that this in-memory rate-limiting approach works well only if our API is running on a single machine. In a distributed setup, where your application runs on multiple servers behind a load balancer, this pattern is no longer effective on its own.

If you’re using HAProxy or Nginx as your load balancer or reverse proxy, consider using their built-in rate-limiting features. These are often the most straightforward solution.

Alternatively, for more complex setups, you can use a centralized, fast data store like Redis to track request counts. This allows all your application servers to coordinate rate-limiting through a shared backend.
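
As a rough illustration only, a fixed-window counter (a simpler scheme than a token bucket) backed by Redis might look something like the sketch below. It assumes the github.com/redis/go-redis/v9 client, and the key format and one-minute window are arbitrary choices:

import (
    "context"
    "fmt"
    "time"

    "github.com/redis/go-redis/v9"
)

func allowRequest(ctx context.Context, rdb *redis.Client, ip string, limit int64) (bool, error) {
    // One counter key per client per minute, e.g. "ratelimit:203.0.113.7:2025-01-02T15:04".
    key := fmt.Sprintf("ratelimit:%s:%s", ip, time.Now().UTC().Format("2006-01-02T15:04"))

    // Atomically increment the counter in Redis.
    count, err := rdb.Incr(ctx, key).Result()
    if err != nil {
        return false, err
    }

    // Set a TTL on the first hit so old windows clean themselves up.
    if count == 1 {
        rdb.Expire(ctx, key, time.Minute)
    }

    return count <= limit, nil
}

Because every application server increments the same Redis key, the limit is enforced across the whole fleet rather than per instance.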
THANK YOU AND HAPPY CODING!

