#11

Distributed Rate Limiter

January 30, 2026

Rust, Redis, GCP, Terraform, Token Bucket

A high-performance distributed rate limiter built in Rust and Redis using the token bucket algorithm. Millisecond latency, millions of checks per second. Deployed on GCP with Terraform and Memorystore.

What is it?

I built this as a distributed rate limiter in Rust backed by Redis. The goal was to make something simple, fast, and realistic enough to sit in front of an API without becoming the bottleneck itself.

I wanted it to work across multiple app instances, keep latency extremely low, and use an algorithm that matched how real client traffic behaves. That pushed me toward the token bucket model and toward Redis as the shared state layer.

How it works

The request flow is intentionally small. A client sends an identifier to the `/check` endpoint. The service looks up that client’s bucket state in Redis, computes how many tokens should exist after refill, decides whether the request is allowed, updates the bucket, and returns allow or deny.

What matters is that this logic stays cheap enough to run on every single API request without becoming noticeable. That is why the whole design is built around a tiny state record and a very short hot path.
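As a rough sketch of that hot path, here is the refill-and-decide step with Redis swapped out for a plain value so it runs on its own. The names `Bucket`, `Limit`, and `check`, the millisecond timestamps, and the one-token-per-request cost are all illustrative assumptions, not the project's actual code.

```rust
/// Per-client bucket state. In the real service this small record would
/// live in Redis; here it is a plain value (assumption for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Bucket {
    pub tokens: f64,
    pub last_refill_ms: u64,
}

/// Hypothetical limiter settings, not taken from the project.
#[derive(Debug, Clone, Copy)]
pub struct Limit {
    pub capacity: f64,
    pub refill_per_sec: f64,
}

/// Refill according to elapsed time, then try to spend one token.
/// Returns the decision and the state that should be written back.
pub fn check(prev: Option<Bucket>, limit: Limit, now_ms: u64) -> (bool, Bucket) {
    // A client seen for the first time starts with a full bucket.
    let prev = prev.unwrap_or(Bucket {
        tokens: limit.capacity,
        last_refill_ms: now_ms,
    });
    // saturating_sub guards against a timestamp that moves backwards.
    let elapsed_ms = now_ms.saturating_sub(prev.last_refill_ms);
    let refilled = prev.tokens + elapsed_ms as f64 * limit.refill_per_sec / 1000.0;
    // Never hold more than the configured capacity.
    let tokens = refilled.min(limit.capacity);
    if tokens >= 1.0 {
        (true, Bucket { tokens: tokens - 1.0, last_refill_ms: now_ms })
    } else {
        (false, Bucket { tokens, last_refill_ms: now_ms })
    }
}
```

With a capacity of 3 and a refill rate of 1 token per second, three back-to-back calls are allowed, the fourth is denied, and one more is allowed a second later. The whole state is two numbers, which is what keeps the record tiny.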

Token bucket vs the alternatives

I chose token bucket because it matches real traffic better than stricter algorithms. APIs often need to allow short bursts while still enforcing an average limit over time. Token bucket does exactly that: tokens accumulate at a fixed refill rate up to a maximum capacity, and each request spends one. The capacity sets how large a burst can be, and the refill rate sets the long-run average.

A fixed window is easier to implement, but it has an awkward edge case at the boundary: a client can use its full quota at the end of one window and again at the start of the next, briefly doubling the allowed rate. Sliding windows are more precise but often more expensive. Leaky bucket is smoother but less flexible for bursty traffic. For general API protection, token bucket felt like the best tradeoff.
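To make the fixed-window boundary problem concrete, here is a small sketch of a fixed-window counter. `FixedWindow` and `admitted` are hypothetical names, and the limit of 10 requests per 1000 ms window used in the note below is an arbitrary choice for the example.

```rust
use std::collections::HashMap;

/// Fixed-window counter: at most `limit` requests per `window_ms` window.
/// Sketch only: old windows are never evicted from the map.
pub struct FixedWindow {
    limit: u32,
    window_ms: u64,
    counts: HashMap<u64, u32>, // window index -> requests admitted so far
}

impl FixedWindow {
    pub fn new(limit: u32, window_ms: u64) -> Self {
        Self { limit, window_ms, counts: HashMap::new() }
    }

    pub fn allow(&mut self, now_ms: u64) -> bool {
        // Every timestamp falls into exactly one window.
        let window = now_ms / self.window_ms;
        let count = self.counts.entry(window).or_insert(0);
        if *count < self.limit {
            *count += 1;
            true
        } else {
            false
        }
    }
}

/// Count how many of `n` back-to-back requests at `now_ms` are admitted.
pub fn admitted(fw: &mut FixedWindow, now_ms: u64, n: u32) -> u32 {
    (0..n).filter(|_| fw.allow(now_ms)).count() as u32
}
```

With `FixedWindow::new(10, 1000)`, ten requests at 999 ms and ten more at 1000 ms are all admitted, so twenty requests get through within about a millisecond even though the nominal limit is ten per second. A token bucket with capacity 10 would have admitted only the first ten.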

Redis atomicity with Lua scripts

The tricky part is concurrency. If two app instances read the same bucket at the same time, they can both think there is one token left and both allow a request. That race breaks the limit.
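The interleaving is easier to see when written out step by step. This sketch replays it deterministically against an in-memory stand-in for the shared store; `Store`, `racy_two_step`, and `atomic_take` are made-up names, and the two "instances" are simulated in a single thread rather than run concurrently.

```rust
/// Stand-in for the shared store: the token count for a single client.
pub struct Store {
    pub tokens: u32,
}

/// Two instances read first and write afterwards. The check and the update
/// are separate steps, so both decisions are based on the same stale read.
pub fn racy_two_step(store: &mut Store) -> (bool, bool) {
    let seen_by_a = store.tokens; // instance A reads
    let seen_by_b = store.tokens; // instance B reads before A has written

    let a_allowed = seen_by_a >= 1;
    if a_allowed {
        store.tokens = seen_by_a - 1; // A writes back
    }
    let b_allowed = seen_by_b >= 1;
    if b_allowed {
        store.tokens = seen_by_b - 1; // B overwrites using its stale value
    }
    (a_allowed, b_allowed)
}

/// Check and update as one indivisible step, so nothing can slip in between.
pub fn atomic_take(store: &mut Store) -> bool {
    if store.tokens >= 1 {
        store.tokens -= 1;
        true
    } else {
        false
    }
}
```

Starting from one token, `racy_two_step` allows both requests, while two calls to `atomic_take` allow the first and deny the second.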

So I moved the whole read-compute-write sequence into a Redis Lua script. Redis runs that script atomically, which means no other client can interrupt the update halfway through. That turns the rate-limit decision into one safe server-side operation instead of multiple fragile client-side steps.
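A script along these lines could look like the sketch below, embedded as a Rust string constant. This is not the project's actual script: the hash fields `tokens` and `ts`, the argument order, the caller-supplied timestamp, and the expiry policy are all assumptions made for illustration.

```rust
/// Hypothetical token bucket script. Key layout, field names, and argument
/// order are illustrative assumptions, not the project's real script.
pub const TOKEN_BUCKET_LUA: &str = r#"
local capacity       = tonumber(ARGV[1])
local refill_per_sec = tonumber(ARGV[2])
local now_ms         = tonumber(ARGV[3])

-- Read the current state; missing fields mean a client we have not seen yet.
local state  = redis.call('HMGET', KEYS[1], 'tokens', 'ts')
local tokens = tonumber(state[1])
local ts     = tonumber(state[2])
if tokens == nil or ts == nil then
  tokens = capacity
  ts = now_ms
end

-- Refill for the elapsed time, capped at capacity.
local elapsed = math.max(0, now_ms - ts)
tokens = math.min(capacity, tokens + elapsed * refill_per_sec / 1000)

-- Spend one token if available.
local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end

-- Write back, and let idle buckets expire once they would be full anyway.
redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', now_ms)
redis.call('PEXPIRE', KEYS[1], math.ceil(capacity / refill_per_sec * 1000))
return allowed
"#;

/// Build ARGV in the order the script above expects (assumed order).
pub fn script_args(capacity: u32, refill_per_sec: u32, now_ms: u64) -> [String; 3] {
    [
        capacity.to_string(),
        refill_per_sec.to_string(),
        now_ms.to_string(),
    ]
}
```

A Redis client would send this with `EVAL` (or `EVALSHA` after `SCRIPT LOAD`), passing the client's bucket key as the single key and the three arguments above. The script returns 1 for allow and 0 for deny. Passing the timestamp from the caller is a simplification; clock skew between app instances is one reason a real deployment might read the time inside Redis instead.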

Why Rust?

I picked Rust because rate limiting sits on the hot path. I wanted low overhead, predictable latency, and good concurrency without garbage-collection pauses. Tokio made it straightforward to handle a lot of concurrent requests while keeping the app itself very lean.

In practice, the app was not the bottleneck anyway. The network round-trip to Redis dominates. But that is actually a good outcome. It means the service layer is cheap enough that the real constraint is the shared data store, not my application runtime.

Key takeaways

Token bucket algorithm: burst capacity, refill rate, when to prefer it over alternatives
Redis Lua scripts: atomicity, why time-of-check to time-of-use (TOCTOU) races happen and how Lua prevents them
GCP Memorystore: managed Redis with VPC peering, automatic failover
Rust + Tokio: async HTTP serving with zero GC pauses
Terraform for GCP: provisioning Memorystore, Cloud Run, and VPC networking together
