Token Bucket Algorithm Explained

why rate limiting matters

every API or system has limits. without some control, a single user could flood your server with requests and take it down. rate limiting solves this by controlling how many requests pass through over time.

there are many algorithms for this—fixed window, sliding window, leaky bucket, and token bucket. among them, token bucket is one of the most popular because it balances steady traffic control with short bursts of requests.

what is the token bucket algorithm?

imagine a bucket that holds tokens:

  • tokens are added at a fixed rate (e.g. 5 per second).
  • each request must “spend” one token to pass.
  • if tokens are available → request is allowed.
  • if the bucket is empty → request is rejected (or delayed).
  • the bucket has a maximum size, so tokens don’t grow forever.

this allows short bursts (if tokens have been saved up) while still enforcing an average rate over time.

diagram of token bucket

here’s the flow in brief: a request arrives → the bucket refills based on elapsed time → if at least one token is available, one is consumed and the request passes → otherwise the request is rejected (or queued).

and the refill view: tokens trickle in at a fixed rate until the bucket is full; once full, new tokens are simply discarded.

the math behind it

let’s define:

  • C = bucket capacity
  • R = refill rate (tokens per second)
  • T = elapsed time since last refill

then the number of tokens at any moment is:

tokens = min(C, tokens + R * T)

when a request comes in:

  • if tokens > 0 → allow and tokens -= 1
  • else → reject
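the refill formula above translates directly into code. a minimal sketch (`currentTokens` is a hypothetical helper name, not from the post):

```javascript
// compute the token count after T seconds, capped at capacity C
// C = capacity, R = refill rate (tokens/sec), tokens = count at last refill
function currentTokens(C, R, tokens, T) {
  return Math.min(C, tokens + R * T);
}

// e.g. capacity 10, rate 1 token/sec, 3 tokens left, 4s elapsed:
console.log(currentTokens(10, 1, 3, 4)); // 7

// a long idle period never overfills the bucket:
console.log(currentTokens(10, 1, 3, 60)); // 10
```

the `min` with C is what keeps saved-up burst capacity bounded.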

example run

  • capacity = 10
  • refill rate = 1 token/sec

timeline:

  • time 0s → bucket full (10 tokens)
  • user sends 5 requests instantly → 5 tokens left
  • time 5s later → 5 tokens refilled, bucket = 10 again
  • user sends 15 requests → only 10 succeed, 5 rejected

this shows how the token bucket allows bursts up to capacity while keeping the long-term rate limited.
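the timeline above can be replayed in a few lines of code (a small simulation sketch using the same constants):

```javascript
// replay the example: capacity 10, refill 1 token/sec
const capacity = 10;
let tokens = 10;                // time 0s: bucket full

// user sends 5 requests instantly
for (let i = 0; i < 5; i++) tokens -= 1;
console.log(tokens);            // 5 tokens left

// 5 seconds pass: refill at 1 token/sec, capped at capacity
tokens = Math.min(capacity, tokens + 1 * 5);
console.log(tokens);            // back to 10

// user sends 15 requests: only the tokens on hand can be spent
let allowed = 0, rejected = 0;
for (let i = 0; i < 15; i++) {
  if (tokens > 0) { tokens -= 1; allowed += 1; }
  else rejected += 1;
}
console.log(allowed, rejected); // 10 allowed, 5 rejected
```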

pros and cons

pros

  • allows bursts but enforces average rate
  • smooth traffic shaping
  • widely used in networking & APIs

cons

  • slightly more complex than fixed window counter
  • needs careful sync in distributed systems

code example (javascript)

class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;       // max tokens
    this.tokens = capacity;         // current tokens
    this.refillRate = refillRate;   // tokens per second
    this.lastRefill = Date.now();
  }

  refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    const added = Math.floor(elapsed * this.refillRate);
    if (added > 0) {
      this.tokens = Math.min(this.capacity, this.tokens + added);
      this.lastRefill = now;
    }
  }

  allowRequest() {
    this.refill();
    if (this.tokens > 0) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// usage
const bucket = new TokenBucket(10, 1); // 10 tokens, refill 1/sec

setInterval(() => {
  console.log("request allowed?", bucket.allowRequest());
}, 200);
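one design choice worth noting: the class above rounds refills down with `Math.floor`, so it only ever adds whole tokens. an alternative is to accumulate fractional tokens, which behaves more smoothly at low refill rates (e.g. 0.5 tokens/sec). a sketch of that variant (the name `FractionalTokenBucket` and the injectable `now` parameter are illustrative, not from the post):

```javascript
// variant: accumulate fractional tokens so slow refill rates
// are not lost to rounding
class FractionalTokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillRate = refillRate;   // tokens per second, may be < 1
    this.lastRefill = Date.now();
  }

  // `now` can be injected to make the bucket easy to test
  allowRequest(now = Date.now()) {
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
    if (this.tokens >= 1) {         // need a whole token to spend
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

passing the clock in as a parameter also makes unit testing straightforward, since the bucket no longer depends on real time.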

real-world use cases

  • API gateways (AWS, Nginx, Envoy use variations of this)
  • network routers (token bucket for traffic shaping)
  • message queues (ensuring consumers don’t get overloaded)

comparing with other methods

  • Fixed Window Counter → simpler, but can allow unfair bursts at the edges.
  • Leaky Bucket → enforces a steady rate, but less flexible with bursts.
  • Token Bucket → combines steady rate with short bursts → best balance for APIs.
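the fixed-window weakness mentioned above is easy to see in code. a minimal sketch (not from the post; `FixedWindowCounter` is a hypothetical name):

```javascript
// fixed window: counts requests per window, keyed by window number
class FixedWindowCounter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.window = -1;
    this.count = 0;
  }

  allowRequest(now = Date.now()) {
    const window = Math.floor(now / this.windowMs);
    if (window !== this.window) {   // new window: counter resets
      this.window = window;
      this.count = 0;
    }
    if (this.count < this.limit) {
      this.count += 1;
      return true;
    }
    return false;
  }
}

// with a "10 per second" limit, 10 requests just before a window
// boundary and 10 just after all pass: 20 requests in a few
// milliseconds. a token bucket with capacity 10 would cap that
// burst at 10.
```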

conclusion

the token bucket algorithm is one of the most practical ways to implement rate limiting. it’s simple enough to code, supports burst traffic, and ensures fair usage over time.

if you’re building an API or distributed system, token bucket should be one of your go-to options.
