
How a thread-safe token bucket rate limiter works

A token bucket limits request rates by refilling tokens over time, up to a fixed capacity, and letting a request proceed only when enough tokens are available.

Explained by highlit
import time
import threading


class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate  # tokens added per second
        self.capacity = capacity  # maximum tokens the bucket can hold
        self._tokens = float(capacity)  # start with a full bucket
        self._last = time.monotonic()  # time of the last refill
        self._lock = threading.Lock()  # guards _tokens and _last

    def _refill(self) -> None:
        now = time.monotonic()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)  # never exceed capacity
        self._last = now

    def acquire(self, tokens: int = 1) -> None:
        if tokens > self.capacity:
            raise ValueError("requested tokens exceed bucket capacity")  # could never be satisfied
        while True:
            with self._lock:
                self._refill()
                if self._tokens >= tokens:
                    self._tokens -= tokens
                    return
                deficit = tokens - self._tokens
                wait = deficit / self.rate  # seconds until the deficit is refilled
            time.sleep(wait)  # sleep outside the lock so other threads can proceed

    def try_acquire(self, tokens: int = 1) -> bool:
        with self._lock:
            self._refill()
            if self._tokens >= tokens:
                self._tokens -= tokens
                return True
            return False  # not enough tokens; caller decides what to do
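The arithmetic inside `_refill` and `acquire` can be checked by hand. The sketch below pulls it out into two pure functions; the names `refill` and `wait_needed` are hypothetical helpers introduced only for illustration and are not part of the class above. Elapsed time is passed in as an argument, so no clock is involved.

```python
def refill(tokens: float, elapsed: float, rate: float, capacity: int) -> float:
    """Token count after `elapsed` seconds at `rate` tokens/second, capped at `capacity`."""
    return min(capacity, tokens + elapsed * rate)


def wait_needed(tokens: float, requested: int, rate: float) -> float:
    """Seconds until `requested` tokens are available; 0.0 if they already are."""
    deficit = requested - tokens
    return max(0.0, deficit / rate)


# A bucket that refills at 5 tokens/second and holds at most 10 tokens.
after_idle = refill(tokens=2.0, elapsed=1.0, rate=5.0, capacity=10)        # 2 + 1 * 5 = 7.0
after_long_idle = refill(tokens=2.0, elapsed=60.0, rate=5.0, capacity=10)  # capped at 10
wait = wait_needed(tokens=1.0, requested=3, rate=5.0)                      # (3 - 1) / 5 = 0.4 s
```

The cap is what bounds bursts: however long the bucket sits idle, a caller can never draw more than `capacity` tokens at once.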
Three takeaways
  1. Tracking elapsed time lets you compute refills lazily instead of running a background timer.
  2. A lock around the read-modify-write of token state keeps concurrent callers correct.
  3. Offering both blocking and non-blocking acquire methods covers backpressure and fail-fast use cases.
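The second takeaway can be exercised with a small experiment. This is a minimal sketch, not a benchmark: it uses a compact copy of the bucket with only the non-blocking path, and the `hammer` helper, the thread count, and the near-zero refill rate are assumptions chosen so that refills are negligible during the run.

```python
import threading
import time


class TokenBucket:
    """Compact copy of the explainer's bucket, non-blocking path only."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self._tokens = float(capacity)
        self._last = time.monotonic()
        self._lock = threading.Lock()

    def try_acquire(self, tokens: int = 1) -> bool:
        with self._lock:
            # Lazy refill: credit the tokens earned since the last call.
            now = time.monotonic()
            self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
            self._last = now
            if self._tokens >= tokens:
                self._tokens -= tokens
                return True
            return False


def hammer(bucket: TokenBucket, attempts: int, results: list, slot: int) -> None:
    """Hypothetical helper: try `attempts` times and record how many succeeded."""
    results[slot] = sum(1 for _ in range(attempts) if bucket.try_acquire())


# 8 threads x 50 attempts = 400 attempts against a bucket holding 100 tokens.
# At 0.001 tokens/second, one extra token would take 1000 s to accrue.
bucket = TokenBucket(rate=0.001, capacity=100)
results = [0] * 8
threads = [
    threading.Thread(target=hammer, args=(bucket, 50, results, i)) for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

total_granted = sum(results)
print(total_granted)  # 100, as long as the run takes far less than 1000 s
```

Because the check and the decrement happen under one lock, the grants never exceed the tokens that were available. The fail-fast `try_acquire` suits this kind of test; a caller that prefers backpressure would use the blocking `acquire` from the full class instead.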
