
When Redis Isn't Enough: Building a Local Cache

October 21, 2025

Go · Concurrency · Optimization

In high-performance systems, Redis is often the go-to choice for caching. But under heavy load, even the fastest remote cache can be a double-edged sword. Network latency, hot keys, and dependency overheads start to show. In such moments, a well-designed local cache can bring data closer and keep performance steady.

When Redis Brings Trouble

Redis is known for its speed, simplicity, and ability to handle large-scale caching. But in production, even Redis can expose hidden risks at scale. The moment traffic spikes, hot keys start dominating access patterns. Because a key always hashes to the same node, every request for a hot key lands on the same Redis instance, amplifying latency and increasing the risk of cascading failures.

[Diagram: service nodes 1–3 all routing requests for the hot key product:999:detail to a single one of four Redis nodes]
High traffic to a hot key causes all service nodes to direct requests to a single Redis node, leading to performance bottlenecks.

Beyond performance, reliability becomes another concern. A network hiccup, connection timeout, or cluster failover can suddenly stall services that rely heavily on Redis. The system's overall resilience now depends on the stability of an external dependency. In tightly coupled environments, that single point of failure can ripple through the entire architecture.

Why Local Beats Remote When It Matters Most

Local caching shines when milliseconds matter. By keeping data within the same process memory, it eliminates the network hop entirely—no serialization, no TCP overhead, no round trips to an external store. The result is near-instant lookups and predictable latency, even under heavy load.

Unlike Redis, a local cache lives and dies with the application. This tight coupling might sound like a drawback, but in reality, it's what gives local caching its power—it removes the dependency chain. When a service can still serve critical data during temporary outages or Redis slowdowns, user experience stays intact.

Local caching also reduces pressure on shared infrastructure. With fewer round trips to Redis, the cluster handles less load, leading to improved stability for other components that still rely on it. In distributed systems, this hybrid approach—local-first, remote-fallback—often strikes a practical balance between speed and consistency.
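
To make the hybrid idea concrete, here is a rough sketch of a local-first, remote-fallback read path. Every name in it (localCache, fetchFromRedis, loadFromDB, Product) is a placeholder to illustrate the flow, not an API from this post or from any specific library.

func GetProduct(ctx context.Context, id string) (Product, error) {
	// 1. Local first: served from process memory, no network hop.
	if p, ok := localCache.Get(id); ok {
		return p, nil
	}
	// 2. Remote fallback: ask Redis, then keep a short-lived local copy.
	if p, err := fetchFromRedis(ctx, id); err == nil {
		localCache.Set(id, p, 30*time.Second)
		return p, nil
	}
	// 3. Source of truth: load from the database and repopulate the local cache.
	p, err := loadFromDB(ctx, id)
	if err != nil {
		return Product{}, err
	}
	localCache.Set(id, p, 30*time.Second)
	return p, nil
}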

What Really Happens Inside

At its core, a local cache is nothing magical—it's just a thin layer of logic built around a map. But what makes it powerful lies in how it manages concurrency, expiration, and memory efficiency.

The foundation starts simple: a generic Cache interface defining the essential behaviors, Set and Get. Each key-value pair is wrapped in an entry struct that also stores an absolute expiry timestamp derived from the TTL (time-to-live). When a key is fetched, the cache checks whether it has expired. If it has, the entry is removed immediately, ensuring stale data never lingers.

type Cache[K comparable, V any] interface {
	Set(k K, v V, ttl time.Duration)
	Get(k K) (V, bool)
}

type entry[V any] struct {
	value V
	ttl   int64 // absolute expiry time in Unix milliseconds; 0 means no expiry
}

Behind the scenes, everything runs on top of a Go map protected by an RWMutex. The read-write lock allows multiple concurrent reads while still keeping writes safe and isolated. This balance ensures high throughput while maintaining data integrity.

type cacheImpl[K comparable, V any] struct {
	data map[K]entry[V]
	mu   sync.RWMutex
}
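
The snippets assume the underlying map has already been initialized; a minimal constructor along these lines would do it (the NewCache name is an assumption, not something shown in the post):

// NewCache returns an empty, ready-to-use cache backed by a plain map.
// Illustrative only; the original snippets do not include a constructor.
func NewCache[K comparable, V any]() Cache[K, V] {
	return &cacheImpl[K, V]{data: make(map[K]entry[V])}
}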

Whenever data is retrieved, the cache first checks the TTL before returning the value. If the entry has expired, it's immediately evicted from memory—a lightweight cleanup that keeps the cache healthy without a separate maintenance process.

func (c *cacheImpl[K, V]) Get(k K) (V, bool) {
	c.mu.RLock()
	e, ok := c.data[k]
	c.mu.RUnlock()

	if !ok {
		var zero V
		return zero, false
	}
	if e.ttl > 0 && e.ttl <= time.Now().UnixMilli() {
		// Lazily evict the expired entry. Re-check under the write lock so a
		// fresh value written by another goroutine after RUnlock isn't deleted.
		c.mu.Lock()
		if cur, ok := c.data[k]; ok && cur.ttl > 0 && cur.ttl <= time.Now().UnixMilli() {
			delete(c.data, k)
		}
		c.mu.Unlock()
		var zero V
		return zero, false
	}

	return e.value, true
}

On writes, the cache locks the map exclusively, stores the value, and records its expiration time if a TTL is set. This ensures predictable, thread-safe updates without race conditions.

func (c *cacheImpl[K, V]) Set(k K, v V, ttl time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()

	// expireAt stays 0 when no TTL is given, meaning the entry never expires.
	var expireAt int64
	if ttl > 0 {
		expireAt = time.Now().Add(ttl).UnixMilli()
	}

	c.data[k] = entry[V]{value: v, ttl: expireAt}
}
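
A quick usage sketch, assuming the hypothetical NewCache constructor above and that this lives in the same package as the snippets (with fmt and time imported):

func main() {
	c := NewCache[string, string]()
	c.Set("product:999:detail", `{"id": 999}`, 5*time.Minute)

	if v, ok := c.Get("product:999:detail"); ok {
		fmt.Println("cache hit:", v)
	}
}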

This simple pattern forms the backbone of most in-memory caches—fast lookups, safe concurrent access, and efficient self-cleanup. The next challenge? Managing memory growth without letting the cache live forever.

Keeping Data Alive, but Not Forever

A cache without limits is a memory leak waiting to happen. While TTL handles expiration based on time, it doesn't prevent memory from growing endlessly when data keeps flowing in. This is where eviction policies come in—rules that decide what to remove when space runs out.

The most common strategy is Least Recently Used (LRU). The idea is simple: keep what's fresh, discard what's forgotten. Each time an item is accessed, it's marked as “recently used.” When the cache reaches its limit, the least recently touched item gets evicted.

[Diagram: five keys mapping to nodes in a doubly linked list, with a head pointer at the most recently used entry and a tail pointer at the oldest]
An LRU cache maps keys to values in a linked list; the most recently accessed items move to the front, and the oldest item at the back is evicted first.

An LRU cache can be efficiently built using a combination of a map and a doubly linked list. The map provides constant-time access to nodes by key, while the linked list maintains the order of usage—the most recently used items near the front and the least recently used near the back. This hybrid approach allows the cache to perform lookups and evictions in O(1) time while keeping memory usage predictable.

Here's a simplified structure:

type node[K comparable, V any] struct {
	key   K
	value V
	prev  *node[K, V]
	next  *node[K, V]
}
 
type LRU[K comparable, V any] struct {
	capacity int
	items    map[K]*node[K, V]
	head     *node[K, V]
	tail     *node[K, V]
	mu       sync.Mutex
}
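
As with the TTL cache, the snippet assumes the items map is initialized up front; a constructor along these lines would work (the NewLRU name is an assumption):

// NewLRU returns an empty LRU cache bounded at capacity entries.
// Not shown in the original snippets; included here for completeness.
func NewLRU[K comparable, V any](capacity int) *LRU[K, V] {
	return &LRU[K, V]{
		capacity: capacity,
		items:    make(map[K]*node[K, V], capacity),
	}
}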

When reading, the cache moves the accessed item to the front of the list, marking it as the most recently used:

func (l *LRU[K, V]) Get(key K) (V, bool) {
	l.mu.Lock()
	defer l.mu.Unlock()
 
	if n, ok := l.items[key]; ok {
		l.moveToFront(n)
		return n.value, true
	}
 
	var zero V
	return zero, false
}

For writing, the cache first checks whether the key already exists. If it does, the value is updated and the node is moved to the front. If it doesn't, a new node is added to the front, and once the cache exceeds its capacity the tail node, the least recently used, is evicted. (The linked-list helpers these methods rely on are sketched after the snippet below.)

func (l *LRU[K, V]) Set(key K, value V) {
	l.mu.Lock()
	defer l.mu.Unlock()
 
	if n, ok := l.items[key]; ok {
		n.value = value
		l.moveToFront(n)
		return
	}
 
	n := &node[K, V]{key: key, value: value}
	l.items[key] = n
	l.addToFront(n)
 
	if len(l.items) > l.capacity {
		l.removeOldest()
	}
}
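
The post's snippets don't show moveToFront, addToFront, or removeOldest. Here is one minimal way they could look, assuming head points at the most recently used node and tail at the least recently used; the unlink helper is an addition for clarity, not part of the original:

// addToFront links n in as the new head (most recently used).
func (l *LRU[K, V]) addToFront(n *node[K, V]) {
	n.prev = nil
	n.next = l.head
	if l.head != nil {
		l.head.prev = n
	}
	l.head = n
	if l.tail == nil {
		l.tail = n
	}
}

// unlink detaches n from the list without touching the items map.
func (l *LRU[K, V]) unlink(n *node[K, V]) {
	if n.prev != nil {
		n.prev.next = n.next
	} else {
		l.head = n.next
	}
	if n.next != nil {
		n.next.prev = n.prev
	} else {
		l.tail = n.prev
	}
	n.prev, n.next = nil, nil
}

// moveToFront marks n as the most recently used node.
func (l *LRU[K, V]) moveToFront(n *node[K, V]) {
	if l.head == n {
		return
	}
	l.unlink(n)
	l.addToFront(n)
}

// removeOldest evicts the tail node, the least recently used entry.
func (l *LRU[K, V]) removeOldest() {
	if l.tail == nil {
		return
	}
	oldest := l.tail
	l.unlink(oldest)
	delete(l.items, oldest.key)
}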

This design keeps operations fast and memory predictable. Lookups and updates remain constant-time, while eviction and ordering are just as efficient. By combining TTL expiration and LRU eviction, the cache stays lean—holding only relevant data for as long as it's useful. It's a careful balance between memory efficiency, speed, and resilience.
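
One way to combine the two, sketched here rather than taken from the post: give each node an absolute expiry timestamp (an extra expireAt field in Unix milliseconds, with 0 meaning no expiry) and treat an expired hit as a miss, evicting it instead of promoting it.

// getWithTTL is a sketch; it assumes node has gained an expireAt int64 field.
func (l *LRU[K, V]) getWithTTL(key K) (V, bool) {
	l.mu.Lock()
	defer l.mu.Unlock()

	n, ok := l.items[key]
	if !ok {
		var zero V
		return zero, false
	}
	if n.expireAt > 0 && n.expireAt <= time.Now().UnixMilli() {
		// Expired: evict on the spot instead of marking it recently used.
		l.unlink(n)
		delete(l.items, key)
		var zero V
		return zero, false
	}
	l.moveToFront(n)
	return n.value, true
}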

Conclusion

Local caching isn't about replacing Redis—it's actually about complementing it. By keeping data close to where it's used, applications gain predictable performance, reduced latency, and greater resilience against external failures. The combination of a simple map, a doubly linked list, and well-placed synchronization provides a powerful yet lightweight caching layer that can handle most real-world scenarios.

However, performance always comes with trade-offs. Using a mutex ensures safety but may limit concurrency. Adding TTL and LRU improves memory efficiency but adds complexity. There's no universal formula—every cache design should reflect the system's actual workload and reliability needs.

In the end, a good cache isn't just fast; it's stable under pressure, efficient with memory, and invisible when it works right.