Angga Putra

Software Engineer

Taming load with a worker pool

During college, while shipping products at RISTEK, I was handed a project that was already running in production.

The service looked fine at first. I checked the code, ran it, and nothing seemed obviously broken. But one note from the previous PMs was concerning: the service kept crashing whenever it had to send notifications in bulk.

That kind of issue is tricky. It does not fail all the time; it only shows up when the load is high enough. Just because an app runs fine under normal conditions does not mean it is safe under real usage.

Finding the real issue

When I started reading the code more carefully, I found the problem in the notification flow. Inside a loop, a new goroutine was spawned for every user, with no limit on how many could run at once.

for _, user := range users {
	// one goroutine per user, with no upper bound
	go sendNotification(user)
}

At first glance, it looked fast. Every request could run in parallel. But that was exactly the problem.

With great power comes great responsibility

Uncle Ben, Spider-Man

Sending notifications to a large number of users means sending a large number of HTTP requests. If every request gets its own goroutine without control, the service starts doing too much work at the same time. More concurrent requests mean more CPU usage, more scheduling overhead, and more pressure on the instance. That was why the app died during bulk sends.

There was another issue too. Nothing waited for those goroutines to finish. The function just fired them off and returned.

for _, user := range users {
	go sendNotification(user)
}

// returns immediately, while the goroutines may still be running
return nil

So I spent more time rechecking the flow and thinking about what should actually happen. I did not need unlimited concurrency. I needed controlled concurrency.

Reworking it with a worker pool

That was when I decided to rewrite the flow using a worker pool pattern. Instead of creating a goroutine for every user, I created a fixed number of workers and let them consume jobs from a channel.

jobs := make(chan User)

// a fixed number of workers, instead of one goroutine per user
for i := 0; i < workerCount; i++ {
	go worker(jobs)
}
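
The worker function itself is not shown in these snippets. A minimal sketch, assuming sendNotification takes a User, is just a loop that keeps receiving from the channel:

func worker(jobs <-chan User) {
	// range exits once the channel is closed and drained
	for user := range jobs {
		sendNotification(user)
	}
}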

Then I added a WaitGroup so the main process would wait until all workers finished their jobs.

var wg sync.WaitGroup

for i := 0; i < workerCount; i++ {
	wg.Add(1)
	go func() {
		// Done fires once the worker has drained the jobs channel
		defer wg.Done()
		worker(jobs)
	}()
}

The sending flow became much more predictable.

for _, user := range users {
	jobs <- user // blocks until a worker is free
}

// closing the channel ends the workers' range loops
close(jobs)
wg.Wait()
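
Put together, the whole flow looks roughly like this. This is a self-contained sketch; the User type and the sendNotification body are stand-ins for the real ones:

package main

import (
	"fmt"
	"sync"
)

// User is a stand-in for the real user model.
type User struct {
	ID int
}

// sendNotification is a stand-in for the real HTTP call.
func sendNotification(u User) {
	fmt.Println("notified user", u.ID)
}

// worker drains the jobs channel until it is closed.
func worker(jobs <-chan User) {
	for user := range jobs {
		sendNotification(user)
	}
}

func main() {
	users := make([]User, 100)
	for i := range users {
		users[i] = User{ID: i}
	}

	const workerCount = 10

	jobs := make(chan User)
	var wg sync.WaitGroup

	// start a fixed pool of workers
	for i := 0; i < workerCount; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			worker(jobs)
		}()
	}

	// feed jobs; an unbuffered channel means each send waits for a free worker
	for _, user := range users {
		jobs <- user
	}

	// closing the channel ends the workers' range loops
	close(jobs)
	wg.Wait()
}

Because the jobs channel is unbuffered, the producer can never get far ahead of the workers, which is exactly the backpressure the original code was missing.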

After that change, the service worked properly during bulk notification sending. The app no longer spawned work without limit, and the instance could handle the load in a more stable way.

Why it mattered

At first, the code looked fine because it ran without obvious errors. The real problem only showed up when the load got high and the service had to do too much work at once.

More goroutines looked faster, but in practice they pushed the instance too hard. After the flow was limited with a worker pool, the service became much more predictable and stable.