The first quarter of 2026 was not one in which anybody got to ignore cost.
One of the services my team owned kept standing out in the Kubernetes dashboards. Across the running pods, memory usage was sitting around 6 GB. That might have been fine if traffic had been consistently high, but it was not.
I checked the last 90 days of request volume before touching the code. There were spikes, sure, but they came and went. Nothing in that pattern explained why memory kept behaving like the service was under sustained pressure.
The part that felt wrong
I could have ignored one noisy day on the chart. What kept bothering me was that the service had started to look expensive in a steady way, not just during traffic bursts.
Before the fix, the pods usually floated around 30 to 40 percent memory usage, with random jumps well above that. After the change, the same service settled at around 10 percent and stayed there.
That gap was too large to explain away with normal variance. Traffic was rising and falling, but the memory floor kept dragging upward and staying there. Something in the process was holding on to data longer than the request flow needed.
Looking for a way
The first few days did not look like debugging yet. I was mostly trying to figure out how people usually approached this kind of problem without wasting time chasing the wrong thing.
So I kept searching around first. Some of it was Google, some of it was YouTube, some of it was Medium articles from people who had already gone through this before. I was not looking for a magic fix yet. I just needed a more solid way to investigate memory problems in Go without guessing from charts alone.
After a while, the same idea kept showing up from different places. People were not treating memory leaks as something you reason out from instinct. They were profiling the process and checking what kept surviving in memory when the workload should already have finished.
That was what led me to Go's pprof.
Setting it up
I exposed the standard profiling endpoints and kept them available only inside our internal network.
import (
    "net/http"
    _ "net/http/pprof"
)

From there I could start pulling heap profiles from the running process and compare them over time.
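One note on what that blank import actually does: it only registers the handlers on http.DefaultServeMux, so something in the process still has to serve them. A minimal sketch of internal-only wiring, where the listen address is a placeholder rather than the service's real configuration:

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
    // Serve the default mux on a separate listener that is only reachable
    // from inside the cluster network, so the profiling endpoints never
    // face the public ingress.
    go func() {
        log.Println(http.ListenAndServe("127.0.0.1:6060", nil))
    }()

    // ... the rest of the service starts here ...
    select {}
}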
go tool pprof http://some-service-v4/debug/pprof/heap

When I needed a better visual pass, I used the local web UI too.
go tool pprof -http=:8080 http://some-service-v4/debug/pprof/heap

At that stage I was not trying to jump straight to the fix. I just needed to understand what healthy memory behavior looked like for that service, otherwise every large allocation would look suspicious.
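Because the heap endpoint serves a gzipped pprof profile directly, snapshots can also be saved to files with plain curl and inspected later, which makes it easier to build up a picture of that baseline over a few days. The file names here are just placeholders:

curl -s http://some-service-v4/debug/pprof/heap -o heap-quiet.pb.gz
curl -s http://some-service-v4/debug/pprof/heap -o heap-busy.pb.gz

Each saved file can then be opened with go tool pprof exactly like the live endpoint.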
Watching the heap
The first heap snapshot was useful, but only in a limited way. It showed where memory existed. It did not tell me which parts of the flow kept making that baseline stick.
So I started treating it like a repeated observation instead of a one-time inspection. I took heap profiles while the service handled the same general shape of request traffic, then compared what kept surviving between runs.
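pprof can also diff two snapshots directly, which is one way to make "what kept surviving between runs" concrete: anything whose retained size grows between a quiet-period profile and a later one shows up with a positive delta. A sketch using saved files like the placeholder ones above:

go tool pprof -diff_base=heap-quiet.pb.gz heap-busy.pb.gz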
(pprof) top
(pprof) list getItems

Some of the paths became familiar after a while. A few allocations inside the longer getItems flows kept surviving past the point I expected, especially around retryable processing and payloads that were carried forward well after they were needed.
That explained the mismatch in the dashboard. The service did not need one dramatic incident to waste memory. Small pieces of retained state inside those paths were enough to keep raising the floor even after the busy part of the traffic had passed.
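One detail worth keeping in mind when reading these snapshots: for heap profiles, go tool pprof defaults to the inuse_space sample type, which counts memory still live at the moment of the snapshot, and that is the number that matters for a rising floor. The same endpoint can also be read by total allocations, which helps separate allocation churn from genuine retention:

go tool pprof -sample_index=alloc_space http://some-service-v4/debug/pprof/heap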
Tightening the flow
Once I had a better idea where to look, the work turned repetitive in a useful way.
I would make one adjustment, run the same flow again, pull another profile, and check whether the same retained objects were still there. Sometimes the change helped a little. Sometimes it changed nothing and I had to back up and inspect a different path.
That cycle kept going for about two weeks. I used pprof again and again, trimmed what the workers were keeping between retries, reduced how much data stayed attached to longer-lived processes, then verified the result with another heap profile before trusting it.
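As a hypothetical illustration of the kind of retention involved (the names below are made up, not from the actual service): a retry item that keeps a sub-slice of a large decoded payload pins the whole backing array until the retry is dropped, while copying out just the bytes it needs lets the large buffer be collected as soon as the request finishes.

package worker

// retryItem holds only what a retry actually needs. Storing the whole decoded
// payload here (or a sub-slice of it) would keep the full buffer reachable for
// as long as the item sits in the retry queue.
type retryItem struct {
    itemID string
    token  []byte
}

// newRetryItem copies the token out of the payload instead of re-slicing it;
// body[:n] alone would keep the entire body alive via the shared backing array.
func newRetryItem(id string, body []byte) retryItem {
    n := 32
    if len(body) < n {
        n = len(body)
    }
    token := make([]byte, n)
    copy(token, body[:n])
    return retryItem{itemID: id, token: token}
}

The same idea applies to anything a worker holds between retries: if a field only exists to survive the retry, it should carry the minimum, not the original payload.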
None of the fixes were especially dramatic by themselves. The important part was the repetition. Each small change had to prove that it actually lowered retained memory instead of only moving allocations somewhere else.
Eventually the memory graph started behaving the way the traffic graph had suggested it should from the start. Spikes still showed up when traffic rose, but memory no longer settled at a higher baseline once each burst of work had passed.
After it settled
Average memory usage dropped from roughly 30 to 40 percent down to about 10 percent. That was enough to matter for the cost work we were doing, and it did not require trading away reliability to get there.
More than anything, this was the first time I had to stay patient with memory profiling. The answer was not obvious in the first few days, but the heap snapshots were still pointing in the right direction long before the final graph looked clean.