Analyzing Kubernetes Controllers Performance With Pprof
pprof is tooling in the Go standard library for collecting and analyzing profiling data from Go applications.
After a profile is collected from an application, it can be analyzed and visualized with the
go tool pprof command.
A common technique for collecting profiles from Go applications is to import the net/http/pprof
package, which registers handlers on the default HTTP server mux under the
/debug/pprof/ URL path. These endpoints can then be used
to download live profiles from a running application.
pprof can be easily integrated into your Kubernetes controllers to help you gain a deeper understanding of how a controller
is behaving at runtime, with little performance overhead.
What Is a Profile?
The Godoc for a Profile describes it as:
A Profile is a collection of stack traces showing the call sequences that led to instances of a particular event, such as allocation.
In other words, a profile is a set of stack traces collected from a running Go application with some additional metadata attached to each stack trace which provides insight into how the application is running. This additional data might include things like memory allocation information or CPU timing of function calls.
There is a set of predefined profiles that covers most profiling use cases (heap, cpu, etc.); however, it is possible to write custom profiles if you have a specific use case that isn’t covered by the built-in ones.
The predefined profiles are: goroutine, heap, allocs, threadcreate, block, and mutex (plus the CPU profile, which is collected separately).
Profiling Kubernetes Controllers
Now that you know a little bit about
pprof and profiling, we can look at why you might need this for Kubernetes controllers. Much like
any other application, Kubernetes controllers are prone to suffering from performance issues, running out of memory, etc.
If your controller is being
OOMKilled, instead of simply increasing the memory limits and moving on, you can
actually understand what is using up all the memory by collecting and analyzing a heap profile.
Another example scenario where profiling might help is if a controller is suffering from performance issues when running
at scale; collecting a
cpu profile can help identify functions that are using the most CPU time.
pprof via Controller-Runtime
As of controller-runtime v0.15.0, enabling the
pprof server can be accomplished by setting the PprofBindAddress
option on the controller
manager. Prior to v0.15.0, it was possible to enable profiling, but it required manually registering the
pprof endpoints as extra handlers on the existing metrics server. Enabling the
pprof server on your controller(s) is as simple as setting that single option.
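A minimal sketch of that wiring, assuming controller-runtime v0.15.0 or newer (the :8082 bind address is an arbitrary choice for this example):

```go
package main

import (
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// PprofBindAddress enables controller-runtime's built-in pprof server.
	// An empty value (the default) leaves profiling disabled.
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		PprofBindAddress: ":8082", // assumed port; pick any free port in your pod
	})
	if err != nil {
		panic(err)
	}

	// ... register your reconcilers with mgr here ...

	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```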
I’d recommend always enabling profiling on your Kubernetes controllers by default, because you never know when you will need it to debug
a performance issue until it’s too late. Keeping it disabled by default will prevent you from easily debugging performance issues when they pop up, because
enabling the pprof server will require restarting the pod.
The pprof endpoints expose sensitive information, so they should always be bound to localhost
or kept private by other techniques, e.g. using kube-rbac-proxy.
Collecting and Analyzing Profiles
Now that you have profiling enabled on your controllers, you can simply port-forward to the controller pod and collect profiles.
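For example (the namespace, pod name, and port 8082 below are placeholders; substitute your controller's details and the pprof bind address you configured):

```shell
# Forward the controller's pprof port to your local machine.
kubectl port-forward -n <controller-namespace> pod/<controller-pod> 8082:8082
```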
Collect a CPU profile:
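One way to do this is with curl against the forwarded port (port 8082 and the 30-second duration are assumptions; adjust as needed):

```shell
# Capture a 30-second CPU profile from the forwarded pprof endpoint
# and save it to a local file.
curl -so cpu.prof "http://localhost:8082/debug/pprof/profile?seconds=30"
```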
Then open the pprof web interface to analyze the profile:
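Assuming the profile was saved locally as cpu.prof, the web UI can be served on port 8080 like so:

```shell
# Launch the interactive pprof web UI at http://localhost:8080
go tool pprof -http=:8080 cpu.prof
```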
Tip: I find flame graphs to be one of the most valuable visualizations when analyzing most profiles; you can view one by navigating to http://localhost:8080/ui/flamegraph.