How to profile benchmarks using the pprof tool?

Question

I want to profile the benchmarks in a test binary built with go test -c, but go tool pprof needs a profile file, which is usually generated inside the main function like this:

func main() {
    flag.Parse()
    if *cpuprofile != "" {
        f, err := os.Create(*cpuprofile)
        if err != nil {
            log.Fatal(err)
        }
        pprof.StartCPUProfile(f)
        defer pprof.StopCPUProfile()
    }
    // ... rest of main ...
}

How can I create a profile file from within my benchmarks?

score 19 · Accepted Answer · edited Feb 18 '23 at 23:38

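As described in https://pkg.go.dev/cmd/go#hdr-Testing_flags, you can have go test write the profile file for you with the -cpuprofile flag. Add -bench so that the benchmarks are actually run; without it go test runs only the tests.

For example:

go test -bench=. -cpuprofile cpu.out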

edited Feb 18 '23 at 23:38

Tclairet

answered Apr 13 '14 at 21:14

simon

An additional piece that was not obvious to me at first: a .test binary is also created for you to pass to pprof along with the profile. – neonstalwart May 04 '15 at 22:15

score 3 · Answer 2 · answered Apr 13 '14 at 21:13

Use the -cpuprofile flag to go test as documented at http://golang.org/cmd/go/#hdr-Description_of_testing_flags

answered Apr 13 '14 at 21:13

Evan

logix · Answer 3 · 2018-04-02T12:05:06.913

This post explains how to profile benchmarks with an example: Benchmark Profiling with pprof.

The following benchmark simulates some CPU work.

package main

import (
    "math/rand"
    "testing"
)

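// BenchmarkRand calls rand.Int63 b.N times; the testing package picks b.N.
// Save it in a file whose name ends in _test.go so that go test finds it.
func BenchmarkRand(b *testing.B) {
    for n := 0; n < b.N; n++ {
        rand.Int63()
    }
}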

To generate a CPU profile for the benchmark test, run:

go test -bench=BenchmarkRand -benchmem -cpuprofile profile.out

The -memprofile and -blockprofile flags can be used to generate memory allocation and blocking call profiles.
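
As a rough illustration of a benchmark where a memory profile is more useful than a CPU profile, here is a minimal sketch. The names joinNaive, BenchmarkJoinNaive and sink are hypothetical and not taken from the linked post; the function simply builds a string with repeated +=.

```go
package main

import (
	"strings"
	"testing"
)

// sink keeps the result alive so the compiler cannot discard the work.
var sink string

// joinNaive is a hypothetical example function: it concatenates the parts
// with +=, which builds a new string on each iteration.
func joinNaive(parts []string) string {
	out := ""
	for _, p := range parts {
		out += p
	}
	return out
}

// BenchmarkJoinNaive reports allocations for joinNaive.
func BenchmarkJoinNaive(b *testing.B) {
	parts := strings.Fields("alpha beta gamma delta epsilon zeta eta theta")
	b.ReportAllocs() // same effect as -benchmem, but only for this benchmark
	for n := 0; n < b.N; n++ {
		sink = joinNaive(parts)
	}
}
```

Running go test -bench=BenchmarkJoinNaive -memprofile mem.out and then go tool pprof mem.out should show where the allocations come from, in the same way as for the CPU profile.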

To analyze the profile, use go tool pprof:

go tool pprof profile.out
(pprof) top
Showing nodes accounting for 1.16s, 100% of 1.16s total
Showing top 10 nodes out of 22
      flat  flat%   sum%        cum   cum%
     0.41s 35.34% 35.34%      0.41s 35.34%  sync.(*Mutex).Unlock
     0.37s 31.90% 67.24%      0.37s 31.90%  sync.(*Mutex).Lock
     0.12s 10.34% 77.59%      1.03s 88.79%  math/rand.(*lockedSource).Int63
     0.08s  6.90% 84.48%      0.08s  6.90%  math/rand.(*rngSource).Uint64 (inline)
     0.06s  5.17% 89.66%      1.11s 95.69%  math/rand.Int63
     0.05s  4.31% 93.97%      0.13s 11.21%  math/rand.(*rngSource).Int63
     0.04s  3.45% 97.41%      1.15s 99.14%  benchtest.BenchmarkRand
     0.02s  1.72% 99.14%      1.05s 90.52%  math/rand.(*Rand).Int63
     0.01s  0.86%   100%      0.01s  0.86%  runtime.futex
         0     0%   100%      0.01s  0.86%  runtime.allocm

The bottleneck in this case is the mutex: the default source in math/rand is synchronized, so every rand.Int63 call locks and unlocks it.
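
One way to check that reading of the profile is to benchmark a private generator next to the package-level one. This is only a sketch: the benchmark names and the seed value 1 are arbitrary choices for illustration, and how the two compare depends on the Go version, since the profile above dates from 2018 and the implementation of the package-level functions has changed since then.

```go
package main

import (
	"math/rand"
	"testing"
)

// BenchmarkGlobalRand uses the package-level function, as in the
// benchmark above. It goes through the shared default source.
func BenchmarkGlobalRand(b *testing.B) {
	for n := 0; n < b.N; n++ {
		rand.Int63()
	}
}

// BenchmarkLocalRand uses a private *rand.Rand created with rand.New.
// Such a generator is not wrapped in a mutex, which also means it is
// not safe for concurrent use by multiple goroutines.
func BenchmarkLocalRand(b *testing.B) {
	r := rand.New(rand.NewSource(1)) // seed 1 is an arbitrary choice
	for n := 0; n < b.N; n++ {
		r.Int63()
	}
}
```

Profiling both with go test -bench=Rand -cpuprofile profile.out and comparing the top output shows whether the sync.(*Mutex) entries disappear for the local generator.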

pprof offers other views and output formats as well, e.g. tree. Type help at the (pprof) prompt for more options.

Note that any initialization code before the benchmark loop will also be profiled.
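
To make that point concrete, here is a small sketch with a deliberately visible setup step. The helper makeInput and the benchmark name are hypothetical. b.ResetTimer removes the setup from the reported ns/op, but the CPU profile written by -cpuprofile covers the whole run of the test binary, so makeInput can still show up in it. The benchmark function is also called several times with growing b.N, so the setup runs more than once.

```go
package main

import (
	"sort"
	"testing"
)

// makeInput is a hypothetical setup helper: it returns n integers in
// descending order.
func makeInput(n int) []int {
	s := make([]int, n)
	for i := range s {
		s[i] = n - i
	}
	return s
}

// BenchmarkSortWithSetup sorts a copy of the input on every iteration.
func BenchmarkSortWithSetup(b *testing.B) {
	input := makeInput(1 << 16) // setup: still visible in the CPU profile
	buf := make([]int, len(input))
	b.ResetTimer() // excludes the setup from the timing only
	for n := 0; n < b.N; n++ {
		copy(buf, input)
		sort.Ints(buf)
	}
}
```

If the setup dominates the profile, the focus option of go tool pprof (for example -focus=sort) can narrow the report to the code under test.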
