Brendan D. Gregg (author of DTrace book) has interesting variant of profiling: the "Off-CPU" profiling (and Off-CPU Flame Graph; slides 2013, p112-137) to see, where the thread or application were blocked (was not executed by CPU, but waiting for I/O, pagefault handler, or descheduled due short of CPU resources):
This time reveals which code-paths are blocked and waiting while off-CPU, and for how long exactly. This differs from traditional profiling which often samples the activity of threads at a given interval, and (usually) only examine threads if they are executing work on-CPU.
He also can combine Off-CPU profile data and On-CPU profile together: http://www.brendangregg.com/FlameGraphs/hotcoldflamegraphs.html
The examples given by Gregg are made using dtrace, which is not usually available in Linux OS. But there are some similar tools (ktap, systemtap, perf) and the perf as I think has widest installed base. Usually perf generated On-CPU profiles (which functions were executed more often on CPU).
- How can I translate Gregg's Off-CPU examples to perfprofiling tool in Linux?
PS: There is link to systemtap variant of Off-CPU flamegraphs in the slides from LISA13, p124: "Yichun Zhang created these, and has been using them on Linux with SystemTap to collect the profile data. See: • http://agentzh.org/misc/slides/off-cpu-flame-graphs.pdf"" (CloudFlare Beer Meeting on 23 August 2013)
 
     
    