To measure the impact of cache-misses in a program, I want to compare the latency caused by cache-misses to the cycles used for actual computation.
I use perf stat to measure the cycles, L1-loads, L1-misses, LLC-loads and LLC-misses in my program. Here is an example output:
               467 769,70 msec task-clock                #    1,000 CPUs utilized          
        1 234 063 672 432      cycles                    #    2,638 GHz                      (62,50%)
          572 761 379 098      instructions              #    0,46  insn per cycle           (75,00%)
          129 143 035 219      branches                  #  276,083 M/sec                    (75,00%)
            6 457 141 079      branch-misses             #    5,00% of all branches          (75,00%)
          195 360 583 052      L1-dcache-loads           #  417,643 M/sec                    (75,00%)
           33 224 066 301      L1-dcache-load-misses     #   17,01% of all L1-dcache hits    (75,00%)
           20 620 655 322      LLC-loads                 #   44,083 M/sec                    (50,00%)
            6 030 530 728      LLC-load-misses           #   29,25% of all LL-cache hits     (50,00%)
My question is: how do I convert the number of cache-misses into a number of "lost" clock cycles? Or alternatively, what proportion of the time is spent fetching data?
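To make this concrete, here is a rough sketch of the naive conversion I have in mind, using the counters above. The per-level latencies in it are placeholder values I assumed, not figures confirmed for the i7-10810U, and the simple sum ignores that an out-of-order core overlaps many of these misses:

    # Back-of-the-envelope estimate of cycles spent waiting on the memory
    # hierarchy, based on the perf counters above. The per-level latencies
    # are ASSUMED placeholder values, and the model ignores out-of-order
    # execution and memory-level parallelism, which hide a large part of
    # this latency -- so the result is at best an upper bound.

    cycles     = 1_234_063_672_432
    l1_loads   =   195_360_583_052
    l1_misses  =    33_224_066_301
    llc_loads  =    20_620_655_322
    llc_misses =     6_030_530_728

    # Assumed load-to-use latencies, in core cycles (illustrative only).
    LAT_L2  = 12    # L1 miss that hits in L2
    LAT_LLC = 40    # L2 miss that hits in the shared LLC
    LAT_RAM = 250   # LLC miss served from DRAM

    l2_hits  = l1_misses - llc_loads    # L1 misses that never reach the LLC
    llc_hits = llc_loads - llc_misses   # LLC loads that hit

    stall_cycles = l2_hits * LAT_L2 + llc_hits * LAT_LLC + llc_misses * LAT_RAM

    print(f"estimated memory stall cycles: {stall_cycles:.3e}")
    print(f"upper-bound share of total cycles: {stall_cycles / cycles:.1%}")

With these placeholder latencies the naive sum already exceeds the total measured cycles, so the per-miss cost clearly cannot be charged at face value; I still need realistic per-level figures to make such an estimate meaningful.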
I think this factor should be published by the manufacturer. My processor is an Intel Core i7-10810U, and I couldn't find this information in its specifications nor in this list of benchmarked CPUs.
This related question describes how to measure the number of cycles lost to a cache-miss, but is there a way to obtain these latencies as hardware information? Ideally, the output would look something like:
L1-hit: 3 cycles
L2-hit: 10 cycles
LLC-hit: 30 cycles
RAM: 300 cycles