I am trying to measure the number of floating-point operations performed by a C++ program (FLOPS). I am using a Broadwell-based CPU and no GPU. I ran the following command, in which I included all the FP-related events I could find:
perf stat -e fp_arith_inst_retired.128b_packed_double,fp_arith_inst_retired.128b_packed_single,fp_arith_inst_retired.256b_packed_double,fp_arith_inst_retired.256b_packed_single,fp_arith_inst_retired.double,fp_arith_inst_retired.packed,fp_arith_inst_retired.scalar,fp_arith_inst_retired.scalar_double,fp_arith_inst_retired.scalar_single,fp_arith_inst_retired.single,inst_retired.x87 ./test_exe
The output looks like this:
 Performance counter stats for './test_exe':
                 0      fp_arith_inst_retired.128b_packed_double    (36.36%)
                 0      fp_arith_inst_retired.128b_packed_single     (36.36%)
                 0      fp_arith_inst_retired.256b_packed_double     (36.37%)
                 0      fp_arith_inst_retired.256b_packed_single     (36.37%)
     4,520,439,602      fp_arith_inst_retired.double     (36.37%)
                 0      fp_arith_inst_retired.packed     (36.36%)
     4,501,385,966      fp_arith_inst_retired.scalar     (36.36%)
     4,493,140,957      fp_arith_inst_retired.scalar_double     (36.37%)
                 0      fp_arith_inst_retired.scalar_single     (36.36%)
                 0      fp_arith_inst_retired.single     (36.36%)
        82,309,806      inst_retired.x87              (36.36%)
      65.861043789 seconds time elapsed
      65.692904000 seconds user
       0.164997000 seconds sys
Questions:
- Although the C++ program is a large project, I did not write any SSE/AVX instructions myself, and I am not familiar with the SSE/AVX instruction sets. The project is written in ordinary C++. Why are the counts for fp_arith_inst_retired.double, fp_arith_inst_retired.scalar and fp_arith_inst_retired.scalar_double so high? These counters are related to SSE/AVX computations, right?
- What do the percentages in parentheses mean, such as (36.37%)?
- How can I compute the FLOPS of my C++ program from the perf results?
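For the last question, here is a rough sketch of the arithmetic I have in mind, so that it can be corrected if it is wrong. It assumes that each scalar event counts one operation and that each packed event counts one instruction covering 2, 4 or 8 elements depending on register width and precision. Those weights are my assumption, not something I have verified for Broadwell. The struct and function names are made up for illustration, and x87 is left out because inst_retired.x87 looks like an instruction count rather than an operation count.

```cpp
#include <cstdint>

// Counts copied by hand from one `perf stat` run.
// Field names are hypothetical; they only mirror the perf event names.
struct FpCounts {
    std::uint64_t scalar_single = 0;
    std::uint64_t scalar_double = 0;
    std::uint64_t packed128_single = 0;
    std::uint64_t packed128_double = 0;
    std::uint64_t packed256_single = 0;
    std::uint64_t packed256_double = 0;
};

// Total floating-point operations, weighting each packed instruction by the
// number of elements it is assumed to process:
//   128-bit: 4 floats or 2 doubles, 256-bit: 8 floats or 4 doubles.
inline std::uint64_t total_flop(const FpCounts& c) {
    return c.scalar_single + c.scalar_double
         + 4 * c.packed128_single + 2 * c.packed128_double
         + 8 * c.packed256_single + 4 * c.packed256_double;
}

// Operations per second, given the elapsed time reported by perf.
inline double flops_per_second(const FpCounts& c, double seconds) {
    return static_cast<double>(total_flop(c)) / seconds;
}
```

With the run above, only scalar_double is non-zero among these six events, so this would give 4,493,140,957 operations over 65.86 seconds, roughly 68 million per second. I am not sure this is valid, since the three non-zero counters disagree slightly with each other.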
Thanks.
 
    