Brendan Gregg in his recent blog post CPU Utilization is Wrong suggests to use instructions per cycle PMC. In short if IPC is < 1.0 than the app can be considered memory bound. Otherwise it can be considered instruction bound. Here is a relevant excerpt from his post:
If your IPC is < 1.0, you are likely memory stalled, and software
  tuning strategies include reducing memory I/O, and improving CPU
  caching and memory locality, especially on NUMA systems. Hardware
  tuning includes using processors with larger CPU caches, and faster
  memory, busses, and interconnects.
If your IPC is > 1.0, you are likely instruction bound. Look for ways
  to reduce code execution: eliminate unnecessary work, cache
  operations, etc. CPU flame graphs are a great tool for this
  investigation. For hardware tuning, try a faster clock rate, and more
  cores/hyperthreads.
For my above rules, I split on an IPC of 1.0. Where did I get that
  from? I made it up, based on my prior work with PMCs. Here's how you
  can get a value that's custom for your system and runtime: write two
  dummy workloads, one that is CPU bound, and one memory bound. Measure
  their IPC, then calculate their mid point.
Here are some examples of dummy workloads generated by stress tool and thier IPCs.
Memory bound test, IPC is low (0,02):
$ perf stat stress --vm 4 -t 3
stress: info: [4520] dispatching hogs: 0 cpu, 0 io, 4 vm, 0 hdd
stress: info: [4520] successful run completed in 3s
 Performance counter stats for 'stress --vm 4 -t 3':
      10767,074968      task-clock:u (msec)       #    3,560 CPUs utilized          
                 0      context-switches:u        #    0,000 K/sec                  
                 0      cpu-migrations:u          #    0,000 K/sec                  
         4 555 919      page-faults:u             #    0,423 M/sec                  
     4 290 929 426      cycles:u                  #    0,399 GHz                    
        67 779 143      instructions:u            #    0,02  insn per cycle         
        18 074 114      branches:u                #    1,679 M/sec                  
             5 398      branch-misses:u           #    0,03% of all branches        
       3,024851934 seconds time elapsed
CPU bound test, IPC is high (1,44):
$ perf stat stress --cpu 4 -t 3
stress: info: [4465] dispatching hogs: 4 cpu, 0 io, 0 vm, 0 hdd
stress: info: [4465] successful run completed in 3s
 Performance counter stats for 'stress --cpu 4 -t 3':
      11419,683671      task-clock:u (msec)       #    3,805 CPUs utilized          
                 0      context-switches:u        #    0,000 K/sec                  
                 0      cpu-migrations:u          #    0,000 K/sec                  
               108      page-faults:u             #    0,009 K/sec                  
    30 562 187 954      cycles:u                  #    2,676 GHz                    
    43 995 290 836      instructions:u            #    1,44  insn per cycle         
    13 043 425 872      branches:u                # 1142,188 M/sec                  
        26 312 747      branch-misses:u           #    0,20% of all branches        
       3,001218526 seconds time elapsed