I was debugging high System CPU usage (Not user CPU usage) on of our storm supervisors (Wheezy machine). Here are the observations
Output of perf for the relevant process:
Events: 10K cpu-clock
16.40%  java  [kernel.kallsyms]   [k] system_call_after_swapgs
13.95%  java  [kernel.kallsyms]   [k] pvclock_clocksource_read
12.76%  java  [kernel.kallsyms]   [k] do_gettimeofday
12.61%  java  [vdso]              [.] 0x7ffe0fea898f
 9.02%  java  perf-17609.map      [.] 0x7fcabb8b85dc
 7.16%  java  [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
 4.97%  java  [kernel.kallsyms]   [k] native_read_tsc
 2.88%  java  [kernel.kallsyms]   [k] sys_gettimeofday
 2.82%  java  libjvm.so           [.] os::javaTimeMillis()
 2.39%  java  [kernel.kallsyms]   [k] arch_local_irq_restore
Caught this in strace of a thread of the relevant process
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.000247           0     64038           gettimeofday
  0.00    0.000000           0         1           rt_sigreturn
  0.00    0.000000           0         1           futex
------ ----------- ----------- --------- --------- ----------------
100.00    0.000247                 64040           total
Finally figured out that the thread was running in while(true) and one of the calls inside was System.currentTimeMillis(). I disabled the same and the system CPU % went down from 50% to 3%. So clearly that was the issue. What I fail to understand is, in the presence of vDSO these kernel calls should only happen in the user's address space. But as is clear from perf report, kernel calls are indeed taking place in the kernel space. Any pointers on this? 
Kernel version: 3.2.0-4-amd64 Debian 3.2.86-1 x86_64 GNU/Linux
clock type: kvm
Adding code of the problematic thread.
@RequiredArgsConstructor
public class TestThread implements Runnable {
    private final Queue<String> queue;
    private final Publisher publisher;
    private final int maxBatchSize;
    private long lastPushTime;
    @Override
    public void run() {
        lastPushTime = System.currentTimeMillis();
        List<String> events = new ArrayList<>();
        while (true) {
            try {
                String message = queue.poll();
                long lastPollTime = System.currentTimeMillis();
                if (message != null) {
                    events.add(message);
                    pushEvents(events, false);
                }
                // if event threshold hasn't reached the size, but it's been there for over 10seconds, push it.
                if ((lastPollTime - lastPushTime > 10000) && (events.size() > 0)) {
                    pushEvents(events, true);
                }
            } catch (Exception e) {
                // Log and do something
            }
        }
    }
    private void pushEvents(List<String> events, boolean forcePush) {
        if (events.size() >= maxBatchSize || forcePush) {
            pushToHTTPEndPoint(events);
            events.clear();
            lastPushTime = System.currentTimeMillis();
        }
    }
    private void pushToHTTPEndPoint(List<String> events) {
        publisher.publish(events);
    }
}
 
     
     
    