I have a primitive messaging system inside my application. A message can be submitted by the producer from one thread and processed by the consumer in another thread. By design there are only two threads, one for the consumer and one for the producer, and it's not possible to change this logic.
I'm using a ConcurrentLinkedQueue<> to pass the messages:
// producer's code (adds the request)
this.queue.add(req);
// consumer's code (busy loop with request polling)
while (true) {
  Request req = this.queue.poll();
  if (req == null) {
    continue;
  }
  if (req.last()) {
    // last request submitted by the producer
    return;
  }
  // function to process the request
  this.process(req);
}
The processing logic is very fast; the consumer may receive about X_000_000 requests per second.
However, profiling shows that queue.poll() is sometimes very slow. This seems to happen while the queue is receiving many new items from the producer: poll() is about 10x slower in that case than when draining an already filled queue that no other thread is adding to.
Is it possible to optimize this? What is the best Queue<> implementation for this particular case (one thread calling poll() and one thread calling add())? Or would it be easier to implement a simple queue myself?
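To make the last option concrete, here is a minimal sketch of the kind of hand-written queue I have in mind. It assumes a bounded ring buffer whose capacity is a power of two, exactly one producer thread calling offer() and exactly one consumer thread calling poll(). SpscRingQueue and its method names are hypothetical (not an existing library class), and it has not been benchmarked against ConcurrentLinkedQueue:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

/** Hypothetical bounded single-producer/single-consumer queue (illustrative sketch). */
final class SpscRingQueue<E> {
    private final AtomicReferenceArray<E> buffer;
    private final int mask;
    // Index of the next slot to read; written only by the consumer thread.
    private final AtomicLong head = new AtomicLong();
    // Index of the next slot to write; written only by the producer thread.
    private final AtomicLong tail = new AtomicLong();

    SpscRingQueue(int capacity) {
        // Assumption: capacity is a power of two so that (index & mask) wraps around.
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        this.buffer = new AtomicReferenceArray<>(capacity);
        this.mask = capacity - 1;
    }

    /** Producer thread only. Returns false if the queue is full. */
    boolean offer(E e) {
        if (e == null) {
            throw new NullPointerException("null elements are not supported");
        }
        long t = tail.get();
        if (t - head.get() == buffer.length()) {
            return false; // full: the consumer has not freed a slot yet
        }
        buffer.lazySet((int) (t & mask), e); // store the element first...
        tail.lazySet(t + 1);                 // ...then publish it to the consumer
        return true;
    }

    /** Consumer thread only. Returns null if the queue is empty. */
    E poll() {
        long h = head.get();
        if (h == tail.get()) {
            return null; // empty: nothing has been published yet
        }
        int idx = (int) (h & mask);
        E e = buffer.get(idx);
        buffer.lazySet(idx, null); // clear the slot so the element can be collected
        head.lazySet(h + 1);       // hand the slot back to the producer
        return e;
    }
}
```

Unlike ConcurrentLinkedQueue, this sketch is bounded, so the producer would have to retry (or drop the request) whenever offer() returns false.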
 
    