I have a dpdk 19 application and read from nic(MT27800 Family [ConnectX-5] 100G) with 32 rx multiqueue with RSS .
So there are 32 processes that receive traffic from nic with dpdk, Each process read from a different Queue, copy from the mbuf the data to allocated memory, accumulate to 6MB and send it to another thread via a lockless Queue, that other thread only write the data to disk. As a result I/O write is cached in linux memory.
All processes run with cpu affinity, there is isolcpus in the grub
This a little pseudo code of what happen in each of the 32 processes that read from its Queue, i can't put the real code, it is too much
MainFunction()
{
   char * local_buf = new...
   int nBufs = rte_eth_rx_burst(pi_nPort, pi_nQNumber, m_mbufs, 216);
   for(mbuf in m_mbufs)
   { 
       memcpy(local_buf+offset, GetData(mbuf),len);//accumulate to buf
       if(local_buf.len > MAX)
       {
          PushToQueue(local_buf);
          local_buf = new ...
       }
       rte_pktmbuf_free(mbuf);
   }
}
WriterThreadMainFunc
{
     While(QueueNotEmpty)      
     {
          buf = PullFromQ
          WriteToDisk(buf)
          delete buf;
     }
}
When the server memory is completely cache ( I know it is still available) I start seeing drops at nic.
If I delete the data from disk every minute the cached memory is released to free and and no drops at nic. So the drops are clearly linked to the cached data. Until the first drops the application can receive run without drops for 2 hours. The process don't use much memory each process is at 500 MB.
How can I avoid the drops at nic?
               total        used        free      shared  buff/cache   available
Mem:           125G         77G        325M         29M         47G         47G
Swap:          8.0G        256K        8.0G
I use Centos 9.7 linux 3.10.0-1160.49.1.el7.x86_64.