I need a low-latency synchronized queue in C++. The requirements are:
 - no memory allocation at runtime; I just want to memcpy new items into it
 - lock-free
 - one writer
 - several readers (in parallel)
 - reading and writing can happen in parallel
 
Should I implement it myself, can I use something from Boost, or can you perhaps share another implementation? Because I don't want to allocate memory, the queue should most likely use a ring buffer internally.
Below is a ring buffer I wrote in C#. It meets all the requirements except the fourth (several readers). It would be easy to rewrite in C++, but I don't know how to support several readers in this implementation while keeping it lock-free:
public sealed class ArrayPool<T> where T : class, new()
{
    readonly T[] array;
    private readonly uint length;
    private readonly uint MASK;
    private volatile uint curWriteNum;
    private volatile uint curReadNum;
    public ArrayPool(uint length = 65536) // length must be power of 2
    {
        if (length == 0 || (length & (length - 1)) != 0) throw new ArgumentOutOfRangeException("length", "length must be a non-zero power of 2");
        array = new T[length];
        for (int i = 0; i < length; i++)
        {
            array[i] = new T();
        }
        this.length = length;
        MASK = length - 1;
    }
    public bool IsEmpty
    {
        get { return curReadNum == curWriteNum; }
    }
    /// <summary>
    /// TryGet() itself is not thread-safe and must be called from a single reader thread.
    /// However, TryGet() and Obtain()/Commit() can be called from different threads.
    /// </summary>
    /// <returns>The next unread item, or null if the pool is empty.</returns>
    public T TryGet()
    {
        if (curReadNum == curWriteNum)
        {
            return null;
        }
        T result = array[curReadNum & MASK];
        curReadNum++;
        return result;
    }
    public T Obtain()
    {
        return array[curWriteNum & MASK];
    }
    public void Commit()
    {
        curWriteNum++;
        if (curWriteNum - curReadNum > length)
        {
            Log.Push(LogItemType.Error,
                "ArrayPool curWriteNum - curReadNum > length: "
                + curWriteNum + ' ' + curReadNum + ' ' + length);
        }
    }
}
Usage is simple: first call Obtain(), then reconfigure the returned item, and then call Commit().
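For comparison, here is a rough C++ sketch of the same single-writer, single-reader scheme, mainly to show the Obtain/Commit/TryGet flow with `std::atomic`. It is only an illustration under assumptions: the names `SpscArrayPool`, `Tick` and `RoundTrip` are made up, Obtain() returns nullptr when the buffer is full (the C# version logs an error after the fact instead), and TryGet() copies the item out rather than returning a reference. It does not address the several-readers requirement.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical C++ counterpart of the C# ArrayPool: one writer, ONE reader.
// All slots live inside the object, so nothing is allocated after construction.
template <typename T, std::size_t N>
class SpscArrayPool {
    static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of 2");
    static_assert(N <= (std::size_t{1} << 31), "N must fit the 32-bit counters");

public:
    // Writer only: slot to fill in place, or nullptr if the buffer is full.
    T* Obtain() {
        const std::uint32_t w = write_.load(std::memory_order_relaxed);
        const std::uint32_t r = read_.load(std::memory_order_acquire);
        if (w - r == N) return nullptr;  // unsigned wrap-around keeps this correct
        return &slots_[w & kMask];
    }

    // Writer only: publishes the slot returned by the last successful Obtain().
    void Commit() {
        const std::uint32_t w = write_.load(std::memory_order_relaxed);
        write_.store(w + 1, std::memory_order_release);
    }

    // Reader only: copies the oldest item into `out`; false if the buffer is empty.
    bool TryGet(T& out) {
        const std::uint32_t r = read_.load(std::memory_order_relaxed);
        if (r == write_.load(std::memory_order_acquire)) return false;
        out = slots_[r & kMask];
        read_.store(r + 1, std::memory_order_release);  // hand the slot back to the writer
        return true;
    }

private:
    static constexpr std::uint32_t kMask = N - 1;
    std::array<T, N> slots_{};
    std::atomic<std::uint32_t> write_{0};
    std::atomic<std::uint32_t> read_{0};
};

// Hypothetical payload type, used only for the usage example below.
struct Tick {
    int id;
    double price;
};

// Usage: Obtain, reconfigure the item, Commit, then read it back with TryGet.
inline bool RoundTrip() {
    SpscArrayPool<Tick, 8> pool;
    Tick* slot = pool.Obtain();
    if (slot == nullptr) return false;
    slot->id = 1;
    slot->price = 9.5;
    pool.Commit();

    Tick received{};
    return pool.TryGet(received) && received.id == 1;
}
```

Copying the item out in TryGet() means the reader never holds a pointer into a slot that the writer may later overwrite, at the cost of one extra copy per item.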
I included my C# implementation just to show what I need. I think it's impossible to add support for several readers to it while keeping the implementation lock-free, so I need something new.