I want to eliminate a memory copy step in my data processing pipeline.
I want to do the following:

1. Generate some data from a custom C library.
2. Feed the generated data into an MXNet model running on the GPU.
For now, my pipeline does the following (a code sketch follows the list):

1. Create a C-contiguous numpy array via np.empty(...).
2. Get the pointer to the numpy array via np.ndarray.__array_interface__.
3. Call the C library from Python (via ctypes) to fill the numpy array.
4. Convert the numpy array into an MXNet NDArray; this copies the underlying memory buffer.
5. Pack the NDArrays into an mx.io.DataBatch instance, then feed it into the model.
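To make the current pipeline concrete, here is a minimal sketch of roughly what I do now. The shared-library name `libmydata.so` and the `fill_buffer(ptr, n)` function are placeholders standing in for my actual C library.

```python
import ctypes
import numpy as np
import mxnet as mx

# Placeholder for my actual C library; fill_buffer(ptr, n) writes n floats at ptr.
clib = ctypes.CDLL("./libmydata.so")
clib.fill_buffer.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

batch_size, feature_dim = 32, 128

# Step 1: allocate a C-contiguous numpy array.
np_data = np.empty((batch_size, feature_dim), dtype=np.float32, order="C")

# Step 2: get the raw pointer to the numpy buffer.
ptr = np_data.__array_interface__["data"][0]

# Step 3: let the C library fill the buffer in place (no copy here).
clib.fill_buffer(ctypes.c_void_p(ptr), np_data.size)

# Step 4: convert to an MXNet NDArray -- this is the copy I want to avoid.
nd_data = mx.nd.array(np_data)

# Step 5: pack into a DataBatch and feed it to the model.
batch = mx.io.DataBatch(data=[nd_data])
# module.forward(batch)  # the CPU->GPU transfer happens inside the model
```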
Please note that, before being fed into the model, all arrays stay in CPU memory.
I noticed that mx.io.DataBatch only accepts a list of mx.ndarray.NDArray objects for its data and label parameters, not numpy arrays (passing numpy arrays appears to work until the batch is actually fed into a model). On the other hand, my C library can write directly into any C-contiguous buffer.
I would like to avoid the memory copy in step 4. One possible way would be to somehow obtain a raw pointer to the NDArray's underlying buffer and bypass numpy entirely, but any approach that avoids the copy would work.
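Here is the direction I am considering, as a sketch only: allocate the NDArray first on the CPU, ask MXNet's C API for its raw data pointer, and let the C library write into that buffer directly. This assumes the C API function MXNDArrayGetData is available in the installed MXNet build and returns a pointer to the CPU data buffer; both of those are assumptions on my part, not something I have verified.

```python
import ctypes
import numpy as np
import mxnet as mx
from mxnet.base import _LIB, check_call

clib = ctypes.CDLL("./libmydata.so")  # placeholder for my actual C library
clib.fill_buffer.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

# Allocate the NDArray up front on the CPU.
nd_data = mx.nd.empty((32, 128), dtype="float32", ctx=mx.cpu())

# ASSUMPTION: MXNDArrayGetData exists in this MXNet build and returns a raw
# pointer to the NDArray's CPU data buffer.
ptr = ctypes.c_void_p()
check_call(_LIB.MXNDArrayGetData(nd_data.handle, ctypes.byref(ptr)))

# Let the C library write straight into the NDArray's buffer -- no numpy,
# no extra CPU-side copy.
clib.fill_buffer(ptr, int(np.prod(nd_data.shape)))

batch = mx.io.DataBatch(data=[nd_data])
# module.forward(batch)  # only the CPU->GPU transfer remains
```

I don't know whether writing into the buffer behind MXNet's back like this is actually safe, so a better supported approach would also be welcome.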