I am trying to parallelize a C function using CUDA. I noticed that there are several structs which are being passed as pointers to this function.
Since I am using unified memory, I have found the malloc() calls for these structs and replaced them with cudaMallocManaged().
However, there is also an allocation made with memalign(). I want to do for it what cudaMallocManaged() did for malloc(): get managed memory while keeping the requested alignment.
Does such an equivalent exist? If not, what needs to be done?
This is how the memalign() allocation line looks:
float *data = (float*) memalign(16, some_integer*sizeof(float));
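For reference, here is a minimal sketch of the direction I am considering. It relies on an assumption: as far as I understand the CUDA programming guide, pointers returned by the runtime allocation routines are aligned to at least 256 bytes, which would already satisfy a 16-byte request. The helper name managed_alloc_floats and the element count 1024 are placeholders of mine, not part of the original code, and the alignment is checked at run time instead of being taken for granted.

```cuda
#include <stdio.h>
#include <stdint.h>
#include <cuda_runtime.h>

/* Hypothetical helper (not a CUDA API): allocate n floats in managed memory
 * and verify that the returned pointer meets the requested alignment.
 * Returns NULL if the allocation fails or the alignment check does not hold. */
static float *managed_alloc_floats(size_t n, size_t alignment)
{
    float *p = NULL;
    cudaError_t err = cudaMallocManaged((void **)&p, n * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
        return NULL;
    }
    /* Assumption: CUDA allocations are aligned to at least 256 bytes,
     * so this check is expected to pass for alignment = 16. */
    if (((uintptr_t)p % alignment) != 0) {
        fprintf(stderr, "pointer is not %zu-byte aligned\n", alignment);
        cudaFree(p);
        return NULL;
    }
    return p;
}

int main(void)
{
    const size_t some_integer = 1024;  /* placeholder element count */

    /* Intended replacement for: memalign(16, some_integer*sizeof(float)) */
    float *data = managed_alloc_floats(some_integer, 16);
    if (data == NULL)
        return 1;

    data[0] = 1.0f;   /* managed memory can be written directly from the host */
    cudaFree(data);   /* managed memory is released with cudaFree(), not free() */
    return 0;
}
```

If this is right, the matching free() for that buffer would also have to become cudaFree(). An alignment larger than what the allocator guarantees would presumably need over-allocation and manual pointer rounding instead.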