This question is an extension of this question and related to this question.
[Q1] Do I need to cast to (void**) when calling cudaMalloc on a struct member? Example (please see the questions embedded in the code below):
The structure:
typedef struct {
  int a;
  int *b;
} Matrix;
The main function for allocating and copying to device:
int main(void)
{
  int rows, cols, numMat = 2;
  //[Q2] What would be the problem of not allocating (numMat * sizeof(Matrix)) here?
  //For example, allocating just sizeof(Matrix)?
  Matrix *data = (Matrix*)malloc(numMat * sizeof(Matrix));
  // ... Successfully read from file into "data" ...
  //[Q3] Do we really need to copy "data" into a second host buffer (h_data)?
  //[A3] Not necessary
  Matrix *h_data = (Matrix*)malloc(numMat * sizeof(Matrix));
  memcpy(h_data, data, numMat * sizeof(Matrix));
  // ... Copy the matrix data so it is on the GPU ...
  //[Q4] Would we need to cast (void**)&(h_data->a)? 'a' is not a pointer.
  //[A4] An int cannot be copied in this fashion
  // cudaMalloc(&(h_data->a), rows*cols*sizeof(int));
  // cudaMemcpy(h_data->a, data->a, rows*cols*sizeof(int), cudaMemcpyHostToDevice);
  //[Q5] Do we need to cast (void**)&(h_data->b)? 'b' is a pointer
  cudaMalloc(&(h_data->b), rows*cols*sizeof(int));
  cudaMemcpy(h_data->b, data->b, rows*cols*sizeof(int), cudaMemcpyHostToDevice);
  // ... Copy the "meta" data to gpu ...
  //[Q6] Can we just copy h_data instead? Why create another pointer "d_data"?
  //[A6] Yes
  Matrix *d_data;
  //[Q7] Wouldn't we need to cast (void**)&d_data?
  cudaMalloc(&d_data, numMat*sizeof(Matrix));
  //[Q8] h_data is on the host but holds device pointers. Can we just copy "data" to the device instead?
  cudaMemcpy(d_data, h_data, numMat*sizeof(Matrix), cudaMemcpyHostToDevice);
  // ... Do other things ...
}
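To make [Q1] and [Q7] concrete, these are the two cudaMalloc call styles I am comparing (a standalone sketch; d_b and n are hypothetical names, and I am assuming a .cu file compiled with nvcc):
// Standalone sketch of the two call styles from [Q1]/[Q7]
// (d_b and n are hypothetical; assumes a .cu file compiled with nvcc)
int n = 16;
int *d_b = NULL;
cudaMalloc((void**)&d_b, n * sizeof(int));  // style A: explicit (void**) cast
// cudaMalloc(&d_b, n * sizeof(int));       // style B: no cast
cudaFree(d_b);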
Ultimately, we would just want to pass the matrices to the kernel as a single pointer:
// Kernel call
doThings<<<dimGrid, dimBlock>>>(d_data);
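For context, dimGrid and dimBlock come from a setup roughly like this (the block and grid sizes here are hypothetical):
dim3 dimBlock(16, 16);                               // hypothetical 16x16 threads per block
dim3 dimGrid((cols + dimBlock.x - 1) / dimBlock.x,   // enough blocks to cover all columns
             (rows + dimBlock.y - 1) / dimBlock.y);  // and all rows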
The kernel definition:
__global__ void doThings(Matrix *matrices)
{
  matrices->a = ...;
  matrices->b = ...;
}
Thanks in advance for your time and effort in helping me with my doubts!