I've just started trying to learn CUDA again and came across some code I don't fully understand.
// declare GPU memory pointers
float * d_in;
float * d_out;
// allocate GPU memory
cudaMalloc((void**) &d_in, ARRAY_BYTES);
cudaMalloc((void**) &d_out, ARRAY_BYTES);
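For context, the snippet comes from the usual allocate / copy / launch / copy-back pattern, which (as far as I understand it) looks roughly like this; the square kernel, the array size, and the host buffers are placeholders I've filled in myself:
#include <cuda_runtime.h>

// hypothetical kernel, just to have something that uses the buffers
__global__ void square(float * d_out, float * d_in) {
    int idx = threadIdx.x;
    float f = d_in[idx];
    d_out[idx] = f * f;
}

int main() {
    const int ARRAY_SIZE = 64;
    const int ARRAY_BYTES = ARRAY_SIZE * sizeof(float);

    // host buffers
    float h_in[ARRAY_SIZE];
    float h_out[ARRAY_SIZE];
    for (int i = 0; i < ARRAY_SIZE; i++) { h_in[i] = float(i); }

    // declare GPU memory pointers
    float * d_in;
    float * d_out;

    // allocate GPU memory
    cudaMalloc((void**) &d_in, ARRAY_BYTES);
    cudaMalloc((void**) &d_out, ARRAY_BYTES);

    // the host-held pointer values are what get passed to later calls
    cudaMemcpy(d_in, h_in, ARRAY_BYTES, cudaMemcpyHostToDevice);
    square<<<1, ARRAY_SIZE>>>(d_out, d_in);
    cudaMemcpy(h_out, d_out, ARRAY_BYTES, cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}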
When the GPU memory pointers are declared, storage for them is allocated on the host. The (void**) casts in the cudaMalloc calls then throw away the information that d_in and d_out are pointers to float.
I can't think why cudaMalloc would need to know where in host memory d_in & d_out are stored. It's also not clear why I need to spend host bytes at all just to hold whatever address d_in & d_out end up pointing to.
So, what is the purpose of the original variable declarations on the host?
======================================================================
I would've thought something like this would make more sense:
// declare GPU memory pointers
cudaFloat * d_in;
cudaFloat * d_out;
// allocate GPU memory
cudaMalloc((void**) &d_in, ARRAY_BYTES);
cudaMalloc((void**) &d_out, ARRAY_BYTES);
This way, everything GPU-related takes place on the GPU. If d_in or d_out were accidentally used in host code, an error could be raised at compile time, since those variables wouldn't be defined on the host.
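For example, as far as I can tell, host code like the following compiles without any complaint, because d_in is just an ordinary float * as far as the host compiler is concerned, even though dereferencing a device address on the host presumably fails at runtime:
// compiles fine on the host, but d_in holds a device address
d_in[0] = 1.0f;           // host-side write through a device pointer
float x = d_in[0];        // same problem reading it back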
I guess what I also find confusing is that by storing device memory addresses on the host, it feels like the device isn't fully in charge of managing its own memory. There seems to be a risk of host code accidentally overwriting the value of d_in or d_out, whether by assigning to them directly or through some more subtle bug, which could cause the GPU to lose access to its own memory. It also seems strange that the addresses assigned to d_in & d_out are chosen by the host rather than the device. Why should the host know anything about which addresses are or aren't available on the device?
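The kind of slip-up I'm imagining is something like this (entirely hypothetical):
// accidentally reusing d_in as an ordinary host pointer
d_in = (float *) malloc(ARRAY_BYTES);
// the original device allocation is now unreachable, so it can never be cudaFree'd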
What am I failing to understand here?