A "cudaErrorUnknown" in cudaMemcpy function call

Question

I have a function that computes the dot product of three vectors, using a reduction to make it faster. However, I always get an error like this:

CUDA error at kernel.cu:120 code=30(cudaErrorUnknown) "cudaMemcpy(partial_c, dev_partial_c,sizeofblock,cudaMemcpyDeviceToHost )"

I cannot figure out why, because the code looks normal and the allocation calls did not return an error. Are there any possible solutions? Thanks a lot.

double vector_dot(double* d_A,double* d_B,double *d_C,int numElements)
{
    int size = sizeof(double) * numElements;
    int c_size = sizeof(char) * numElements;
    double *d_D=NULL;
    checkCudaErrors(cudaMalloc((void**)&d_D,size)); 
    // Launch the Vector Add CUDA Kernel
    int threadsPerBlock = 256;
    int blocksPerGrid =(numElements + threadsPerBlock - 1) / threadsPerBlock;
    vectorMPL<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_D, numElements);
    double *partial_c;
    double*dev_partial_c;
    int sizeofblock=blocksPerGrid*sizeof(double);
    partial_c = (double*)malloc(sizeofblock);
    checkCudaErrors( cudaMalloc( (void**)&dev_partial_c,sizeofblock )) ;
    vector_dot_h<<<blocksPerGrid, threadsPerBlock>>>(d_D, d_C, dev_partial_c, numElements);

    double sum = 0;
    checkCudaErrors(cudaMemcpy(partial_c,dev_partial_c,sizeofblock,cudaMemcpyDeviceToHost));

    for (int i=0; i<blocksPerGrid; i++) {
        sum += partial_c[i];
    }

    checkCudaErrors(cudaFree(d_D));
    checkCudaErrors(cudaFree(dev_partial_c));  
    free(partial_c);
    // Reset the device and exit
    checkCudaErrors(cudaDeviceReset());
    return sum;
}

If I delete the cudaMemcpy call, I get the unknown error in the cudaFree call instead. It seems that none of the CUDA API calls can complete. Why is that? What is cudaErrorUnknown, and what causes it?

What guarantees that the kernel was launched and executed correctly? Try calling cudaGetLastError() after the kernel launch and check what it returns. Also, if possible, share the vector_dot_h kernel code. – Sagar Masuti, Oct 04 '13 at 03:18

score 3 · Accepted Answer · edited May 23 '17 at 11:50

The documentation for both cudaMemcpy and cudaFree contains the following note:

Note that this function may also return error codes from previous, asynchronous launches.

That is, the error isn't happening in either cudaMemcpy or cudaFree; rather, it is happening during the previous kernel launch or execution. If you follow this advice and modify your code to something like this:

vectorMPL<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_D, numElements);
checkCudaErrors(cudaPeekAtLastError());
checkCudaErrors(cudaDeviceSynchronize());

You should find that the error is reported by the cudaDeviceSynchronize() call, indicating that the error occurred while the kernel was executing. The underlying reason for the error will most likely be an out-of-bounds memory access within vector_dot_h, but as you have provided neither the code nor the execution parameters for the kernel call in question, it isn't possible to give you a more accurate diagnosis. The CUDA toolkit ships with a utility called cuda-memcheck, which you can use to get more information about illegal memory access patterns in a running kernel. I recommend you try using it on this code.
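To see this asynchronous reporting in isolation, here is a minimal sketch that is not taken from the question's code. The kernel name writeOne and the deliberately null pointer d_bad are made up for illustration. The launch check normally comes back clean, the fault shows up at cudaDeviceSynchronize(), and later, unrelated API calls typically keep failing because the context is left in an error state. The exact error code reported can differ between toolkit and driver versions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical faulty kernel: it writes through whatever pointer it is given.
__global__ void writeOne(double *out)
{
    out[threadIdx.x] = 1.0;
}

int main()
{
    double *d_bad = nullptr;  // invalid device pointer, on purpose

    writeOne<<<1, 32>>>(d_bad);

    // Catches launch-time problems such as an invalid launch configuration.
    cudaError_t launchErr = cudaPeekAtLastError();
    printf("after launch:      %s\n", cudaGetErrorString(launchErr));

    // Blocks until the kernel has finished, so execution faults surface here.
    cudaError_t syncErr = cudaDeviceSynchronize();
    printf("after synchronize: %s\n", cudaGetErrorString(syncErr));

    // An unrelated call made afterwards usually reports the earlier fault too,
    // which is why the question saw the error in cudaMemcpy and cudaFree.
    double *d_buf = nullptr;
    cudaError_t laterErr = cudaMalloc((void **)&d_buf, sizeof(double));
    printf("later cudaMalloc:  %s\n", cudaGetErrorString(laterErr));

    return 0;
}
```

Assuming the file is saved as demo.cu, it can be built with `nvcc -o demo demo.cu` and run as `cuda-memcheck ./demo` to get the faulting kernel, thread and address. Newer CUDA toolkits ship compute-sanitizer in place of cuda-memcheck, and it is invoked the same way.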

answered Oct 04 '13 at 05:31

talonmies

Thank you! I am trying cuda-memcheck. – kururu Oct 04 '13 at 07:36
Invalid __global__ read of size 8 at 0x00000068 in vectorMPL(double*, double*, double*, int) by thread (8,0,0) in block (0,0,0) Address 0x0554aa40 is out of bounds – kururu Oct 04 '13 at 07:37
@kururu: As I said - you have a kernel problem. Either you are passing an invalid pointer, or you have an indexing error in your kernel code. My suggestion is to accept this answer (to mark the question as answered), and spend some time debugging the kernel. If you find you still can't solve this, ask a new question with the kernel code and the shortest possible complete example code which someone else could compile and run. – talonmies Oct 04 '13 at 08:11
Thank you. I will follow your advice. – kururu Oct 04 '13 at 08:27
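
Since the real vectorMPL was never posted, the following is only a sketch of what guarding against the two causes named in the comments could look like. The kernel body (an element-wise product) is an assumption. Only the signature mirrors the one printed by cuda-memcheck. The check helper, the value of n and the host vectors are likewise made up for illustration. Two things matter here: every pointer passed to the kernel comes from cudaMalloc rather than from host memory, and the kernel guards its index against numElements because the grid is rounded up to whole blocks.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper, standing in for checkCudaErrors from the CUDA samples.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Assumed body: element-wise product. The real vectorMPL was not shown.
__global__ void vectorMPL(double *a, double *b, double *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // threads past the end of the arrays do nothing
        out[i] = a[i] * b[i];
    }
}

int main()
{
    const int n = 1000;  // deliberately not a multiple of the block size
    const size_t bytes = n * sizeof(double);
    std::vector<double> h_a(n, 2.0), h_b(n, 3.0), h_out(n, 0.0);

    // Every pointer handed to the kernel is a device pointer from cudaMalloc.
    double *d_a = nullptr, *d_b = nullptr, *d_out = nullptr;
    check(cudaMalloc((void **)&d_a, bytes), "cudaMalloc d_a");
    check(cudaMalloc((void **)&d_b, bytes), "cudaMalloc d_b");
    check(cudaMalloc((void **)&d_out, bytes), "cudaMalloc d_out");
    check(cudaMemcpy(d_a, h_a.data(), bytes, cudaMemcpyHostToDevice), "copy a");
    check(cudaMemcpy(d_b, h_b.data(), bytes, cudaMemcpyHostToDevice), "copy b");

    const int threadsPerBlock = 256;
    const int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
    vectorMPL<<<blocksPerGrid, threadsPerBlock>>>(d_a, d_b, d_out, n);
    check(cudaPeekAtLastError(), "kernel launch");
    check(cudaDeviceSynchronize(), "kernel execution");

    check(cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost), "copy out");
    printf("out[0] = %f, out[%d] = %f\n", h_out[0], n - 1, h_out[n - 1]);

    check(cudaFree(d_a), "cudaFree d_a");
    check(cudaFree(d_b), "cudaFree d_b");
    check(cudaFree(d_out), "cudaFree d_out");
    return 0;
}
```

If the original kernel already has such a guard, the pointers d_A and d_B passed into vector_dot are the next thing to check, for example by confirming that they were allocated with cudaMalloc for at least numElements doubles.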
