CUDA kernel function not called

Question

I'm getting started with CUDA, and I'm having some issues. The code I've posted below is basically the simplest example off the NVIDIA website, with some memory copies and a print statement added to make sure that it's running correctly.

The code compiles and runs without complaint, but when I print the vector c it comes out all zeros, as if the GPU kernel function isn't being called at all.

This looks almost exactly like the problem in the post "Basic CUDA - getting kernels to run on the device using C++".

The symptoms are the same, although I don't seem to be making the error described there. Any ideas?

#include <stdio.h>

static const unsigned short N = 3;

// Kernel definition
__global__ void VecAdd(float* A, float* B, float* C)
{
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
} 

int main()
{
  float *A, *B, *C;
  float a[N] = {1,2,3}, b[N] = {4,5,6}, c[N] = {0,0,0};

  cudaMalloc( (void **)&A, sizeof(float)*N );
  cudaMalloc( (void **)&B, sizeof(float)*N );
  cudaMalloc( (void **)&C, sizeof(float)*N );

  cudaMemcpy( A, a, sizeof(float)*N, cudaMemcpyHostToDevice );
  cudaMemcpy( B, b, sizeof(float)*N, cudaMemcpyHostToDevice );

  VecAdd<<<1, N>>>(A, B, C);

  cudaMemcpy( c, C, sizeof(float)*N, cudaMemcpyHostToDevice );

  printf("%f %f %f\n", c[0],c[1],c[2]);

  cudaFree(A);
  cudaFree(B);
  cudaFree(C);

  return 0;
}

Always always always check the return value of functions. After the kernel call, call `cudaGetLastError`, too. — Kerrek SB, Feb 24 '14 at 08:52
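As a rough sketch of what the comment above suggests, the program could wrap every runtime call in a small checking macro and query `cudaGetLastError` after the launch. The macro name `CUDA_CHECK` is hypothetical and not from the original post; the added `cudaDeviceSynchronize` call and the `const` qualifiers on the kernel inputs are also choices made for this example. The final copy uses `cudaMemcpyDeviceToHost` because `c` lives on the host.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): print the error string
// with file and line, then exit, if a runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                          \
  do {                                                            \
    cudaError_t err_ = (call);                                    \
    if (err_ != cudaSuccess) {                                    \
      fprintf(stderr, "CUDA error: %s at %s:%d\n",                \
              cudaGetErrorString(err_), __FILE__, __LINE__);      \
      exit(EXIT_FAILURE);                                         \
    }                                                             \
  } while (0)

static const unsigned short N = 3;

// Same kernel as in the question: one thread per element.
__global__ void VecAdd(const float* A, const float* B, float* C)
{
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
}

int main()
{
  float *A, *B, *C;
  float a[N] = {1,2,3}, b[N] = {4,5,6}, c[N] = {0,0,0};

  CUDA_CHECK(cudaMalloc((void **)&A, sizeof(float)*N));
  CUDA_CHECK(cudaMalloc((void **)&B, sizeof(float)*N));
  CUDA_CHECK(cudaMalloc((void **)&C, sizeof(float)*N));

  CUDA_CHECK(cudaMemcpy(A, a, sizeof(float)*N, cudaMemcpyHostToDevice));
  CUDA_CHECK(cudaMemcpy(B, b, sizeof(float)*N, cudaMemcpyHostToDevice));

  VecAdd<<<1, N>>>(A, B, C);
  CUDA_CHECK(cudaGetLastError());        // errors from the launch itself
  CUDA_CHECK(cudaDeviceSynchronize());   // errors raised while the kernel ran

  // Copy the result back: destination is host memory, source is device memory.
  CUDA_CHECK(cudaMemcpy(c, C, sizeof(float)*N, cudaMemcpyDeviceToHost));

  printf("%f %f %f\n", c[0], c[1], c[2]);

  CUDA_CHECK(cudaFree(A));
  CUDA_CHECK(cudaFree(B));
  CUDA_CHECK(cudaFree(C));

  return 0;
}
```

With a working driver and device this should print the element-wise sums of `a` and `b`; if anything is wrong with the setup, the first failing call reports why instead of silently leaving `c` at zero.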

sgarizvi · Answer 1 · 2017-06-30T14:16:56.943

5

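In the last cudaMemcpy call, you are passing the wrong flag for the copy direction. The destination c is a host array and the source C is device memory, yet the flag says host to device: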

cudaMemcpy( c, C, sizeof(float)*N, cudaMemcpyHostToDevice );

It should be:

cudaMemcpy( c, C, sizeof(float)*N, cudaMemcpyDeviceToHost );

edited Jun 30 '17 at 14:16

answered Feb 24 '14 at 09:07

sgarizvi

Indeed! But, when I make that change, the effect is the same - the vector c prints out as zeros. – user3195869 Feb 24 '14 at 14:06
Add [error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api) to your code. There may be several reasons for unexpected output. Which GPU do you have? What compute capability are you compiling the code for? – sgarizvi Feb 24 '14 at 14:44
My machine has a G210M and a 9400M G. I'm not as certain about the 9400M G, but the G210M is listed as having a compute capability of 1.1, so that's what I've compiled for. This is the command line I've been using: `nvcc cuda-test.cu -o cuda-test --gpu-code compute_11 --gpu-architecture=compute_11` – user3195869 Feb 24 '14 at 15:10
The error checking revealed a lot: `GPUassert: CUDA driver version is insufficient for CUDA runtime version cuda-test.cu 30`. It seems pretty clear at this point that there's an issue with the drivers. – user3195869 Feb 24 '14 at 20:35
After reinstalling my drivers, the cuda code runs as expected. Thanks! – user3195869 Feb 24 '14 at 21:49
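
The "driver version is insufficient" message reported in the comments above can also be checked for directly. The following is a minimal sketch, not part of the original thread, that asks the runtime for both version numbers using `cudaDriverGetVersion` and `cudaRuntimeGetVersion` and compares them. It assumes the usual encoding of 1000 * major + 10 * minor for both values.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
  int driverVersion = 0, runtimeVersion = 0;

  // Latest CUDA version supported by the installed driver
  // (reported as 0 when no driver is installed).
  cudaError_t err = cudaDriverGetVersion(&driverVersion);
  if (err != cudaSuccess) {
    fprintf(stderr, "cudaDriverGetVersion: %s\n", cudaGetErrorString(err));
    return 1;
  }

  // Version of the CUDA runtime this program was built against.
  err = cudaRuntimeGetVersion(&runtimeVersion);
  if (err != cudaSuccess) {
    fprintf(stderr, "cudaRuntimeGetVersion: %s\n", cudaGetErrorString(err));
    return 1;
  }

  printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
         driverVersion / 1000, (driverVersion % 1000) / 10,
         runtimeVersion / 1000, (runtimeVersion % 1000) / 10);

  // A driver older than the runtime is the situation the error message describes.
  if (driverVersion < runtimeVersion) {
    fprintf(stderr, "driver is older than the runtime; update the driver "
                    "or build against an older toolkit\n");
    return 1;
  }

  return 0;
}
```

If the driver number is lower than the runtime number, updating or reinstalling the driver, as was done here, is one way to resolve it.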
