My CUDA test program to add 0.5 to an array of doubles does not add 0.5 to an array of doubles

Question

I have written a CUDA test program, because my more complex program was not working. This one isn't working either.

What should it do?

I've written a test program that is supposed to add 0.5 to each element of an array of doubles.

Here's the code:

```cuda
#include <cstdlib>   // malloc, free
#include <iostream>

#include <cuda.h>

// One block per element: each block handles the entry at its block index.
__global__
void cuda_kernel_func(double *in, double *out, int count)
{
    int index = blockIdx.x;
    if(index < count)
    {
        out[index] = in[index] + 0.5;
    }
}

int main()
{
    int num = 10;
    double *out;    // host output
    double *d_out;  // device output
    double *in;     // host input
    double *d_in;   // device input

    out = (double*)malloc(num * sizeof(double));
    in = (double*)malloc(num * sizeof(double));
    cudaMalloc(&d_out, num * sizeof(double));
    cudaMalloc(&d_in, num * sizeof(double));

    for(int i = 0; i < num; ++ i)
    {
        in[i] = (double)i;
    }

    cudaMemcpy(d_in, in, num * sizeof(double), cudaMemcpyHostToDevice);

    // Launch num blocks of one thread each.
    cuda_kernel_func<<<num, 1>>>(d_in, d_out, num);
    cudaDeviceSynchronize();

    cudaMemcpy(out, d_out, num * sizeof(double), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);

    for(int i = 0; i < num; ++ i)
    {
        std::cout << out[i] << " ";
    }
    std::cout << std::endl;

    free(in);
    free(out);

    return 0;
}
```

I am fairly new to CUDA, but not to parallelization or C/C++. I think the code is fairly self-explanatory.

Output:

```
0 0 0 0 0 0 0 0 0 0
```

Which isn't very exciting.

`I've noticed a lot of my "noob CUDA" questions get a lot of hostile responses, so if these could be avoided that would be great.` Yeah, well, it would be great if you listened to the practically infinite number of comments suggesting that you [add proper error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api) (I just checked your previous questions and Robert Crovella made a point of linking that question numerous times, and yet you are still ignoring that advice). — user703016, Oct 14 '15 at 11:58
@GregorMcGregor I ran this with `cuda-memcheck` and didn't get any errors... — FreelanceConsultant, Oct 14 '15 at 11:59
in addition to @GregorMcGregor: add the desired and the actual output of your program — m.s., Oct 14 '15 at 12:00
when running your program I get `0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5` — m.s., Oct 14 '15 at 12:05
@m.s. That's what I would have liked to see on my computer... I'm compiling with: `nvcc cudaTest.cu --std=c++11 && ./a.out` — FreelanceConsultant, Oct 14 '15 at 12:06
I use the same compiler command with CUDA 7; adding CUDA error checking will probably point you to some erroneous CUDA call; make sure you can run the CUDA samples! — m.s., Oct 14 '15 at 12:08
@m.s. Aha - after some searching around I think my graphics card doesn't support the `-arch` I was compiling for. I changed the compile command to include the `-arch=compute_20` flag, but I have a GTX 260 (GT200), which I believe only supports compute capability 1.3. Is there anything I can do to compile for this GPU? — FreelanceConsultant, Oct 14 '15 at 12:20
you have to use an older CUDA version (such as [CUDA 6.5](http://developer.download.nvidia.com/compute/cuda/6_5/rel/docs/CUDA_Toolkit_Release_Notes.pdf)) which still supports `sm_13` — m.s., Oct 14 '15 at 12:37
@user3728501 Note that *runtime error checking* and running memcheck and/or a debugger ARE NOT the same thing. Error checking is described in m.s.'s answer below and in short can be described as "always check the return value". Lack of error checking attracts the mentioned negative and even angry feedback to your questions. In this particular case proper error checking would have saved you A LOT of time (the very first API call would say something like "Device is not supported"). — Ivan Aksamentov - Drop, Oct 14 '15 at 13:38
@user3728501 Then you probably need to start reading answers you get to [your questions](http://stackoverflow.com/questions/33091833/nvidia-cuda-code-compiles-but-does-not-return-values-correctly-misuse-of-point) — Ivan Aksamentov - Drop, Oct 15 '15 at 13:56
@drop thanks for the advice, when i said i didnt know that was actually the past tense — FreelanceConsultant, Oct 16 '15 at 17:12

score 3 · Answer 1 · edited May 23 '17 at 11:58

You should always use proper CUDA error checking:

```cuda
cuda_kernel_func<<<num, 1>>>(d_in, d_out, num);
gpuErrchk( cudaPeekAtLastError() );
```

In your case (compiling for a wrong architecture), the error would be:

```
GPUassert: invalid device function main.cu 48
```
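Note that `gpuErrchk` is not part of the CUDA runtime; it is a user-defined macro from the canonical error-checking question linked in the comments. A minimal sketch of such a macro, consistent with the `GPUassert:` message format above, might look like this (the default `abort` behaviour and the exit code are assumptions of this sketch):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wrap any CUDA runtime call, e.g. gpuErrchk( cudaMalloc(&ptr, bytes) );
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }

// Does nothing if the call succeeded. Otherwise prints the CUDA error string
// together with the file and line of the failing call, and exits the program
// unless abort is false.
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort = true)
{
    if (code != cudaSuccess)
    {
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) exit(code);
    }
}
```

With this in place, every runtime call in the question (`cudaMalloc`, `cudaMemcpy`, `cudaDeviceSynchronize`, `cudaFree`) can be wrapped in `gpuErrchk( ... )`, so the first failing call is reported instead of being silently ignored.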

Since you state that you have a GTX 260 which supports Compute Capability 1.3, you need to use a CUDA version which supports this architecture.
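If you are unsure which compute capability your card has, you can ask the runtime. The following is only a sketch: `archName` and `printComputeCapabilities` are hypothetical helper names, and on a toolkit or driver that no longer supports the GPU the query itself may fail, in which case the error string is printed instead.

```cuda
#include <cstdio>
#include <string>
#include <cuda_runtime.h>

// Hypothetical helper: builds the nvcc architecture name for a compute
// capability, e.g. (1, 3) gives "sm_13".
inline std::string archName(int major, int minor)
{
    char buf[32];
    snprintf(buf, sizeof(buf), "sm_%d%d", major, minor);
    return std::string(buf);
}

// Hypothetical helper: prints the name and compute capability of every
// visible device. Returns the device count, or -1 if a runtime call fails.
inline int printComputeCapabilities()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess)
    {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return -1;
    }
    for (int dev = 0; dev < count; ++dev)
    {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess)
        {
            fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
            return -1;
        }
        printf("device %d: %s, compute capability %d.%d (%s)\n",
               dev, prop.name, prop.major, prop.minor,
               archName(prop.major, prop.minor).c_str());
    }
    return count;
}
```

Calling `printComputeCapabilities()` from `main` lists the devices; the architecture name it prints is the value to pass to `nvcc` via `-arch`, provided the installed toolkit still supports that architecture.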

CUDA 6.5 is the last version that still supports your GPU's architecture (see the [release notes](http://developer.download.nvidia.com/compute/cuda/6_5/rel/docs/CUDA_Toolkit_Release_Notes.pdf)).

The first CUDA version without support for sm_1x is CUDA 7 (see the release notes).
