Simple CUDA kernel not returning values as expected

Question

I'm getting frustrated enough with CUDA that I decided to write the simplest piece of code I could, just to get my bearings, but something is still going over my head. The code adds two arrays element by element and stores the sums in a third array:

#include <stdio.h>
#include <stdlib.h>

__global__ void add(int* these, int* those, int* answers)
{
    int tid = blockIdx.x;
    answers[tid] = these[tid] + those[tid];
}

int main()
{
    int these[50];
    int those[50];
    int answers[50];

    int *devthese;
    int *devthose;
    int *devanswers;

    cudaMalloc((void**)&devthese, 50 * sizeof(int));
    cudaMalloc((void**)&devthose, 50 * sizeof(int));
    cudaMalloc((void**)&devanswers, 50 * sizeof(int));


    int i;
    for(i = 0; i < 50; i++)
    {
        these[i] = i;
        those[i] = 2 * i;
    }

    cudaMemcpy(devthese, these, 50 * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(devthose, those, 50 * sizeof(int), cudaMemcpyHostToDevice);
    add<<<50,1>>>(devthese, devthose, devanswers);

    cudaMemcpy(answers, devanswers, 50 * sizeof(int), cudaMemcpyDeviceToHost);
    for(i = 0; i < 50; i++)
    {
        fprintf(stderr,"%i\n",answers[i]);
    }
    return 0;
}

However, the values printed are not the multiples of 3 (0, 3, 6, ...) that I was expecting. Can anyone explain what is going wrong?

http://stackoverflow.com/q/14038589/681865 shows how to check for runtime errors. Every API call in your code returns a status. You should be checking them all. — talonmies, Apr 20 '14 at 05:39
Your code works fine for me. If you're not getting 0,3,6,9... it's because there's something wrong with the machine you're using. I would add the error checking that has been suggested. The errors you get will be a good first indication of what is wrong with your machine. It might be something as simple as you running an incorrect compile command for the type of GPU you are using. — Robert Crovella, Apr 20 '14 at 06:05
Also, please edit your question to give it a meaningful title. Stack Overflow questions aren't only for your benefit; they are intended to be useful to others who come after you. The title you have makes searching impossible. — talonmies, Apr 20 '14 at 06:07
I can't post the answer to the problem, but I fixed my issue by deleting "-arch=sm_35" from my makefile. — Chris Phillips, Apr 21 '14 at 18:06
@ChrisPhillips: That is now fixed. Please post your solution as an answer (and come back later and accept it). That gets this off the unanswered question list. — talonmies, Apr 21 '14 at 19:12
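
For anyone landing here later, the error checking suggested in the comments could look roughly like the sketch below. It is the code from the question with every runtime call wrapped in a check. CUDA_CHECK and count_mismatches are names made up for this illustration and are not part of the CUDA API. A kernel built for an architecture the GPU cannot run would be expected to show up at the cudaGetLastError() check after the launch, instead of silently leaving garbage in answers.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// CUDA_CHECK is a hypothetical helper name, not part of the CUDA API.
// It aborts with a readable message when a runtime call does not succeed.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n", __FILE__,    \
                    __LINE__, cudaGetErrorString(err_));              \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

#define N 50

__global__ void add(const int* these, const int* those, int* answers)
{
    int tid = blockIdx.x;  // one block per element, as in the question
    answers[tid] = these[tid] + those[tid];
}

// Hypothetical host-side check: counts entries that differ from 3 * i.
int count_mismatches(const int* answers, int n)
{
    int bad = 0;
    for (int i = 0; i < n; i++)
        if (answers[i] != 3 * i) bad++;
    return bad;
}

int main()
{
    int these[N], those[N], answers[N];
    int *devthese, *devthose, *devanswers;

    for (int i = 0; i < N; i++) {
        these[i] = i;
        those[i] = 2 * i;
    }

    CUDA_CHECK(cudaMalloc((void**)&devthese, N * sizeof(int)));
    CUDA_CHECK(cudaMalloc((void**)&devthose, N * sizeof(int)));
    CUDA_CHECK(cudaMalloc((void**)&devanswers, N * sizeof(int)));

    CUDA_CHECK(cudaMemcpy(devthese, these, N * sizeof(int), cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(devthose, those, N * sizeof(int), cudaMemcpyHostToDevice));

    add<<<N, 1>>>(devthese, devthose, devanswers);
    CUDA_CHECK(cudaGetLastError());       // reports a launch that failed
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(answers, devanswers, N * sizeof(int), cudaMemcpyDeviceToHost));

    fprintf(stderr, "mismatches: %d\n", count_mismatches(answers, N));

    CUDA_CHECK(cudaFree(devthese));
    CUDA_CHECK(cudaFree(devthose));
    CUDA_CHECK(cudaFree(devanswers));
    return 0;
}
```

Checking cudaGetLastError() right after the launch matters because a kernel launch itself returns nothing, so a failed launch is otherwise only noticed when the results look wrong.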

score 1 · Answer 1 · answered Apr 23 '14 at 05:27

According to the comments, the problem was caused by compiling for the wrong target architecture (the makefile passed -arch=sm_35), which produced an executable whose kernel could not run on the OP's GPU.

This community wiki answer has been added to get this off the unanswered queue. It can be deleted if/when the OP comes back and provides a fuller answer.
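
If you are unsure which architecture your GPU needs, one way to find out is to ask the runtime for the compute capability of each device. The following is a minimal sketch, not the OP's actual fix (the OP simply removed the flag). arch_flag is a hypothetical helper that only formats the numbers into the usual sm_XY form, and the output should be checked against the nvcc documentation for your toolkit version.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: formats an nvcc-style architecture flag such as
// "-arch=sm_35" from a compute capability pair (major, minor).
void arch_flag(int major, int minor, char* buf, size_t size)
{
    snprintf(buf, size, "-arch=sm_%d%d", major, minor);
}

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; dev++) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
            return 1;
        }

        char flag[32];
        arch_flag(prop.major, prop.minor, flag, sizeof(flag));
        printf("device %d: %s, compute capability %d.%d, matching flag %s\n",
               dev, prop.name, prop.major, prop.minor, flag);
    }
    return 0;
}
```

If the compute capability printed here is lower than the one named in the makefile, the compiled kernel targets hardware the GPU does not have, which is consistent with the behaviour described in the question.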
