CUDA program only returns 0

Question

I tried to run a simple CUDA program that adds two vectors, but the result is a vector of zeros.

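[EDIT] Solved: CUDA 11.0 compiles for a GPU with compute capability 5.2 by default, but my M1200 has compute capability 5.0, so the kernel never ran. You can change the target with the -arch= flag when compiling with nvcc (nvcc -arch=sm_50 file.cu in my case). See the related question "cuda 11 kernel doesn't run".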
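If you are unsure which value to pass to -arch=, the runtime API can report the compute capability of each installed GPU. The following is a minimal sketch, not part of the original program; it contains no kernels, so it should run regardless of the architecture it was compiled for. The output format is my own choice.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Ask the runtime how many CUDA devices are visible.
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                         cudaGetErrorString(err));
            return 1;
        }
        // prop.major and prop.minor together form the compute capability,
        // e.g. 5 and 0 correspond to -arch=sm_50.
        std::printf("device %d: %s, compute capability %d.%d (-arch=sm_%d%d)\n",
                    dev, prop.name, prop.major, prop.minor,
                    prop.major, prop.minor);
    }
    return 0;
}
```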

It seems the kernel doesn't do anything: I also tried writing an integer directly into c[0], and that had no effect either. The program runs on CUDA 11.0 with a Quadro M1200 on Ubuntu 20.04.

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <iostream>

__global__ void vectorAdd(int* a, int* b, int* c){
    int i = threadIdx.x;
    c[0] = 100;
    c[i] = a[i] + b[i];
    return;
}

int main() {

    int a[]= {1,2,3,4,5,6,7,8,9};
    int b[]= {1,2,3,4,5,6,7,8,9};   
    int sa = sizeof(a) / sizeof(int);
    int c[sa] = {0};
    
    int* cudaA = 0;
    int* cudaB = 0;
    int* cudaC = 0;
    
    cudaMalloc(&cudaA, sizeof(a));
    cudaMalloc(&cudaB, sizeof(b));
    cudaMalloc(&cudaC, sizeof(c));
    
    cudaMemcpy(cudaA, a, sizeof(a), cudaMemcpyHostToDevice);
    cudaMemcpy(cudaB, b, sizeof(b), cudaMemcpyHostToDevice);
    
    std::cout << sa << std::endl;
    vectorAdd <<< 1, sa >>> (cudaA, cudaB, cudaC);
    cudaMemcpy(c, cudaC, sizeof(c), cudaMemcpyDeviceToHost);
        
    for (int x = 0; x < 9; x++){
        std::cout << c[x]<< std::endl;
    }
    
    return 0;
}

The code is from a video on YouTube.

Can't reproduce this code printing zeros on [Compiler Explorer](https://cuda.godbolt.org/z/58Wvcf7b5). Maybe your driver is broken? I recommend using proper error checking so the runtime API can tell you if it knows something. See [What is the canonical way to check for errors using the CUDA runtime API?](https://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api). — paleonix, Jun 29 '23 at 18:33
I would recommend using the official samples, i.e. [`0_Introduction/vectorAdd`](https://github.com/NVIDIA/cuda-samples/blob/master/Samples/0_Introduction/vectorAdd/vectorAdd.cu) instead of something from YouTube. While the error handling in this official one is not using a macro for error checking and therefore quite wordy, at least it checks them and gets maintained on GitHub. — paleonix, Jun 29 '23 at 18:37
Improper compile switch specification, see https://stackoverflow.com/questions/63675040/cuda-11-kernel-doesnt-run/63675545#63675545. Include `-arch=sm_50` in your compile command. For the compute capability of your GPU, see https://www.google.com/search?q=quadro+m1200+compute+capability&rlz=1C1GCEA_enUS983US983&oq=quadro+m1200+compute+capability&aqs=chrome..69i57j0i22i30j0i390i650l5.7271j0j7&sourceid=chrome&ie=UTF-8 — Robert Crovella, Jun 29 '23 at 18:38
@RobertCrovella Thank you very much, this was the solution. I didn't find this by myself. — Sux, Jun 29 '23 at 22:17
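
The error checking recommended in the comments could look roughly like the sketch below. It is a variation of the program from the question, not the original code: the `CUDA_CHECK` macro name, the extra `n` parameter with its bounds check, and the `dA`/`dB`/`dC` pointer names are my own additions for illustration. If the binary was built for an architecture the GPU cannot run, the check after the kernel launch should report an error instead of the program silently printing zeros.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not part of the CUDA API): aborts with the
// file, line and error string if a runtime API call does not succeed.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: CUDA error: %s\n", __FILE__, \
                         __LINE__, cudaGetErrorString(err_));         \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

__global__ void vectorAdd(const int* a, const int* b, int* c, int n) {
    int i = threadIdx.x;
    if (i < n) {  // guard against threads beyond the array length
        c[i] = a[i] + b[i];
    }
}

int main() {
    const int n = 9;
    int a[n] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    int b[n] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    int c[n] = {0};

    int* dA = nullptr;
    int* dB = nullptr;
    int* dC = nullptr;

    CUDA_CHECK(cudaMalloc(&dA, sizeof(a)));
    CUDA_CHECK(cudaMalloc(&dB, sizeof(b)));
    CUDA_CHECK(cudaMalloc(&dC, sizeof(c)));

    CUDA_CHECK(cudaMemcpy(dA, a, sizeof(a), cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(dB, b, sizeof(b), cudaMemcpyHostToDevice));

    vectorAdd<<<1, n>>>(dA, dB, dC, n);
    // A kernel launch returns no status itself, so query it explicitly:
    // cudaGetLastError() reports launch failures, cudaDeviceSynchronize()
    // reports errors that occur while the kernel is executing.
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());

    CUDA_CHECK(cudaMemcpy(c, dC, sizeof(c), cudaMemcpyDeviceToHost));

    for (int x = 0; x < n; x++) {
        std::printf("%d\n", c[x]);
    }

    CUDA_CHECK(cudaFree(dA));
    CUDA_CHECK(cudaFree(dB));
    CUDA_CHECK(cudaFree(dC));
    return 0;
}
```

Compile it the same way as the original, for example with nvcc -arch=sm_50 file.cu for the M1200.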
