Calculate the sum of 2 vectors using CUDA

Question

I have a simple task that I can't seem to solve. I have two one-dimensional arrays (called vectors), each consisting of 10 elements. Each element contains a random positive number. The goal is to use CUDA to compute the element-wise sum of those two arrays (in other words: Vector Sum[0] = Vector A[0] + Vector B[0], and likewise for indices 1 through 9).

Here is my code (kernel.cu). I know I am using the "float-anything" variable names for integer data types. That's because I initially planned to do it on float data types but I could not get the project working at all as a result of data type incompatibilities. Correct me if it's actually possible using float data types for this.

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <iostream>

__global__ void vecAdd_kernel(int *floatAr1gpu, int *floatAr2gpu, int *floatSumGPU, int The_N){
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < The_N) floatSumGPU[i] = floatAr1gpu[i] + floatAr2gpu[i];
}

int main() 
{
    const unsigned int arraySize = 10;
    int floatArray1[arraySize];
    int floatArray2[arraySize];
    int *floatAr1gpu = 0;
    int *floatAr2gpu = 0;
    int floatSum[arraySize];
    int *floatSumGPU = 0;

    for (int c = 0; c < arraySize; c++) {
        floatArray1[c] = (rand() % 10)+1;
        floatArray2[c] = (rand() % 10)+1;
    }
    //Put the data into the GPU now
    //                      V--- This is allocating GPU memory under that name and Variable
    cudaMalloc((void **)&floatArray1, sizeof(float)*arraySize);
    cudaMalloc((void **)&floatArray2, sizeof(float)*arraySize);
    cudaMalloc((void **)&floatSum, sizeof(float)*arraySize);

    //           CPU Memory    GPU Mem       Array size              Method
    cudaMemcpy(floatArray1, floatAr1gpu, sizeof(float)*arraySize, cudaMemcpyHostToDevice);
    cudaMemcpy(floatArray2, floatAr2gpu, sizeof(float)*arraySize, cudaMemcpyHostToDevice);

    // execute
    //         grid size, block size
    vecAdd_kernel << < 1, arraySize >> > (floatArray1, floatArray2, floatSum, arraySize);

    //Copy data back from GPU to RAM
    //          GPU Memory   CPU Mem       Array size               Method
    cudaMemcpy(floatSumGPU, floatSum, sizeof(float)*arraySize, cudaMemcpyDeviceToHost);

    // clean up
    cudaFree(floatArray1);
    cudaFree(floatArray2);
    cudaFree(floatSum);

    for (int cc = 0; cc < arraySize; cc++) {
        std::cout << "Result of array number " << cc << " = " << floatSum[cc] << std::endl;
    }
    std::cout << "Done. Press any key to exit." << std::endl;
    char key = std::cin.get();

    return 0;
}

This is what I get as a result: [image: program result]

This is what I want to achieve (using CUDA): [image: program result]

What's wrong with the code? I placed a breakpoint to check that array here: [image: array contents]

Programming-wise, you went off into the weeds here: `cudaMalloc((void **)&floatArray1, sizeof(float)*arraySize);` You should be allocating device memory using the `floatAr1gpu` pointer, not the name of your host array (and the same goes for the next two `cudaMalloc` statements). The next problem is in your `cudaMemcpy(floatArray1, floatAr1gpu, ...` statement. This statement follows the C `memcpy()` API syntax: the first pointer is always the destination and the second pointer is always the source. So you have it backwards. - Robert Crovella, Nov 24 '18 at 17:13
Sure, we could go through your entire code and sort out problems like that, but CUDA includes a `vectorAdd` sample code that does precisely what you are setting out to do here. Why not study that? And as pointed out in the answer already, it's good practice to use proper CUDA error checking, and to run your code with `cuda-memcheck`, **before** asking others for help. Take advantage of the tool help that is available to you. Even if you don't understand the error output, it will be useful to others trying to help you. - Robert Crovella, Nov 24 '18 at 17:14
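The two points in the first comment can be sketched roughly as follows. This is only an illustration, not the `vectorAdd` sample: the helper names `copyToDevice` and `copyToHost` are made up here, and the element type is assumed to be `int`, as in the question's kernel. The allocation goes through a separate device pointer, and every `cudaMemcpy` names the destination first.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: allocate a device buffer and fill it from a host array.
// Returns nullptr if either runtime call fails.
int *copyToDevice(const int *host, int n) {
    int *dev = nullptr;
    // cudaMalloc stores the new device address in the pointer variable `dev`,
    // never in the host array itself.
    if (cudaMalloc((void **)&dev, sizeof(int) * n) != cudaSuccess) return nullptr;
    // Destination first, source second, as with memcpy().
    if (cudaMemcpy(dev, host, sizeof(int) * n, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(dev);
        return nullptr;
    }
    return dev;
}

// Hypothetical helper: copy a device buffer back into a host array.
bool copyToHost(int *host, const int *dev, int n) {
    return cudaMemcpy(host, dev, sizeof(int) * n, cudaMemcpyDeviceToHost) == cudaSuccess;
}
```

In the question's `main`, this would correspond to something like `floatAr1gpu = copyToDevice(floatArray1, arraySize);`, passing the `...gpu` pointers to the kernel, and finishing with `copyToHost(floatSum, floatSumGPU, arraySize);` followed by `cudaFree` on the device pointers.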

einpoklum · Answer 1 · 2018-11-24T18:04:07.970

Without scrutinizing your code too much: it's quite likely that you're getting a CUDA error somewhere and, instead of stopping and reporting it, you're carrying on as if everything succeeded. Don't do that. `cudaGetLastError()` is your friend; or, better yet, read this:

What is the canonical way to check for errors using the CUDA runtime API?
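For a rough idea of what such checking can look like, here is a minimal sketch. It is not the code from the linked question; `cudaOk` is a hypothetical helper name, and returning `false` instead of aborting is just one possible design.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report a failed runtime call on stderr and tell the
// caller whether the call succeeded.
bool cudaOk(cudaError_t status, const char *what) {
    if (status == cudaSuccess) return true;
    std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(status));
    return false;
}
```

You would wrap each runtime call, e.g. `if (!cudaOk(cudaMalloc((void **)&floatAr1gpu, bytes), "cudaMalloc")) return 1;` (with `bytes` standing for your allocation size). A kernel launch returns nothing, so after it you would check `cudaOk(cudaGetLastError(), "kernel launch")` and, for errors that only show up during execution, `cudaOk(cudaDeviceSynchronize(), "kernel execution")`.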

Also, you should use the `cuda-memcheck` tool (it comes with CUDA) to see whether it finds any invalid memory accesses or other memory-related issues in your program (thanks @RobertCrovella for reminding me). Running it, we get a bunch of errors from your program, such as:

========= Program hit cudaErrorInvalidValue (error 11) due to "invalid argument" on CUDA API call to cudaMemcpy. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x357283]
=========     Host Frame:a [0x3d70f]
=========     Host Frame:a [0x644c]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf1) [0x202e1]
=========     Host Frame:a [0x609a]

So - again, without scrutinizing the code - you must have provided the wrong arguments to cudaMemcpy(). Check your program against the Runtime API documentation.
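As for your side question about `float`: nothing in this pattern is specific to `int`. Here is a sketch, under the assumption that you keep the same allocate / copy / launch / copy-back structure. The names `vecAddKernel` and `vecAdd` are made up for illustration, `n` is assumed to be positive, and the byte count is taken from `sizeof(T)` so it always matches the element type. Each `cudaMemcpy` is written in the documented `(dst, src, count, kind)` order.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Element-wise sum; T can be int or float.
template <typename T>
__global__ void vecAddKernel(const T *a, const T *b, T *sum, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) sum[i] = a[i] + b[i];
}

// Hypothetical host wrapper: returns the first error encountered, or cudaSuccess.
template <typename T>
cudaError_t vecAdd(const T *a, const T *b, T *sum, int n) {
    T *dA = nullptr, *dB = nullptr, *dSum = nullptr;
    size_t bytes = sizeof(T) * n;

    cudaError_t err = cudaMalloc((void **)&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dB, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dSum, bytes);

    // Host to device: device pointer is the destination.
    if (err == cudaSuccess) err = cudaMemcpy(dA, a, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, b, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
        vecAddKernel<T><<<blocks, threads>>>(dA, dB, dSum, n);
        err = cudaGetLastError();  // catches launch configuration errors
    }

    // Device to host: host pointer is the destination.
    if (err == cudaSuccess) err = cudaMemcpy(sum, dSum, bytes, cudaMemcpyDeviceToHost);

    // cudaFree on a null pointer is a no-op, so this is safe after a partial failure.
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dSum);
    return err;
}
```

Calling `vecAdd(floatArray1, floatArray2, floatSum, arraySize)` with `float` arrays then instantiates the `float` version, and with `int` arrays the `int` version.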
