I am new to CUDA and got a little confused with cudaEvent. I now have a code sample that goes as follows:
float elapsedTime;
cudaEvent_t start, stop;
CUDA_ERR_CHECK(cudaEventCreate(&start));
CUDA_ERR_CHECK(cudaEventCreate(&stop));
CUDA_ERR_CHECK(cudaEventRecord(start));
// Kernel functions go here ...
CUDA_ERR_CHECK(cudaEventRecord(stop));
CUDA_ERR_CHECK(cudaEventSynchronize(stop));
CUDA_ERR_CHECK(cudaEventElapsedTime(&elapsedTime, start, stop));
CUDA_ERR_CHECK(cudaDeviceSynchronize());
I have two questions regarding this code:
1.Is the last cudaDeviceSynchronize necessary? Because according to the documentation for cudaEventSynchronize, its functionality is Wait until the completion of all device work preceding the most recent call to cudaEventRecord(). So given that we have already called cudaEventSynchronize(stop), do we need to call cudaDeviceSynchronize once again?
2.How different is the above code compared to the following implementation:
#include <chrono>
auto tic = std::chrono::system_clock::now();
// Kernel functions go here ...
CUDA_ERR_CHECK(cudaDeviceSynchronize());
auto toc = std::chrono::system_clock:now();
float elapsedTime = std::chrono::duration_cast < std::chrono::milliseconds > (toc - tic).count() * 1.0;