I suspect that the "black box" (the GPU) is not being shut down cleanly in some larger code of mine (and perhaps in others' code too), so I would like to add a cudaDeviceReset() call at the end of main(). But wait! Wouldn't this cause a segmentation fault for every object created statically in main() (i.e. as a local variable rather than on the heap) whose destructor contains non-trivial CUDA code? For example:
#include <cuda_runtime.h>

class A {
public:
    cudaEvent_t tt;
    cudaEvent_t uu;
    A() {
        cudaEventCreate(&tt);
        cudaEventCreate(&uu);
    }
    ~A() {
        // Runs when the object goes out of scope.
        cudaEventDestroy(tt);
        cudaEventDestroy(uu);
    }
};
instantiated statically, as a local variable in main():
int main() {
    A t;
    cudaDeviceReset();  // the device is reset while t is still alive
    return 0;
}   // t's destructor runs here, after the reset
segfaults on exit. Question: is cudaDeviceReset() perhaps invoked automatically on exit from main()?
Otherwise, all the useful code in main() would have to be moved into some run() function, and cudaDeviceReset() would have to be the last statement in main(), right?
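A minimal sketch of the ordering I have in mind. To make it runnable without a GPU, it uses hypothetical stand-ins (fakeEventCreate, fakeEventDestroy, fakeDeviceReset) that only log their calls; in real code they would be cudaEventCreate, cudaEventDestroy and cudaDeviceReset. The names reset_inside_scope and reset_after_run are also made up for illustration and play the role of main() in the two variants:

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for the CUDA runtime calls. They only record the
// order in which they are called, so no GPU is needed.
static std::vector<std::string> call_log;
void fakeEventCreate()  { call_log.push_back("create"); }
void fakeEventDestroy() { call_log.push_back("destroy"); }
void fakeDeviceReset()  { call_log.push_back("reset"); }

// Same shape as class A above: two events, released in the destructor.
class A {
public:
    A()  { fakeEventCreate();  fakeEventCreate(); }
    ~A() { fakeEventDestroy(); fakeEventDestroy(); }
};

// Mirrors the original main(): t is destroyed when the function returns,
// i.e. after the reset has already been issued.
int reset_inside_scope() {
    A t;
    fakeDeviceReset();
    return 0;
}

// All the work, and every object that owns device resources, lives in run().
int run() {
    A t;
    return 0;
}   // t's destructor runs here

// Mirrors the proposed main(): the reset is issued only after run() returns.
int reset_after_run() {
    int status = run();
    fakeDeviceReset();
    return status;
}
```

Starting from an empty log in each case, reset_inside_scope() records create, create, reset, destroy, destroy, whereas reset_after_run() records create, create, destroy, destroy, reset. Wrapping the body of main() in an extra { ... } block before the reset call should give the same ordering as run().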