Reusing cudaEvent to serialize multiple streams

Question

Suppose I have a struct:

typedef enum {ON_CPU,ON_GPU,ON_BOTH} memLocation;

typedef struct foo *foo;
struct foo {
  cudaEvent_t event;
  float       *deviceArray;
  float       *hostArray;
  memLocation arrayLocation;
};

and two functions:

void copyArrayFromCPUToGPUAsync(foo bar, cudaStream_t stream)
{
  cudaStreamWaitEvent(stream, bar->event, 0); // flags argument is required when compiling as C
  if (bar->arrayLocation == ON_CPU) {
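    // ON_CPU means !ON_GPU and !ON_BOTH
    // nbytes stands for the array size in bytes (not shown in this pared-down snippet)
    cudaMemcpyAsync(bar->deviceArray, bar->hostArray, nbytes, cudaMemcpyHostToDevice, stream);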
    bar->arrayLocation = ON_BOTH;
  }
  cudaEventRecord(bar->event, stream);
}

void doWorkOnGPUAsync(foo bar, cudaStream_t stream)
{
  cudaStreamWaitEvent(stream, bar->event, 0); // flags argument is required when compiling as C
  // do async work
  cudaEventRecord(bar->event, stream);
}

And the following scenario (with a lion, witch, and wardrobe fitting in somewhere as well):

// stream1, stream2, and stream3 have no prior work
// assume bar->arrayLocation = ON_GPU

doWorkOnGPUAsync(bar, stream1);
copyArrayFromCPUToGPUAsync(bar, stream2); // A no-op
doWorkOnGPUAsync(bar, stream3);

Is the above safe? I.e. will stream2 still wait on stream1 to finish its "work" if it itself does no work? And will the resulting recorded cudaEvent reflect this, such that stream3 will not start until stream1 finishes?

einpoklum · Accepted Answer · 2021-03-05T22:45:28.267


This should be safe.

There is no mention anywhere (that I know of) of any kind of "event cancellation" due to a lack of other work between a wait-on-event and the recording of another event. And it doesn't matter that you're re-using the same event object in the cudaEventRecord() call, since, as the Runtime API docs say:

cudaEventRecord() can be called multiple times on the same event and will overwrite the previously captured state. Other APIs such as cudaStreamWaitEvent() use the most recently captured state at the time of the API call, and are not affected by later calls to cudaEventRecord().
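
If you want to check this on your own hardware, here is a minimal sketch of the scenario from the question. The kernels slowWrite and readFlag, the function chainedEventDemo, the CHECK macro, and the spin length are all made up for illustration; only the wait/record pattern comes from the question. If the empty wait-then-record on the second stream propagates the dependency, the third stream should read the flag only after the first stream has written it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Return -1 from the enclosing function on any CUDA error.
// Sketch only: resources are not released on the error path.
#define CHECK(call)                                                   \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(err_));   \
      return -1;                                                      \
    }                                                                 \
  } while (0)

// Hypothetical stand-in for "work": spin for a while, then set the flag.
__global__ void slowWrite(int *flag, long long spinCycles)
{
  long long start = clock64();
  while (clock64() - start < spinCycles) { }
  *flag = 1;
}

// Stores the flag value visible to the stream this kernel runs in.
__global__ void readFlag(const int *flag, int *seen)
{
  *seen = *flag;
}

// Returns the flag value seen by stream3, or -1 on a CUDA error.
int chainedEventDemo(void)
{
  cudaStream_t s1, s2, s3;
  cudaEvent_t  event;
  int *flag = nullptr, *seen = nullptr, seenOnHost = 0;

  // Non-blocking streams, so the legacy default stream adds no implicit ordering.
  CHECK(cudaStreamCreateWithFlags(&s1, cudaStreamNonBlocking));
  CHECK(cudaStreamCreateWithFlags(&s2, cudaStreamNonBlocking));
  CHECK(cudaStreamCreateWithFlags(&s3, cudaStreamNonBlocking));
  CHECK(cudaEventCreateWithFlags(&event, cudaEventDisableTiming));
  CHECK(cudaMalloc(&flag, sizeof(int)));
  CHECK(cudaMalloc(&seen, sizeof(int)));
  CHECK(cudaMemset(flag, 0, sizeof(int)));
  CHECK(cudaMemset(seen, 0, sizeof(int)));
  CHECK(cudaDeviceSynchronize());

  // stream1: the event has never been recorded, so this wait is a no-op.
  CHECK(cudaStreamWaitEvent(s1, event, 0));
  slowWrite<<<1, 1, 0, s1>>>(flag, 100000000LL);
  CHECK(cudaGetLastError());
  CHECK(cudaEventRecord(event, s1));

  // stream2: wait and record with no work in between (the "no-op" case).
  CHECK(cudaStreamWaitEvent(s2, event, 0));
  CHECK(cudaEventRecord(event, s2));

  // stream3: waits on the state recorded in stream2, then reads the flag.
  CHECK(cudaStreamWaitEvent(s3, event, 0));
  readFlag<<<1, 1, 0, s3>>>(flag, seen);
  CHECK(cudaGetLastError());
  CHECK(cudaEventRecord(event, s3));

  CHECK(cudaStreamSynchronize(s3));
  CHECK(cudaMemcpy(&seenOnHost, seen, sizeof(int), cudaMemcpyDeviceToHost));

  CHECK(cudaFree(flag));
  CHECK(cudaFree(seen));
  CHECK(cudaEventDestroy(event));
  CHECK(cudaStreamDestroy(s1));
  CHECK(cudaStreamDestroy(s2));
  CHECK(cudaStreamDestroy(s3));
  return seenOnHost;
}
```

Calling chainedEventDemo() should return 1 if stream3 ran after stream1's kernel finished. A passing run is consistent with the documented behavior rather than a proof of it, but with a spin this long a missing dependency would very likely show up as 0.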

Additional notes:

With your apparent use-case, you may also want to consider the possibility of using managed memory instead of manually copying back and forth.
You should check for the success of your various operations, not just assume they succeeded.
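
Both notes can be illustrated together. The sketch below is hypothetical: the CUDA_CHECK macro, the scale kernel, and managedScaleDemo are names invented here, not part of the question or of the CUDA API. The runtime calls themselves (cudaMallocManaged, cudaGetLastError, cudaDeviceSynchronize, cudaFree) are standard. It assumes n > 0 and a device that supports managed memory.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with file, line, and error string on failure.
#define CUDA_CHECK(call)                                              \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__, __LINE__,   \
              #call, cudaGetErrorString(err_));                       \
      exit(EXIT_FAILURE);                                             \
    }                                                                 \
  } while (0)

// Multiplies each of the n elements by factor.
__global__ void scale(float *data, int n, float factor)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;
}

// Fills a managed array with 0..n-1 on the host, doubles it on the GPU,
// and returns the sum computed on the host. No explicit copies are issued.
float managedScaleDemo(int n)
{
  float *data = nullptr;
  CUDA_CHECK(cudaMallocManaged(&data, n * sizeof(float)));

  for (int i = 0; i < n; ++i) data[i] = (float)i;

  int threads = 128;
  int blocks  = (n + threads - 1) / threads;
  scale<<<blocks, threads>>>(data, n, 2.0f);
  CUDA_CHECK(cudaGetLastError());       // catches launch-configuration errors
  CUDA_CHECK(cudaDeviceSynchronize());  // catches execution errors; also makes host access safe

  float sum = 0.0f;
  for (int i = 0; i < n; ++i) sum += data[i];

  CUDA_CHECK(cudaFree(data));
  return sum;
}
```

For example, managedScaleDemo(4) doubles {0, 1, 2, 3} and sums the result. Whether managed memory actually performs better than explicit copies depends on the access pattern and the hardware, so treat this as a starting point rather than a recommendation.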

edited Mar 05 '21 at 22:45

answered Mar 05 '21 at 21:41


Thanks for the answer! "You should check for the success of your various operations, not just assume they succeeded." Rest assured my snippets are __heavily__ pared down versions of the actual thing ;). – Jacob Faib Mar 05 '21 at 23:09
@JacobFaib: Error checking doesn't have to make the code longer, so you wouldn't need to redact it from your snippets; see [my answer](https://stackoverflow.com/a/20478474/1593077) to the question I linked to. – einpoklum Mar 05 '21 at 23:27
