I am trying to set a flag in one kernel function and read it in another, with the two kernels launched in separate streams. Basically, I'm trying to do the following:
#include <iostream>                                                              
#include <cuda.h>                                                                
#include <cuda_runtime.h>                                                        
#define FLAGCLEAR 0                                                              
#define FLAGSET   1                                                              
using namespace std;                                                             
__global__ void set_flag(int *flag)                                              
{                                                                                
    *flag = FLAGSET;                                                             
    // Wait for flag to reset.                                                   
    while (*flag == FLAGSET);                                                    
}                                                                                
__global__ void read_flag(int *flag)                                             
{                                                                                
    // wait for the flag to set.                                                 
    while (*flag != FLAGSET);                                                    
    // Clear it for next time.                                                   
    *flag = FLAGCLEAR;                                                           
}                                                                                
int main(void)                                                                   
{                                                                                
    // Setup memory for flag                                                     
    int *flag;                                                                   
    cudaMalloc(&flag, sizeof(int));                                              
    // Setup streams                                                             
    cudaStream_t stream0, stream1;                                               
    cudaStreamCreate(&stream0);                                                  
    cudaStreamCreate(&stream1);                                                  
    // Print something to let me know that we started.                           
    cout << "Starting the flagging" << endl;                                     
    // do the flag test                                                          
    set_flag  <<<1,1,0,stream0>>>(flag);                                         
    read_flag <<<1,1,0,stream1>>>(flag);                                         
    // Wait for the streams                                                      
    cudaDeviceSynchronize();                                                     
    // Getting here is a painful process!
    cout << "Finished the flagging" << endl;                                     
    // Clean UP!                                                                 
    cudaStreamDestroy(stream0);                                                  
    cudaStreamDestroy(stream1);                                                  
    cudaFree(flag);                                                              
}
I eventually get the second printout, but only after the computer freezes for 15 seconds, and both printouts appear at the same time. I expected the two streams to run in parallel without bogging the system down. What am I doing wrong, and how can I fix it?
Thanks.
EDIT
It seems that adding volatile has solved this special case, but now something else has broken. If I add anything between the two kernel launches, the system reverts to the old behavior: it freezes and then prints everything at once. Adding sleep(2); between the set_flag and read_flag launches reproduces this. Also, when I put this code in another program, it causes the GPU to lock up. What am I doing wrong now?
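For reference, here is a minimal sketch of the modified version described in this edit. It makes a few assumptions that are not spelled out above: volatile is applied to the kernel parameters (the exact placement may have differed), sleep comes from <unistd.h> on a POSIX host, and the cudaMemset call is an addition so the flag starts in a known state.

```cuda
#include <iostream>
#include <unistd.h>        // sleep(); assumes a POSIX host
#include <cuda_runtime.h>

#define FLAGCLEAR 0
#define FLAGSET   1

// Assumption: volatile on the pointee, so each loop iteration re-reads memory
// instead of reusing a value cached in a register.
__global__ void set_flag(volatile int *flag)
{
    *flag = FLAGSET;
    // Wait for read_flag to clear the flag.
    while (*flag == FLAGSET);
}

__global__ void read_flag(volatile int *flag)
{
    // Wait for set_flag to set the flag.
    while (*flag != FLAGSET);
    // Clear it for next time.
    *flag = FLAGCLEAR;
}

int main(void)
{
    int *flag;
    cudaMalloc(&flag, sizeof(int));
    // Addition for this sketch: start from a known state.
    cudaMemset(flag, 0, sizeof(int));

    cudaStream_t stream0, stream1;
    cudaStreamCreate(&stream0);
    cudaStreamCreate(&stream1);

    std::cout << "Starting the flagging" << std::endl;

    set_flag  <<<1, 1, 0, stream0>>>(flag);
    sleep(2);   // the delay that brings back the freeze
    read_flag <<<1, 1, 0, stream1>>>(flag);

    // Wait for both streams.
    cudaDeviceSynchronize();
    std::cout << "Finished the flagging" << std::endl;

    cudaStreamDestroy(stream0);
    cudaStreamDestroy(stream1);
    cudaFree(flag);
    return 0;
}
```

Removing the sleep(2); line gives the variant that behaves as expected for me.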
Thanks again.
 
     
    