Is it possible to prevent a memory address from being accessed by other threads for some period of time? For example:
__global__ void func(int* a){
  // other computation
  __lock_address(a);
  a[0] += threadIdx.x;
  __unlock_address(a);
}
The first thread that finishes the other computation and reaches __lock_address locks that memory address until __unlock_address is called; any other thread that reaches __lock_address has to wait until the first thread unlocks it.
The above example is basically equivalent to atomicAdd, but what if I want to do a more complicated computation rather than a simple addition?
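To make the desired behaviour concrete, here is a minimal sketch of a spin lock built from atomicCAS and atomicExch. The names lock_address, unlock_address, locked_update and run_locked_update are hypothetical, as is the second slot a[1]; none of them are CUDA built-ins. The sketch assumes only one thread per block contends for the lock, because threads of the same warp spinning on one lock can deadlock on some architectures. Error checking is omitted.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: spin until this thread flips *mutex from 0 to 1.
__device__ void lock_address(int *mutex) {
  while (atomicCAS(mutex, 0, 1) != 0) {
    // busy-wait; another thread currently holds the lock
  }
  __threadfence();  // order the critical section after the acquire
}

// Hypothetical helper: publish the writes, then release the lock.
__device__ void unlock_address(int *mutex) {
  __threadfence();
  atomicExch(mutex, 0);
}

// Assumption: a points to at least two ints, zero-initialized by the host.
__global__ void locked_update(int *a, int *mutex) {
  if (threadIdx.x == 0) {        // one contender per block (assumption)
    int v = blockIdx.x + 1;
    volatile int *va = a;        // keep the accesses from being cached in registers
    lock_address(mutex);
    va[0] = va[0] + v;           // two locations updated as one unit,
    va[1] = va[1] + v * v;       // which a single atomicAdd cannot express
    unlock_address(mutex);
  }
}

// Hypothetical host helper: runs the kernel and copies a[0], a[1] into out.
void run_locked_update(int blocks, int threads, int out[2]) {
  int *a, *mutex;
  cudaMalloc(&a, 2 * sizeof(int));
  cudaMalloc(&mutex, sizeof(int));
  cudaMemset(a, 0, 2 * sizeof(int));
  cudaMemset(mutex, 0, sizeof(int));
  locked_update<<<blocks, threads>>>(a, mutex);
  cudaDeviceSynchronize();
  cudaMemcpy(out, a, 2 * sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(a);
  cudaFree(mutex);
}
```

With 8 blocks, every block adds v = 1..8 to a[0] and v*v to a[1], so the totals do not depend on the order in which blocks acquire the lock.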
Edit: mutex is initialized to 0 and a is initialized to -1.
__global__ void func(int *a, int *mutex){
  a[0] = atomicCAS(mutex, 0, 1); // observed: a[0] == 1
}
If I do this, a[0] ends up equal to 1, but I expected 0, since atomicCAS returns the old value of mutex.
__global__ void func(int *a, int *mutex){
  a[0] = mutex[0]; // observed: a[0] == 0
}
This is a sanity check: a[0] is now 0, which means mutex is initialized to 0 correctly.
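For reference, a self-contained version of the atomicCAS check might look like the sketch below. The launch configuration is not shown above, so the blocks and threads parameters are assumptions, and cas_kernel and run_cas are hypothetical names. Note that if more than one thread runs the kernel, every thread writes its own atomicCAS return value into a[0], so the final value depends on which write lands last. Error checking is omitted.

```cuda
#include <cuda_runtime.h>

__global__ void cas_kernel(int *a, int *mutex) {
  // atomicCAS returns the value *mutex held before the compare-and-swap.
  a[0] = atomicCAS(mutex, 0, 1);
}

// Hypothetical host helper: initializes a[0] = -1 and mutex = 0, launches the
// kernel, stores the final mutex value in *mutex_out, and returns a[0].
int run_cas(int blocks, int threads, int *mutex_out) {
  int *a, *mutex;
  cudaMalloc(&a, sizeof(int));
  cudaMalloc(&mutex, sizeof(int));

  int a_init = -1;
  cudaMemcpy(a, &a_init, sizeof(int), cudaMemcpyHostToDevice);
  cudaMemset(mutex, 0, sizeof(int));

  cas_kernel<<<blocks, threads>>>(a, mutex);
  cudaDeviceSynchronize();

  int a_host = -1;
  cudaMemcpy(&a_host, a, sizeof(int), cudaMemcpyDeviceToHost);
  cudaMemcpy(mutex_out, mutex, sizeof(int), cudaMemcpyDeviceToHost);

  cudaFree(a);
  cudaFree(mutex);
  return a_host;
}
```

Calling run_cas(1, 1, &m) launches exactly one thread, which isolates the return value of a single atomicCAS; calling it with a larger configuration reproduces the multi-thread case.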