Say I have a Fortran program that performs two tasks on an array: task A computes its mean and task B doubles it. The point is that task B does not depend on the result of task A. When accelerating the program with OpenACC, it would therefore make sense to run the two tasks concurrently by making task A asynchronous:
program test
    implicit none
    integer, parameter :: n = 1000000
    integer :: i
    real(8) :: mean
    real(8) :: array(n)
    real(8) :: array_d(n)
    ! initialize array
    array = [(i, i=1, n)]
    ! keep array on the device for both tasks
    !$acc data copy(array)
    !$acc kernels async num_gangs(1)
    ! Task A: get mean of array
    mean = 0d0
    !$acc loop independent reduction(+:mean)
    do i = 1, n
        mean = mean + array(i)
    end do
    mean = mean / n
    !$acc end kernels
    !$acc kernels
    ! Task B: work on array
    !$acc loop independent
    do i = 1, n
        array(i) = array(i) * 2
    end do
    !$acc end kernels
    !$acc wait
    !$acc end data
    ! print array and mean
    print "(10(g0.2, x))", array(:10)
    print "('mean = ', g0.2)", mean
end program
However, when the two tasks run at the same time, task B modifies the array while task A is still reading it, so task A can sum values that have already been doubled. On the CPU (no acceleration) I get:
2.0 4.0 6.0 8.0 10. 12. 14. 16. 18. 20.
mean = 500000.5000000000
On the GPU (using the NVIDIA HPC SDK), I get a different, obviously incorrect mean:
2.0 4.0 6.0 8.0 10. 12. 14. 16. 18. 20.
mean = 999967.6836640000
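For comparison, one workaround that does give the correct mean, at the cost of memory, is double buffering: task B reads `array` but writes its result into a second array (here the otherwise unused `array_d`), so the data task A is summing is never modified. Below is a minimal sketch, assuming the extra n-element buffer is acceptable and that it is fine for the doubled values to end up in `array_d` instead of `array`. The explicit queue numbers 1 and 2, the switch from `kernels` to `parallel loop`, and the program name are illustrative choices, not part of the original program:

```fortran
program test_double_buffer
    implicit none
    integer, parameter :: n = 1000000
    integer :: i
    real(8) :: mean
    real(8) :: array(n)
    real(8) :: array_d(n)   ! separate output buffer for task B

    ! initialize array on the host
    do i = 1, n
        array(i) = real(i, 8)
    end do
    mean = 0d0

    ! array is only read on the device; array_d is only written there
    !$acc data copyin(array) copyout(array_d)

    ! Task A on async queue 1: reads array, never writes it
    ! (copy(mean) is explicit for portability; newer OpenACC versions imply it)
    !$acc parallel loop async(1) reduction(+:mean) copy(mean)
    do i = 1, n
        mean = mean + array(i)
    end do

    ! Task B on async queue 2: reads array but writes only array_d,
    ! so it cannot change the values task A is summing
    !$acc parallel loop async(2)
    do i = 1, n
        array_d(i) = array(i) * 2d0
    end do

    ! wait for both queues before the data region copies array_d back
    !$acc wait
    !$acc end data

    mean = mean / n

    ! sanity checks: task A must have seen the original, undoubled data
    ! (the sum of 1..n is an exact integer in real(8), so == is safe here)
    if (mean /= 500000.5d0) error stop "task A read modified data"
    if (any(array_d /= 2d0 * array)) error stop "task B result is wrong"

    print "(10(g0.2, x))", array_d(:10)
    print "('mean = ', g0.2)", mean
end program
```

The `error stop` checks only confirm that task A summed the undoubled values. Without `-acc` (nvfortran) or `-fopenacc` (gfortran), the directives are ignored and the program runs serially with the same results. This costs a second n-element array on the device, so it does not really protect `array` itself.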
Is there an elegant way to "protect" the array being read by task A?