I wrote some code in which the GPU writes data to an SVM buffer while a CPU thread reads it. However, every time I read the buffer on the CPU, I get a few zeros. If I add a delay of 1 ns, all the results are correct. So I assume this is related to memory consistency, i.e. OpenCL does not guarantee when the data will be visible to the CPU until the kernel terminates.

The interesting part is that if I use CL_MEM_SVM_ATOMICS, all the results are correct even without the delay, although I am not using any atomic operations anywhere.

Can someone please help me understand why just setting the CL_MEM_SVM_ATOMICS flag changes the results? What exactly happens differently in memory when I use CL_MEM_SVM_ATOMICS without performing any atomic operations? I could not find an answer to this anywhere.