I am trying to use SVM for data sharing between the CPU and the GPU, and I have a question about the CL_MEM_SVM_ATOMICS flag.
I wrote code in which the GPU writes some data to an SVM buffer and a CPU thread reads it. However, every time I read it on the CPU, I get a few 0's. If I add a delay of 1ns, all the results are correct. So I assume it is related to memory consistency, i.e. OpenCL does not guarantee when the data will be visible to the CPU until the kernel terminates. The interesting part is that if I use CL_MEM_SVM_ATOMICS, all the results are correct even without a delay, although I am not using any atomic operation anywhere. Can someone please help me understand why just setting the CL_MEM_SVM_ATOMICS flag changes the results? What exactly happens differently in memory when I use CL_MEM_SVM_ATOMICS without any atomic operation? I could not find the answer anywhere.
Thank you for sharing this interesting observation. I don't have an explanation at the moment. As far as I know, the above behavior is not guaranteed by the OpenCL standard. I've already shared your query with our engineering team, and I'll come back once I have their reply.
Meanwhile, please share a repro and the setup information where you've observed the behavior.
Sorry for responding so late. I am trying to recreate it in a sample program so that I can post it here, but it does not seem easy to recreate. In the meantime, I will paste the snippet of the program that actually produces this result.
Both testin and testout are created using CL_MEM_SVM_FINE_GRAIN_BUFFER.
b1, b2, b3 and b4 are private buffers created and computed per thread.
Even if I assume the computation in the above code is wrong, the value of x should always be 100 in the CPU output. But that's not the case: it sometimes prints 0. I am using a persistent kernel that I terminate after the CPU has read the output for all the input data. Only 512 data inputs are processed in one go.
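
For reference, the pattern described in this thread (a producer writes results into a shared buffer while a consumer polls for them) is normally only well defined when the producer publishes the data with a release operation and the consumer observes it with an acquire operation. Below is a minimal CPU-only sketch of that handshake in C11, assuming a pthread stands in for the persistent GPU kernel. The names `shared_t`, `producer`, `consume` and `run_handshake` are hypothetical and not taken from the original program. In OpenCL 2.0 the analogous flag would, as far as I understand, be an atomic living in a buffer allocated with CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_SVM_ATOMICS and stored by the kernel with memory_scope_all_svm_devices.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define N_ITEMS 512 /* batch size mentioned in the post */

/* Hypothetical stand-in for the shared SVM region. */
typedef struct {
    int data[N_ITEMS]; /* plain, non-atomic payload (plays the role of testout) */
    atomic_int ready;  /* flag published only after the payload is written */
} shared_t;

/* Stand-in for the GPU kernel: write the payload, then release the flag. */
static void *producer(void *arg) {
    shared_t *s = arg;
    for (int i = 0; i < N_ITEMS; ++i)
        s->data[i] = 100;
    /* Release: all payload writes above become visible to any thread
       that later acquires ready == 1. */
    atomic_store_explicit(&s->ready, 1, memory_order_release);
    return NULL;
}

/* CPU reader: wait for the flag with acquire, then count wrong values. */
static int consume(shared_t *s) {
    while (atomic_load_explicit(&s->ready, memory_order_acquire) == 0)
        ; /* spin until the producer publishes */
    int bad = 0;
    for (int i = 0; i < N_ITEMS; ++i)
        if (s->data[i] != 100)
            ++bad;
    return bad;
}

/* Runs one producer/consumer round and returns the number of stale reads. */
int run_handshake(void) {
    shared_t s;
    for (int i = 0; i < N_ITEMS; ++i)
        s.data[i] = 0;
    atomic_init(&s.ready, 0);

    pthread_t t;
    int rc = pthread_create(&t, NULL, producer, &s);
    assert(rc == 0);
    (void)rc;

    int bad = consume(&s);
    pthread_join(t, NULL);
    return bad;
}
```

Calling `run_handshake()` from `main` should return 0, because the acquire load cannot observe the flag without also observing the payload writes that preceded the release store. Without such a flag, or with a plain non-atomic flag, reading 0 instead of 100 is an allowed outcome, which would be consistent with the stale reads reported above.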