Hi folks,
I would like to bring up the race condition that can occur with memory access in OpenCL. The ATI/NVIDIA tech docs state that when multiple work items try to access the same memory location, the memory controller will queue the requests. However, in my experience the access queue only helps for reads, not writes: in my OpenCL apps, when multiple work items try to write to (update) the same memory location, a race condition occurs and produces wrong results. I am currently running on the 48xx series.
So has anyone faced the same problem? Is there any solution to this problem in general, and/or is there any improvement in the memory controller architecture in the 58xx and Fermi series that avoids it?
Thanks,
Roto
Is it possible to use a software solution? I haven't looked into this at all but I'm curious.
I'm not sure I understand the question. What kind of race condition are you getting, and how would you imagine it being fixed? Even if, as Micah suggests, you serialize the writes, you still have no guarantee that reads from another work group will be serialized with them the way you expect, so you will always see unpredictable read/write ordering. At best you could guarantee the order within the reads and writes for a given wavefront, and in OpenCL you can't do even that completely, because of the memory consistency model. You can insert fences to give some level of control, though.
Well, and what about this: I need to check for errors in a kernel, so I create an error buffer.
Can I do this?
if (error_happened) err[0] = 1;
I mean, will that 1 be there for sure? On 5xxx cards I can use atomics, but what about 4xxx? Will it work reliably?
nou, your code is fine.
On the other hand if your code said this:
if (error_happened) err[0] = 1; else err[0] = 0;
Then the output is undefined unless error_happened is the same across every work item on the device.
Originally posted by: LeeHowes nou, your code is fine.
On the other hand if your code said this:
if (error_happened) err[0] = 1; else err[0] = 0;
Then the output is undefined unless error_happened is the same across every work item on the device.
Hi Lee,
Can you explain in more detail why the result is undefined? My understanding is that if the error did not happen, the work item jumps to the else branch, which yields err[0] = 0.
You are right, of course. It will jump to the else branch.
What I mean is not that the result is undefined unless error_happened is true. I mean that if error_happened is true for one thread, and not true for another thread, then in what order do those threads write to err[0]? If the successful one writes a 0 first, and the one with an error writes a 1 afterwards, then you end up with a 1. What if the 0 write happened after the 1? Then you'd think there had been no error, because err[0] would be 0 even though one of the threads recorded an error. You can only guarantee the answer if error_happened is either true for every thread or false for every thread.
This is true whether you have the 4 or so threads on a multicore CPU or the 40 or so threads you're likely to have running on a 5870. You might be lucky that each lane of a SIMD does have a predictable ordering through the memory controller, I don't know if that's the case, but between wavefronts you definitely won't have a predictable ordering so there would be no way to know whether the last write would be a 1 or a 0.
Thanks all for your answer. I'm sorry for the late reply, just back in town after a long trip.
I think Micah is right: if a race on writes happens, no matter who wins, we get an unexpected result. So, based on my experience, we just have to reorganize our access scheme so that there is no overlap in the memory regions being written.
I talked to several people, and it seems there is currently no general hardware or software solution for handling memory write races.
Thanks,
Roto