I would like to discuss a race condition I am seeing with memory access in OpenCL. The ATI/Nvidia tech docs say that when multiple work-items try to access the same memory location, the memory controller queues the accesses. In my experience, however, this queuing only helps with reads, not with writes. In my OpenCL apps, when multiple work-items try to write to (update) the same memory location, a race condition occurs and I get wrong results. I am currently running on a 48xx series card.
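To make the problem concrete, here is a small host-side C sketch that models the lost update as I understand it. It is not real OpenCL and not my actual kernel: the function names `racy_increment` and `serialized_increment` are made up for illustration, and the interleaving shown (every work-item reads before any of them writes) is an assumed worst case, not something I have measured on the hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: n_items "work-items" each perform
 * counter = counter + 1 on one shared memory location. */

/* Unsynchronized case, assumed worst-case interleaving:
 * every work-item loads the old value before any of them stores,
 * so each store writes old + 1 and all but one update is lost. */
int racy_increment(size_t n_items)
{
    assert(n_items > 0);
    int counter = 0;
    int snapshot = counter;          /* phase 1: every work-item reads 0 */
    for (size_t i = 0; i < n_items; ++i)
        counter = snapshot + 1;      /* phase 2: every work-item writes 1 */
    return counter;
}

/* What I would expect if each whole read-modify-write were queued,
 * i.e. one work-item finishes its update before the next one starts. */
int serialized_increment(size_t n_items)
{
    assert(n_items > 0);
    int counter = 0;
    for (size_t i = 0; i < n_items; ++i)
        counter = counter + 1;       /* read, add, and write as one step */
    return counter;
}
```

With 64 work-items, `racy_increment(64)` gives 1 while `serialized_increment(64)` gives 64. If the controller queues the individual loads and stores but not the read-modify-write as a whole, that would explain why reads look fine and updates do not.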
Has anyone faced the same problem? Is there a general solution to it, and/or is there any improvement in the memory controller architecture of the 58xx and Fermi series that avoids it?