Although the ATI OpenCL manual says that LDS is not supported on the R7xx family, I seem to be writing OpenCL code which uses LDS and local threads, plus barriers, and it seems to give correct results on an HD4870 X2.
It does not complain at compile time or at run time. The performance is probably not ideal, but is it somehow using the smaller LDS, or is it using registers?
Thanks
As far as I know, LDS is not physically supported on R7xx, meaning that all LDS accesses actually go to global memory.
What is your GPR usage on 770? On 870? You can use the SKA (Stream KernelAnalyzer) to check this.
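On R7xx, local memory is in global memory, yes. The physical LDS is owner-writes (each thread can write only to its own region), which is incompatible with the OpenCL 1.0 specification for local memory, where any work-item may write to any local address.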
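To make the owner-writes point concrete, here is a rough CPU-side sketch in plain C (not OpenCL, and not anything from the ATI SDK). It models one work-group in which every work-item writes into its neighbour's slot of local memory and then, after a barrier, reads back its own slot. The names `rotate_via_local` and `GROUP_SIZE` are made up for illustration, and each loop stands in for one barrier-separated phase of a kernel. Every write here lands in a slot the writer does not own, which OpenCL local memory permits but an owner-writes LDS would not.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP_SIZE 8  /* hypothetical work-group size, chosen for illustration */

/*
 * CPU model of a single work-group. Roughly equivalent to this kernel body:
 *
 *     local_mem[(lid + 1) % GROUP_SIZE] = in[lid];
 *     barrier(CLK_LOCAL_MEM_FENCE);
 *     out[lid] = local_mem[lid];
 *
 * Returns the number of writes that targeted a slot the writer does not own.
 */
int rotate_via_local(const int *in, int *out)
{
    int local_mem[GROUP_SIZE];
    int foreign_writes = 0;

    assert(in != NULL && out != NULL);

    /* Phase 1: work-item lid writes into its neighbour's slot. */
    for (int lid = 0; lid < GROUP_SIZE; ++lid) {
        int slot = (lid + 1) % GROUP_SIZE;
        local_mem[slot] = in[lid];
        if (slot != lid) {
            ++foreign_writes;  /* not allowed under an owner-writes scheme */
        }
    }

    /* A barrier would sit here: all phase 1 writes finish before any read. */

    /* Phase 2: each work-item reads back its own slot. */
    for (int lid = 0; lid < GROUP_SIZE; ++lid) {
        out[lid] = local_mem[lid];
    }

    return foreign_writes;
}
```

Calling it with `in = {0, 1, ..., 7}` rotates the values by one position and reports eight foreign writes. A pattern like this only needs the barrier and a memory that every work-item can write, so it gives the same answer whether that memory is on-chip or, as on R7xx, backed by global memory.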
So this means I am ONLY losing performance, and as far as sync barriers and data locality go, the results are still accurate?
That should be true, yes.
Thanks, Lee and Ryta123.