I have a compute shader that looks like this:
uint g_maxNumber = 1000;
groupshared uint g_globalIndex = 0;
struct TestInfo
{
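    uint m_offset; // member type was omitted in the post; uint is assumed
    uint m_value;  // member type was omitted in the post; uint is assumed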
};
StructuredBuffer<TestInfo> testBuffer : register(t0);
[numthreads(128, 1, 1)]
void function_a ()
{
    uint currentIdx;
    InterlockedAdd(g_globalIndex, 1, currentIdx);
    // GroupMemoryBarrierWithGroupSync();
    TestInfo curInfo = testBuffer[currentIdx];
    [loop]
    while (currentIdx < g_maxNumber)
    {
        .....
        InterlockedAdd(g_globalIndex, 1, currentIdx);
    }
}
This code doesn't produce the correct "currentIdx" values on my AMD card, but it works fine when I add a group memory barrier (the commented-out GroupMemoryBarrierWithGroupSync() call) after the InterlockedAdd.
Adding a barrier forces all threads in the group to synchronize, which slows down the whole process. Is this a bug with the AMD card, or am I missing something?
The same code works fine on an NVIDIA GTX 570; the AMD card I have is a 7970.
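
For reference, here is a minimal sketch of the variant with the barrier enabled. It assumes the barrier goes directly after the first InterlockedAdd, which is where the commented-out call sits. The struct member types are placeholders (they are not given above), and the loop body is still omitted.

```hlsl
// Sketch of the workaround only; declarations mirror the shader above.
uint g_maxNumber = 1000;
groupshared uint g_globalIndex = 0; // declared exactly as in the original

struct TestInfo
{
    uint m_offset; // assumed type, not stated in the original
    uint m_value;  // assumed type, not stated in the original
};

StructuredBuffer<TestInfo> testBuffer : register(t0);

[numthreads(128, 1, 1)]
void function_a()
{
    uint currentIdx;
    // Atomically take the next index; the previous counter value
    // is returned in currentIdx.
    InterlockedAdd(g_globalIndex, 1, currentIdx);

    // The only change: block every thread in the group until all
    // group-shared accesses have completed and all threads in the
    // group have reached this call.
    GroupMemoryBarrierWithGroupSync();

    TestInfo curInfo = testBuffer[currentIdx];

    [loop]
    while (currentIdx < g_maxNumber)
    {
        // ... per-item work omitted, as in the original ...
        InterlockedAdd(g_globalIndex, 1, currentIdx);
    }
}
```

The sketch places the barrier only after the first InterlockedAdd. A group sync inside the loop would not be reached by every thread, since threads leave the loop at different iterations.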