
I have a piece of compute shader code like this:

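uint g_maxNumber = 1000;

groupshared uint g_globalIndex = 0;

struct TestInfo
{
    uint m_offset; // member type was omitted in the original post; uint is assumed
    uint m_value;  // member type was omitted in the original post; uint is assumed
};

StructuredBuffer<TestInfo> testBuffer : register(t0);

[numthreads(128, 1, 1)]
void function_a()
{
    uint currentIdx;

    // Atomically take the next index; the previous counter value is returned in currentIdx.
    InterlockedAdd(g_globalIndex, 1, currentIdx);

    // GroupMemoryBarrierWithGroupSync();

    TestInfo curInfo = testBuffer[currentIdx];

    [loop]
    while (currentIdx < g_maxNumber)
    {
        .....

        // Take the next index for the following iteration.
        InterlockedAdd(g_globalIndex, 1, currentIdx);
    }
}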

This code doesn't produce the correct "currentIdx" value, but it works fine when I enable the barrier (the commented-out GroupMemoryBarrierWithGroupSync() call) after the InterlockedAdd.

Adding a barrier forces all threads in the group to synchronize, which slows down the whole process. Is this a bug on the AMD card, or am I missing something?

This code works fine on an NVIDIA GTX 570; the AMD card I have is a 7970.
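For anyone comparing notes, here is a minimal sketch of the barrier variant described above, with one extra step that is purely an assumption: the shared counter is zeroed explicitly by one thread rather than through an initializer, since as far as I know groupshared memory is not guaranteed to start at a defined value. The member types, the SV_GroupIndex parameter, the static const limit, and the outBuffer output are all hypothetical additions for illustration and are not in the original shader. Whether this changes the behaviour on the 7970 is not confirmed anywhere in this thread.

```hlsl
// Sketch only. Types, outBuffer, and the explicit counter reset are assumptions.
static const uint g_maxNumber = 1000; // assumed compile-time limit

struct TestInfo
{
    uint m_offset; // type assumed
    uint m_value;  // type assumed
};

StructuredBuffer<TestInfo> testBuffer : register(t0);
RWStructuredBuffer<uint>   outBuffer  : register(u0); // hypothetical output buffer

groupshared uint g_globalIndex; // work counter shared by the threads of one group

[numthreads(128, 1, 1)]
void function_a(uint groupIndex : SV_GroupIndex)
{
    // One thread resets the counter instead of relying on an initializer.
    if (groupIndex == 0)
    {
        g_globalIndex = 0;
    }

    // Single barrier, in uniform control flow, so every thread sees the reset
    // before any atomic touches the counter.
    GroupMemoryBarrierWithGroupSync();

    uint currentIdx;
    InterlockedAdd(g_globalIndex, 1, currentIdx);

    [loop]
    while (currentIdx < g_maxNumber)
    {
        // Placeholder work standing in for the elided loop body.
        TestInfo curInfo = testBuffer[currentIdx];
        outBuffer[currentIdx] = curInfo.m_offset + curInfo.m_value;

        // Atomically fetch the next unclaimed index.
        InterlockedAdd(g_globalIndex, 1, currentIdx);
    }
}
```

The barrier here runs once per thread at the start of the shader, not once per loop iteration, so the threads are free to pull indices at their own pace afterwards.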

1 Reply

Sorry for the tardy reply. Thank you for reporting this issue. I've forwarded this thread to a member of the AMD support team.

I'll let you know what I hear from them.

Cheers!

Kristen