As stated in the topic summary: when calling .Append() inside a branch whose condition depends on a value previously obtained with .Consume() from another buffer, the branch is incorrectly taken in some cases.
For example,
[numthreads( PROCESS_NUM_THREADS, 1, 1 )]
void ProcessTexturePass1( uint3 Gid : SV_GroupID, uint GI : SV_GroupIndex )
{
    uint index = Gid.x * PROCESS_NUM_THREADS + GI;
    if( index >= g_rawU32Buffer[0] )
        return;

    uint value = g_candidatePixelsListRead.Consume();
    int2 screenPos = int2( (value >> 16) & 0xFFFF, value & 0xFFFF );

    float3 pixelLeft = g_inputTexture.Load( int3( screenPos.x+1, screenPos.y, 0 ) ).rgb;
    float3 pixelRight = g_inputTexture.Load( int3( screenPos.x-1, screenPos.y, 0 ) ).rgb;

    if( (pixelLeft.b > 0.999) && (pixelRight.b > 0.999) )
    {
        g_finalPixelsListWrite.Append( (screenPos.x << 16) | screenPos.y );
    }
}
With this shader, a single thread whose branch evaluates to 'true' can cause ~64 'neighbouring' threads to .Append() as well, even though their own branch evaluates to 'false'. This appends additional undefined data to the buffer.
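For anyone who wants to check the final buffer contents against what the shader should produce, here is a rough CPU-side reference in C++. It is only a sketch and is not part of the repro project: `BlueChannelImage` and `referencePass1` are made-up names, the texture is modelled as a plain array holding only the blue channel, and out-of-range loads are assumed to return 0 in the same way as Texture2D.Load.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side stand-in for g_inputTexture. Only the blue channel
// is stored, because that is all the branch in ProcessTexturePass1 reads.
struct BlueChannelImage {
    int width;
    int height;
    std::vector<float> blue;  // row-major, width * height entries

    // Assumption: out-of-range loads return 0, mirroring Texture2D.Load.
    float load(int x, int y) const {
        if (x < 0 || y < 0 || x >= width || y >= height) return 0.0f;
        return blue[static_cast<std::size_t>(y) * width + x];
    }
};

// Hypothetical helper: returns the values a correct ProcessTexturePass1 would
// append, given everything that was in the candidate list before the dispatch.
std::vector<std::uint32_t> referencePass1(
    const std::vector<std::uint32_t>& candidates,
    const BlueChannelImage& image) {
    std::vector<std::uint32_t> expected;
    for (std::uint32_t value : candidates) {
        // Same unpacking as the shader: x in the high 16 bits, y in the low 16.
        const int x = static_cast<int>((value >> 16) & 0xFFFF);
        const int y = static_cast<int>(value & 0xFFFF);

        // Same two neighbour loads and the same threshold as the shader.
        const float bluePlusX = image.load(x + 1, y);
        const float blueMinusX = image.load(x - 1, y);

        if (bluePlusX > 0.999f && blueMinusX > 0.999f) {
            expected.push_back((static_cast<std::uint32_t>(x) << 16) |
                               static_cast<std::uint32_t>(y));
        }
    }
    return expected;
}
```

The order in which the GPU consumes and appends is not defined, so sort both lists before comparing them. On affected hardware the GPU list should come out longer than the reference list.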
The REF device behaves correctly, as does NVIDIA DirectX 11 hardware.
The problem was observed with the latest 10.12 and 10.11 drivers, on the 5550, 5970 and 6970.
I've created a simple repro project here:
http://www.vertexasylum.com/downloads/AppendStructuredBufferBranchingBug.zip