BarnacleJunior

D3D11 cs_5_0 compiler bug - probably

Discussion created by BarnacleJunior on Jan 8, 2010
Latest reply on Jan 8, 2010 by BarnacleJunior

My simple little scan routine has been exhibiting a problem under D3D11 that it doesn't have under OpenCL.  I do a very basic scan over an array of 1s.  Values 0-47 are OK (they read 0, 1, 2, 3, etc.), but 48-63 all read 47 instead of continuing to increment.  The inclusive scan is OK.  The problem only appears when I subtract the thread's value from the inclusive scan to get the exclusive scan.  It looks like a bug in the D3D compiler's optimizer, because it works fine if I compile with /Od.

 

    #define WAVEFRONT 64

    groupshared uint sharedSum[NUM_THREADS];
    RWStructuredBuffer<uint> values : register(u0);

    void ThreadSum(uint tid, uint scansize) {
        uint lane = (WAVEFRONT - 1) & tid;
        uint laneMask = ~(WAVEFRONT - 1) & tid;
        [unroll]
        for(uint offset = 1; offset < scansize; offset <<= 1) {
            uint tid2 = ((lane - offset) & (WAVEFRONT - 1)) | laneMask;
            uint target = sharedSum[tid];
            uint source = sharedSum[tid2];
            bool cond = lane >= offset;
            target += cond ? source : 0;
            sharedSum[tid] = target;
        }
    }

    [numthreads(NUM_THREADS, 1, 1)]
    void Foo(uint tid : SV_GroupIndex, uint3 groupID : SV_GroupID) {
        uint gid = groupID.x;
        uint target = NUM_THREADS * gid + tid;
        uint val = values[target];
        sharedSum[tid] = val;
        ThreadSum(tid, WAVEFRONT);
        uint inc = sharedSum[tid];
        uint exc = inc - val;
        values[target] = exc;
    }

Compiled with:

    fxc /T cs_5_0 /E Foo /D NUM_THREADS=64 /Fh Foo.h foo.hlsl

Note that values[target] = inc; works.  It's only when I subtract val from inc that it gets weird.
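For comparison, here is a minimal CPU-side sketch in C++ of what the shader is expected to produce. It is not part of the original repro. It assumes all 64 lanes of a wavefront execute in lock step, so every read in a scan step happens before any write of that step. The names ThreadSumReference and ExclusiveScanReference are made up for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr uint32_t WAVEFRONT = 64;
constexpr uint32_t NUM_THREADS = 64;  // matches /D NUM_THREADS=64

// Hypothetical CPU model of ThreadSum for one thread group.
// Assumption: lanes run in lock step, so each step reads the old
// contents of sharedSum and only then writes the new ones.
void ThreadSumReference(std::array<uint32_t, NUM_THREADS>& sharedSum,
                        uint32_t scansize) {
    assert(scansize <= WAVEFRONT);
    for (uint32_t offset = 1; offset < scansize; offset <<= 1) {
        std::array<uint32_t, NUM_THREADS> next = sharedSum;
        for (uint32_t tid = 0; tid < NUM_THREADS; ++tid) {
            uint32_t lane = (WAVEFRONT - 1) & tid;
            uint32_t laneMask = ~(WAVEFRONT - 1) & tid;
            // Same index arithmetic as the shader; unsigned wrap is
            // masked back into the wavefront.
            uint32_t tid2 = ((lane - offset) & (WAVEFRONT - 1)) | laneMask;
            uint32_t source = sharedSum[tid2];
            next[tid] = sharedSum[tid] + (lane >= offset ? source : 0u);
        }
        sharedSum = next;
    }
}

// Hypothetical CPU model of Foo for group 0: inclusive scan, then
// subtract each thread's own input to get the exclusive scan.
std::array<uint32_t, NUM_THREADS> ExclusiveScanReference(
    const std::array<uint32_t, NUM_THREADS>& values) {
    std::array<uint32_t, NUM_THREADS> sharedSum = values;
    ThreadSumReference(sharedSum, WAVEFRONT);
    std::array<uint32_t, NUM_THREADS> exc{};
    for (uint32_t tid = 0; tid < NUM_THREADS; ++tid) {
        uint32_t inc = sharedSum[tid];
        exc[tid] = inc - values[tid];
    }
    return exc;
}
```

Fed an array of 64 ones, this model yields exc[i] == i for every i in 0-63, which is what the /Od build returns and what the optimized build fails to return for indices 48-63.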
