
Strange compute shader behavior [solved; it was a goof on my part]

Question asked by kknewkles on Apr 10, 2020
Latest reply on Apr 18, 2020 by kknewkles

I was writing a scan over an array of numbers using subgroup operations (targeting Vulkan 1.2 and SPIR-V 1.5) and noticed something I can only classify as "extremely suspicious".

#version 460

#extension GL_KHR_shader_subgroup_arithmetic: enable

layout (local_size_x = 64, local_size_y = 1, local_size_z = 1) in;
layout (push_constant) uniform pc
{
    uint thread_count;
};

layout (std430, set = 0, binding = 0) coherent buffer _data { uint data[]; };
layout (std430, set = 0, binding = 1) coherent buffer _sum  { uint sum[]; };

void main()
{
    uint global = gl_GlobalInvocationID.x;
    if(global >= thread_count)
        return;
    uint local = gl_LocalInvocationID.x;
    uint block = gl_WorkGroupID.x;
   
    // blockwise sum
    sum[block] = subgroupAdd(data[global]);
    // individual scans
    data[global] = subgroupExclusiveAdd(data[global]);
   
    memoryBarrier();
   
    // blockwise scan(excess writes)
    if(block == 4)
    {
        sum[local] = subgroupExclusiveAdd(sum[local]);
    }
   
    memoryBarrier();
   
    // global scan
    data[global] += sum[block];
}

Inputs:

- "data" array: 4096 uints, each equal to 1.

- "sum" array: space for 64 uints, used as scratch for the calculations.

Expected outputs:

- "sum" array: each element should be 64 multiplied by the element's index, so multiples of 64 going from 0 to 4032; works as expected

- "data" array: each element of the array should be equal exactly to its index, going from 0 to 4095. Does not work quite as expected.
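For anyone checking the buffers after reading them back, the expected outputs above can be written down as a small host-side check. This is only a sketch: the function names `first_data_mismatch` and `first_sum_mismatch` are hypothetical and not part of the repo, and it assumes the buffers have already been copied into `std::vector<std::uint32_t>`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the index of the first element of "data" that
// differs from the expected exclusive scan of an all-ones input
// (data[i] == i), or -1 if every element matches.
std::ptrdiff_t first_data_mismatch(const std::vector<std::uint32_t>& data)
{
    for (std::size_t i = 0; i < data.size(); ++i)
        if (data[i] != static_cast<std::uint32_t>(i))
            return static_cast<std::ptrdiff_t>(i);
    return -1;
}

// Hypothetical helper: same idea for "sum", where element b is expected to be
// block_size * b (0, 64, ..., 4032 for 64 blocks of 64 ones).
std::ptrdiff_t first_sum_mismatch(const std::vector<std::uint32_t>& sum,
                                  std::uint32_t block_size = 64)
{
    for (std::size_t b = 0; b < sum.size(); ++b)
        if (sum[b] != block_size * static_cast<std::uint32_t>(b))
            return static_cast<std::ptrdiff_t>(b);
    return -1;
}
```

Both helpers return -1 on a fully correct buffer, so a breakpoint on any other return value lands on the first wrong element.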

Shader objective: an exclusive scan over the whole "data" array.

Shader logic: each wavefront runs an exclusive scan, which yields 0..63 within each wavefront. To shift each wavefront to its correct values, we also take the sum of every wavefront (64), write it to the "sum" array (64 values, each equal to 64), and run a scan over that array (0, 64, ..., 4032). The scanned values are the offsets that get added to each wavefront, which produces the data-wide scan.
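The three steps described above can be sketched sequentially on the CPU, which gives a reference to compare the GPU output against. This is a hedged illustration and not the shader itself: the function name `two_level_exclusive_scan` and the `block_size` parameter are made up for this example, and each "block" here stands in for one 64-wide wavefront.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference for the two-level scan described in the post.
// Takes the input by value and returns the exclusive scan of the whole array.
std::vector<std::uint32_t> two_level_exclusive_scan(std::vector<std::uint32_t> data,
                                                    std::size_t block_size = 64)
{
    const std::size_t block_count = (data.size() + block_size - 1) / block_size;
    std::vector<std::uint32_t> sum(block_count, 0);

    // Step 1: per-block exclusive scan, plus the total of each block.
    for (std::size_t b = 0; b < block_count; ++b) {
        const std::size_t begin = b * block_size;
        const std::size_t end = std::min(begin + block_size, data.size());
        std::uint32_t running = 0;
        for (std::size_t i = begin; i < end; ++i) {
            const std::uint32_t value = data[i];
            data[i] = running;   // sum of the earlier elements in this block
            running += value;
        }
        sum[b] = running;        // blockwise sum
    }

    // Step 2: exclusive scan over the block totals.
    // This step must be complete before any block reads its offset in step 3.
    std::uint32_t running = 0;
    for (std::uint32_t& s : sum) {
        const std::uint32_t value = s;
        s = running;
        running += value;
    }

    // Step 3: add each block's offset to its local scan.
    for (std::size_t i = 0; i < data.size(); ++i)
        data[i] += sum[i / block_size];

    return data;
}
```

With 4096 ones as input, element i of the result equals i. On the CPU the ordering between the steps is trivially guaranteed by the loops running one after another; the sketch makes no claim about how that ordering is achieved on the GPU.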
And here's where the interesting stuff happens.

It seemed sensible to me to run the scan over the "sum" array in a single block, so I put the corresponding line of code inside an if block, tasking the very first block with this calculation (the shader code above uses block 4 instead).

And here's the crux of the whole thing: only the block you specify in the condition gets the expected values; every other block outputs 64..127, as if the scan over "sum" never reached it (64..127 is the local scan 0..63 plus an unscanned block sum of 64). If the if block is removed, the outputs are as expected, with the "data" array being 0..4095.

I'm not ready to blame any compiler or driver involved, but I have a very hard time making sense of this situation. It's not obvious to me how this shader code could be incorrect.

My Vulkan SDK version: 1.2.131.2

My GPU: Radeon Asus RX 470

My AMD drivers version: 20.4.1

Git repo link: GitHub - Kknewkles/suspicious_shader: Shader behavior investigation 

Shader playground link, for shaders' assembly: Shader Playground 

I've tried to make this as helpful and as easy to use and understand as I could: the git repo has my (somewhat) minimal Vulkan setup for running a compute shader, plus a Visual Studio solution file that takes you straight to the breakpoint and the arrays worth inspecting.
If you need more details, please ask and I'll do my best to be of further help.
