OpenGL & Vulkan

kknewkles
Adept I

Strange compute shader behavior [solved; it was a goof on my part]

I was writing a scan over an array of numbers using subgroup operations (targeting Vulkan 1.2 and SPIR-V 1.5) and noticed something I can only classify as "extremely suspicious".

```glsl
#version 460

#extension GL_KHR_shader_subgroup_arithmetic: enable

layout (local_size_x = 64, local_size_y = 1, local_size_z = 1) in;
layout (push_constant) uniform pc
{
    uint thread_count;
};

layout (std430, set = 0, binding = 0) coherent buffer _data { uint data[]; };
layout (std430, set = 0, binding = 1) coherent buffer _sum { uint sum[]; };

void main()
{
    uint global = gl_GlobalInvocationID.x;
    if(global >= thread_count)
        return;
    uint local = gl_LocalInvocationID.x;
    uint block = gl_WorkGroupID.x;

    // blockwise sum
    sum[block] = subgroupAdd(data[global]);
    // individual scans
    data[global] = subgroupExclusiveAdd(data[global]);

    memoryBarrier();

    // blockwise scan(excess writes)
    if(block == 4)
    {
        sum[local] = subgroupExclusiveAdd(sum[local]);
    }

    memoryBarrier();

    // global scan
    data[global] += sum[block];
}
```

Inputs:

- "data" array: 4096 uints, each equal to 1.

- "sum" array: space for 64 uints, used for intermediate calculations.

Expected outputs:

- "sum" array: each element should be 64 times the element's index, i.e. multiples of 64 from 0 to 4032. Works as expected.

- "data" array: each element should be exactly equal to its index, i.e. 0 to 4095. Does not quite work as expected.

Shader objective: data-wide scan

Shader logic: in each wavefront we run an exclusive scan, which gives 0..63 in every wavefront. To bring each wavefront up to its correct values, we also take the sum of every wavefront (64), write it to the "sum" array (64 values, each equal to 64), and run a scan over that (0, 64, ..., 4032). This gives the value that has to be added to each wavefront, and adding it produces the data-wide scan.
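Before looking at the GPU behavior, the logic itself can be checked on the CPU. Below is a minimal sequential sketch in Python (not GPU code), assuming a block size of 64 that equals the subgroup size, as with the 64-wide wavefronts on an RX 470. `exclusive_scan` and `blocked_scan` are illustrative names, not part of the repo.

```python
from itertools import accumulate

BLOCK = 64  # assumed: workgroup size == subgroup size (one wave64 per block)

def exclusive_scan(xs):
    """[x0, x1, x2, ...] -> [0, x0, x0 + x1, ...], like subgroupExclusiveAdd."""
    return list(accumulate(xs, initial=0))[:-1]

def blocked_scan(data, block=BLOCK):
    """Sequential model of the shader logic; returns (scanned data, block offsets)."""
    local_scans, totals = [], []
    for lo in range(0, len(data), block):
        chunk = data[lo:lo + block]
        local_scans.append(exclusive_scan(chunk))  # per-block exclusive scan
        totals.append(sum(chunk))                  # per-block total (subgroupAdd)
    offsets = exclusive_scan(totals)               # scan over the "sum" array
    # bump every block by the total of all blocks before it
    result = [x + off for scan, off in zip(local_scans, offsets) for x in scan]
    return result, offsets

result, offsets = blocked_scan([1] * 4096)
print(result[:3], result[-1])    # [0, 1, 2] 4095
print(offsets[:3], offsets[-1])  # [0, 64, 128] 4032
```

On a single CPU thread the steps run strictly in order, so the decomposition gives the same answer as one exclusive scan over the whole array.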
And here's where the interesting stuff happens.

It seemed sensible to me to run the scan over the "sum" array in a single block, so I put the corresponding line of code inside an if block, tasking the very first block with this calculation (the shader code above uses the block with index 4 instead).

And here's the crux of the whole thing: only the block you specify gets the expected values; every other block outputs 64..127, seemingly ignoring the last line of code. If the if block is removed, the outputs are as expected, with the "data" array holding 0..4095.
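These symptoms can be reproduced by replaying one possible schedule of the single-dispatch shader. Nothing in it makes the other workgroups wait: `memoryBarrier()` only orders the calling invocation's own memory accesses, and `barrier()` only synchronizes invocations within one workgroup. The sketch below is a Python model, not a trace of the GPU; the schedule (every other workgroup finishing before the designated one scans `sum`) is an assumption, and `racy_run` and `DESIGNATED` are hypothetical names.

```python
from itertools import accumulate

BLOCK = 64
DESIGNATED = 4  # the block picked by `if(block == 4)` in the shader

def exclusive_scan(xs):
    return list(accumulate(xs, initial=0))[:-1]

def racy_run(n=4096, block=BLOCK, designated=DESIGNATED):
    """Replay one legal interleaving of the single-dispatch shader."""
    data = [1] * n
    groups = n // block
    sums = [0] * groups
    # Every workgroup: block total into sum[block], local exclusive scan into data.
    for g in range(groups):
        lo = g * block
        sums[g] = sum(data[lo:lo + block])
        data[lo:lo + block] = exclusive_scan(data[lo:lo + block])
    # Assumed schedule: all other workgroups run `data[global] += sum[block]`
    # before the designated one has scanned `sum`, so they add the raw total (64).
    for g in range(groups):
        if g != designated:
            for i in range(g * block, (g + 1) * block):
                data[i] += sums[g]
    # Only now does the designated workgroup scan `sum` and finish its own block.
    sums = exclusive_scan(sums)
    for i in range(designated * block, (designated + 1) * block):
        data[i] += sums[designated]
    return data, sums

data, sums = racy_run()
print(data[0], data[63])     # 64 127
print(data[256], data[319])  # 256 319
print(sums[-1])              # 4032
```

The output matches the report: "sum" ends up correct because the designated block does eventually scan it, while every other block has already added the unscanned total of 64 to its 0..63.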

I'm not ready to blame any compiler or driver involved, but I have a very hard time making sense of this situation. It isn't obvious to me how this shader code could be incorrect.

My Vulkan SDK version: 1.2.131.2

My GPU: Radeon Asus RX 470

My AMD drivers version: 20.4.1

Git repo link: GitHub - Kknewkles/suspicious_shader: Shader behavior investigation 

Shader playground link, for shaders' assembly: Shader Playground 

I've tried to make this whole setup as helpful and as easy to use and understand as I could: the git repo has my (somewhat) minimal Vulkan setup for running a compute shader, plus a Visual Studio solution file that takes you straight to the breakpoint and the arrays worth inspecting.
If you need more details, please ask and I'll do my best to help further.

Accepted solution:

Thanks for your reply, but it has nothing to do whatsoever with my problem.

The problem was that I should have split the shader in two at the point where I needed all threads to be synchronized.

Without effective synchronization, the other blocks simply outran the designated one, which led to different data.

Basic stuff (that I somehow managed to forget); I'm pretty disappointed nobody pointed that out in the almost a week since I posted the question.
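The fix can be modeled the same way. This is a sketch under an assumption the thread does not spell out: the first shader writes the block totals and the local scans, and in the second shader every workgroup rescans the 64 block totals itself (64 workgroups fit in one 64-wide subgroup; on the GPU the lookup could be done with `subgroupExclusiveAdd` followed by `subgroupShuffle`) and adds its own offset, which avoids a third dispatch. On the Vulkan side, the split corresponds to two `vkCmdDispatch` calls separated by a `vkCmdPipelineBarrier` that makes the first dispatch's shader writes visible to the second dispatch's shader reads. `dispatch1`, `dispatch2` and `two_pass_scan` are hypothetical names.

```python
from itertools import accumulate

BLOCK = 64  # assumed: workgroup size == subgroup size

def exclusive_scan(xs):
    return list(accumulate(xs, initial=0))[:-1]

def dispatch1(data, sums, g, block=BLOCK):
    """Shader 1, workgroup g: write the block total, then the local exclusive scan."""
    lo = g * block
    chunk = data[lo:lo + block]
    sums[g] = sum(chunk)
    data[lo:lo + block] = exclusive_scan(chunk)

def dispatch2(data, sums, g, block=BLOCK):
    """Shader 2, workgroup g: scan the block totals itself and add its own offset.
    Reads `sums` but never writes it, so no workgroup depends on another one here."""
    offset = exclusive_scan(sums)[g]
    for i in range(g * block, (g + 1) * block):
        data[i] += offset

def two_pass_scan(data, block=BLOCK):
    groups = len(data) // block
    sums = [0] * groups
    for g in range(groups):            # first vkCmdDispatch
        dispatch1(data, sums, g, block)
    # End of the first dispatch: on the GPU this is where the pipeline barrier goes.
    for g in reversed(range(groups)):  # second vkCmdDispatch; workgroup order no longer matters
        dispatch2(data, sums, g, block)
    return data, sums

data, sums = two_pass_scan([1] * 4096)
print(data[0], data[4095])  # 0 4095
```

Unlike the original shader, "sum" keeps the raw block totals (64 each) in this variant, because the second pass only reads it.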

danilw
Adept II

Maybe you have the same bug that I already reported (also with arrays in the shader):

AMD Vulkan driver SPIR-V shader Much more critical bug 

or this post on the Khronos forum.

In my case the minimal code was:

```glsl
float main2(in vec2 fragCoord)
{
    fragCoord=fragCoord/iResolution.xy;
    fragCoord=floor(vec2(256.,2.)*fragCoord)+0.5;
    ivec2 ipx=ivec2(fragCoord);
    if(ipx.x<0)return 0.; //just to be "sure"
    if(ipx.x<128)
        if((bits[ipx.x%2])==0x77777777u)return 1.;else return 0.; //bug
    else
        if(((ipx.x%2==0)?bits[0]:bits[1])==0x77777777u)return 1.;else return 0.; //no bug
    return 0.;
}
```

From this case I understood that AMD's SPIR-V implementation does not handle an array index that is not a compile-time constant.

The way to fix it is a static switch, like the one I made for my buggy array:

```glsl
uint get_by_index(int idx){
    uint ret=0;
    switch(idx){
        case 0:ret=data[0];break;
        case 1:ret=data[1];break;
        case 2:ret=data[2];break;
        // ....etc....
        case 30:ret=data[30];break;
        case 31:ret=data[31];break;
    }
    return ret;
}
```

Then use `get_by_index()` everywhere in your code: instead of `data[global]`, write `get_by_index(global)`.

This fixed my code from that link on AMD.

I see that your array has 4K elements, so it may be impractical to write such a large switch for every array you use...

From my experience with AMD Vulkan, reading arrays from a texture (`texelFetch()`) works correctly when uniforms/push constants do not.

So option 2 is to send your array as a `sampler2D` and read it with `texelFetch()` in the shader.

Summary, ways to fix it:

1. GLSL (not SPIR-V) shaders in OpenGL on AMD do work correctly, while SPIR-V shaders in Vulkan are broken. (Not just this bug; there are lots of bugs. I'd say any shader with non-linear logic on AMD Vulkan may randomly trigger UB; there are too many bugs.) (Check my other bug reports: my 2K+ line shaders do not work on AMD or trigger UB during execution, while in OpenGL they do work on the same AMD GPU.)

2. This is a Windows-only bug: AMD Vulkan drivers on Linux handle these shaders correctly, and my bugs do not reproduce on Linux.

3. Use Nvidia; shaders do not have these bugs there. I tested everything myself.

4. Use the "tricks" listed above, which may fix this bug for you (though you may then run into another bug): the `switch` that returns the array element by a static index, or `texelFetch`.
