Hi,
I've recently encountered a bug in a GL compute shader on RDNA 1 and 2 GPUs. Please note that the same shader worked flawlessly on an R9 270X, and also on NVIDIA and Intel GPUs.
I can't share the original shader, but here is an excerpt of the problematic part:
void main()
{
    bool prevStepOk = false;
    vec4 values[8];
    for([...])
    {
        //reuse previous step values if possible
        if(prevStepOk)
        {
            values[0] = values[4];
            values[1] = values[5];
            values[2] = values[6];
            values[3] = values[7];
        }
        else
        {
            values[0] = sampSomething();
            values[1] = sampSomething();
            values[2] = sampSomething();
            values[3] = sampSomething();
        }
        values[4] = sampSomething();
        values[5] = sampSomething();
        values[6] = sampSomething();
        values[7] = sampSomething();
        if(something())
        {
            prevStepOk = true;
            //do stuff: the values array is read here but never written
            //....
        }
        else
        {
            prevStepOk = false;
        }
    }
}
As I said, this works on most GPUs, except RDNA ones. The final result looks as if the array values end up in the wrong order.
If I increase the values array size to 32 or more, everything works fine again, but any size below 32 gives bad results.
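For clarity, the loop implements a simple rolling window: each step samples four new values and, when the previous step succeeded, reuses that step's four newest samples instead of resampling them. A minimal CPU analogue in Python (the samples iterator and step_ok predicate are hypothetical stand-ins for sampSomething() and something()) behaves like this:

```python
def run(steps, samples, step_ok):
    """CPU analogue of the shader loop.

    values[0:4] is the 'old' half of the window, values[4:8] the 'new' half;
    on a successful step the new half is carried over into the old half.
    """
    samp = iter(samples)
    values = [None] * 8
    prev_step_ok = False
    out = []
    for _ in range(steps):
        if prev_step_ok:
            # reuse previous step's newest samples (indices 4..7 -> 0..3)
            values[0:4] = values[4:8]
        else:
            values[0:4] = [next(samp) for _ in range(4)]
        values[4:8] = [next(samp) for _ in range(4)]
        prev_step_ok = step_ok(values)
        out.append(list(values))
    return out
```

With monotonically increasing samples and a condition that always succeeds, the second step reuses samples 4..7 and only draws four new ones, which is the behaviour the RDNA drivers appear to get wrong.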
If I change the code as follows, everything is fine too:
void main()
{
    bool prevStepOk = false;
    vec4 values[8];
    vec4 prevValues[4];
    for([...])
    {
        if(prevStepOk)
        {
            values[0] = prevValues[0];
            values[1] = prevValues[1];
            values[2] = prevValues[2];
            values[3] = prevValues[3];
        }
        else
        {
            values[0] = sampSomething();
            values[1] = sampSomething();
            values[2] = sampSomething();
            values[3] = sampSomething();
        }
        values[4] = sampSomething();
        values[5] = sampSomething();
        values[6] = sampSomething();
        values[7] = sampSomething();
        prevValues[0] = values[4];
        prevValues[1] = values[5];
        prevValues[2] = values[6];
        prevValues[3] = values[7];
        if(something())
        {
            prevStepOk = true;
            //do stuff: the values array is read here but never written
            //....
        }
        else
        {
            prevStepOk = false;
        }
    }
}
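The workaround can be sketched as a CPU analogue in Python (the samples iterator and step_ok predicate are hypothetical stand-ins for the shader's sampling and condition): the carried samples are staged in a separate prevValues-style buffer, so the copy at the top of each iteration never reads and writes the same array:

```python
def run_workaround(steps, samples, step_ok):
    """CPU analogue of the workaround: carried samples go through a
    separate staging buffer instead of an in-array copy."""
    samp = iter(samples)
    values = [None] * 8
    prev_values = [None] * 4   # separate staging buffer (the workaround)
    prev_step_ok = False
    out = []
    for _ in range(steps):
        if prev_step_ok:
            values[0:4] = prev_values          # read from the staging buffer
        else:
            values[0:4] = [next(samp) for _ in range(4)]
        values[4:8] = [next(samp) for _ in range(4)]
        prev_values = list(values[4:8])        # stage for the next step
        prev_step_ok = step_ok(values)
        out.append(list(values))
    return out
```

On a correct implementation this produces exactly the same results as the original loop; the only difference is that the copy no longer aliases within a single array, which appears to be what sidesteps the compiler issue.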
As I've found a workaround, this is not a big issue for me, but I'd like to understand the cause of the problem so I can avoid reproducing such behaviour in the future.
Thanks.
Hi @Crashy ,
Thanks for reporting it. I have whitelisted you and moved the post to the OpenGL forum.
Could you please share your setup details, like OS, driver version, GPU, etc.?
Thanks.
Thanks.
OS: Windows 10
Driver version: 23.8.1 Adrenalin. Same issue with Pro version.
GPU: 6600 XT, also confirmed on 5600 XT
Thanks for the information. I will forward the issue to the OpenGL team.
An internal bug ticket has been created to track this issue. Will let you know once there is any update on this.
Thanks.
Hi @Crashy ,
Based on your description, the OpenGL team wrote a simple compute shader (see below) to reproduce the issue. However, they could not reproduce it.
It would be helpful if you could provide a complete reproducible example for this issue.
#version 430
layout (local_size_x = 16, local_size_y = 1) in;
layout (rgba32f, binding = 0) uniform image1D in_array;
layout (rgba32f, binding = 1) uniform image1D out_array;

#define VALUE_SIZE 8

void main()
{
    bool prevStepOk = false;
    vec4 values[VALUE_SIZE];
    int pos = int(gl_GlobalInvocationID.x) * VALUE_SIZE;

    for (int i = 0; i < 4; i++)
    {
        //reuse previous step values if possible
        if (prevStepOk)
        {
            values[0] = values[4];
            values[1] = values[5];
            values[2] = values[6];
            values[3] = values[7];
        }
        else
        {
            values[0] = imageLoad(in_array, pos+0);
            values[1] = imageLoad(in_array, pos+1);
            values[2] = imageLoad(in_array, pos+2);
            values[3] = imageLoad(in_array, pos+3);
        }
        values[4] = imageLoad(in_array, pos+4);
        values[5] = imageLoad(in_array, pos+5);
        values[6] = imageLoad(in_array, pos+6);
        values[7] = imageLoad(in_array, pos+7);

        if (prevStepOk == false)
        {
            prevStepOk = true;
        }
        else
        {
            prevStepOk = false;
        }
    }

    for (int i = 0; i < VALUE_SIZE; i++)
    {
        imageStore(out_array, pos, values[i]);
        pos++;
    }
}
Thanks.
Thank you for your answer and for taking the time to try to reproduce this issue. Somehow I was expecting that result.
I'm going to write a sample, as simple as possible, that reproduces the problem.
Hi,
I've created a sample program to reproduce this issue in raw OpenGL; however, in that case everything works as expected.
So I made a sample with my engine (a fork of Ogre) with everything superfluous stripped out, and in that case the problem is visible.
The samples basically use a marching cubes shader to generate a mesh. With the issue present, the result looks like the next picture, with bad triangles.
When using one of the workarounds listed in the original post, the result is fine:
Here is an archive containing the two samples (the one in raw GL, and the one using Ogre).
The shaders are the same; I've included the source from the raw GL sample for reference.
In the ogre/testData/test_RDNA.comp file there is a preprocessor switch named USE_SEPARATE_ARRAY to enable/disable the "fix".
Right now I'm doing some comparisons in RenderDoc to try to see what the differences between the two samples are, but I can't find anything.
Hi @Crashy ,
Thanks for providing the reproducible example. The OpenGL team was able to reproduce the issue. According to them, it seems to be a shader compiler optimization issue. The compiler team is currently investigating it. Will let you know once there is any update on this.
Thanks.
Update:
The compiler team has implemented a fix for this compiler optimization issue.
Thanks.
Hi @Crashy ,
Could you please try the latest Adrenalin 23.12.1 and let us know if the above issue has been resolved?
Thanks.
Hi @dipak
Unfortunately, the result is still the same with 23.12.1.
Note that I reset the shader cache to ensure everything was up to date.
Thanks for sharing the above observation. I have informed the OpenGL team.
Thanks.