I'm attempting to apply a matrix multiply to a batch of matrices using a compute shader. After troubleshooting various things, I narrowed the bad output down to the following minimal shader.
#version 430
layout(local_size_x = 4, local_size_y = 4, local_size_z = 16) in;
layout(binding = 0) uniform viewLayout { mat4 eye; };
layout(std430, binding = 1) buffer inputLayout { mat4 src[]; };
layout(std430, binding = 2) buffer outputLayout { mat4 dst[]; };
void main(void)
{
    uint itm = (gl_WorkGroupID.z << 4) + gl_LocalInvocationID.z;
    uint col = gl_LocalInvocationID.y;
    uint row = gl_LocalInvocationID.x;
    dst[itm][col][row] = row;
}
The buffer dst contains the following output for the first 16 floats:
[0] | 0.000000000 | float |
[1] | 1.00000000 | float |
[2] | 2.00000000 | float |
[3] | 3.00000000 | float |
[4] | 0.000000000 | float |
[5] | 0.000000000 | float |
[6] | 1.00000000 | float |
[7] | 2.00000000 | float |
[8] | 3.00000000 | float |
[9] | 0.000000000 | float |
[10] | 0.000000000 | float |
[11] | 1.00000000 | float |
[12] | 2.00000000 | float |
[13] | 3.00000000 | float |
[14] | 0.000000000 | float |
[15] | 0.000000000 | float |
However, if I change the declaration to vec4 dst[][4] (an array of arrays, which should have the same std430 layout as mat4), I get the expected output:
[0] | 0.000000000 | float |
[1] | 1.00000000 | float |
[2] | 2.00000000 | float |
[3] | 3.00000000 | float |
[4] | 0.000000000 | float |
[5] | 1.00000000 | float |
[6] | 2.00000000 | float |
[7] | 3.00000000 | float |
[8] | 0.000000000 | float |
[9] | 1.00000000 | float |
[10] | 2.00000000 | float |
[11] | 3.00000000 | float |
[12] | 0.000000000 | float |
[13] | 1.00000000 | float |
[14] | 2.00000000 | float |
[15] | 3.00000000 | float |
Can anyone help me understand why mat4 is treated as though it were 5 floats wide instead of 4?
Thank you for the report. Can you provide an app that reproduces the issue?
I've passed this to the team for further investigation. Will get back to you as soon as I hear back.
Many thanks for your patience! The issue has been resolved. It should land in one of the upcoming driver releases.
Great. Will test next driver update.