
    GL4.3 Storage Buffer bug?

    piranha

      Hello,

       

      We are using compute shaders to calculate fractal noise. The problem is that we can't process inputs larger than 4096 vectors. Once we exceed this limit, the shader returns the same value for all remaining elements, or skips them entirely.

      We made a video that shows the problem on a visual output:

      (external link: amd compute shader bug - YouTube)

       

      As you can see, as soon as the resolution exceeds 64x64 (4096 invocations, i.e. 16 work groups of 256), it breaks.

      When we use no input buffer at all (meaning we just generate random test values in the shader and write them into the output buffer), we don't get these problems, so I'm pretty sure it's an input buffer bug; a sketch of that test kernel is below.
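
      To be concrete, the no-input test kernel looks roughly like this (a reconstructed sketch; the hash function here is just a stand-in for however the random test values are generated):

      #version 430 core
      
      layout(std430,binding = 2) writeonly buffer oBuffer
      {
        float Output[];
      };
      
      layout (local_size_x = 256) in;
      
      // Cheap integer hash, just a stand-in for the random test values
      float hash(uint n)
      {
        n = (n << 13u) ^ n;
        n = n * (n * n * 15731u + 789221u) + 1376312589u;
        return float(n & 0x7fffffffu) / float(0x7fffffff);
      }
      
      void main() {
          // No input buffer at all: every invocation writes a value it computed itself
          Output[gl_GlobalInvocationID.x] = hash(gl_GlobalInvocationID.x);
      }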

       

      Here's how we run it from input to the finished result:

              public float[] GetValues(Vector4[] input) { // Takes a 1D Vector4 array as input; returns the finished noise as a 1D float array
                  GL.UseProgram(program);
                  // Generate Input Buffers
                  int inBuffer = GL.GenBuffer(); // First buffer contains the vec4 data and is our "problem child"
                  GL.BindBuffer(BufferTarget.ArrayBuffer, inBuffer); // No difference using ArrayBuffer or ShaderStorageBuffer
                  GL.BufferData(BufferTarget.ArrayBuffer, new IntPtr(Vector4.SizeInBytes * input.Length), input, BufferUsageHint.StaticDraw);
                  GL.BindBufferBase(BufferTarget.ShaderStorageBuffer, 0, inBuffer); // Bind buffer to shader binding point 0
      
                  int inPermBuffer = GL.GenBuffer(); // Second input is the permutation data; it's only about a kilobyte and causes no problems
                  GL.BindBuffer(BufferTarget.ArrayBuffer, inPermBuffer);
                  GL.BufferData(BufferTarget.ArrayBuffer, new IntPtr(sizeof(int) * permutation.Length), permutation, BufferUsageHint.StaticDraw);
                  GL.BindBufferBase(BufferTarget.ShaderStorageBuffer, 1, inPermBuffer); // Bind buffer to shader binding point 1
      
                  // Generate Output Buffer
                  float[] result = new float[input.Length];
                  int outBuffer = GL.GenBuffer(); // The buffer which contains the result
                  GL.BindBuffer(BufferTarget.ArrayBuffer, outBuffer);
                  GL.BufferData(BufferTarget.ArrayBuffer, new IntPtr(sizeof(float) * input.Length), result, BufferUsageHint.StaticCopy);
                  GL.BindBufferBase(BufferTarget.ShaderStorageBuffer, 2, outBuffer); // Bind buffer to shader binding point 2
                 
                  // Start compute: one work group of 256 invocations per 256 input elements
                  GL.DispatchCompute((int)Math.Ceiling(input.Length / 256.0), 1, 1);
                  GL.MemoryBarrier(MemoryBarrierMask.BufferUpdateBarrierBit); // Make the compute writes visible to the CPU readback below
                  // Get a pointer to the result data; BindBufferBase also bound outBuffer
                  // to the generic ShaderStorageBuffer target, so this maps outBuffer
                  IntPtr outBufferPointer = GL.MapBuffer(BufferTarget.ShaderStorageBuffer, BufferAccess.ReadOnly);
                  // Copy the result into our managed result array
                  Marshal.Copy(outBufferPointer, result, 0, input.Length);
                  // Release the mapping
                  GL.UnmapBuffer(BufferTarget.ShaderStorageBuffer);
                  // Clean up
                  GL.DeleteBuffer(inBuffer);
                  GL.DeleteBuffer(inPermBuffer);
                  GL.DeleteBuffer(outBuffer);
      
                  return result;
              }
      

       

      It's C# code, but it should be easy to read for C++ folks anyway.
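
      In case someone wants to rule out an implementation limit or a silent GL error, this is the kind of sanity check we'd wrap around the setup and dispatch (the raw constants are written out because we're not sure every OpenTK version exposes the GL 4.3 enum members):

      // GL 4.3 limit queries; raw GL constants in case the enum members are missing
      const int GL_MAX_SHADER_STORAGE_BLOCK_SIZE = 0x90DE;
      const int GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS = 0x90EB;
      
      int maxBlockSize, maxInvocations;
      GL.GetInteger((GetPName)GL_MAX_SHADER_STORAGE_BLOCK_SIZE, out maxBlockSize);
      GL.GetInteger((GetPName)GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS, out maxInvocations);
      Console.WriteLine("Max SSBO block size: {0}, max invocations per group: {1}",
          maxBlockSize, maxInvocations);
      
      // Check for a silent error after the buffer setup and the dispatch
      ErrorCode err = GL.GetError();
      if (err != ErrorCode.NoError)
          Console.WriteLine("GL error: " + err);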

       

      Here is how we declared our buffers in GLSL and how we access them in the main function:

       

       

      #version 430 core
      
      struct vertex
      {
        vec4 pos;
      };
      
      layout(std430,binding = 0) readonly buffer iBuffer
      {
        vertex Vectors[];
      };
      
      layout(std430,binding = 1) readonly buffer pBuffer
      {
        int Permutation[];
      };
      
      layout(std430,binding = 2) writeonly buffer oBuffer
      {
        float Output[];
      };
      
      layout (local_size_x = 256) in;
      
      void main() {
          // Skip the extra invocations of the rounded-up last work group when
          // input.Length is not a multiple of 256; out-of-bounds SSBO access is undefined
          if (gl_GlobalInvocationID.x >= uint(Output.length()))
              return;
          vec4 in_pos = Vectors[gl_GlobalInvocationID.x].pos;
          Output[gl_GlobalInvocationID.x] = /*Tons of instructions using in_pos here*/  ;
      }
      

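      Back on the C# side, we could also read the result back with GetBufferSubData instead of mapping, in case MapBuffer itself is the problem (a sketch using the same buffers as above):

      // Alternative readback: explicitly bind outBuffer and copy its contents
      // with GetBufferSubData instead of mapping it
      GL.BindBuffer(BufferTarget.ShaderStorageBuffer, outBuffer);
      GL.GetBufferSubData(BufferTarget.ShaderStorageBuffer, IntPtr.Zero,
          new IntPtr(sizeof(float) * input.Length), result);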
       

      We have been experimenting for two days now to get this working on AMD cards (on NVIDIA there are no problems at all).

      Are we using the shader storage buffers wrong, or is this really a memory bug on AMD cards?

       

      Hardware: Radeon HD 7970

      Driver: Tested with latest stable and latest beta driver.

       

      I appreciate all comments.