Archives Discussions

yours3lf · ‎05-13-2013

Hi there,

I've been trying to get OpenGL compute shaders to work using the new driver.

I got mixed results: while I could run an application that writes to a Texture Buffer Object, I get a driver crash/restart when I try to run an application that writes to a simple Texture. I didn't find any clue regarding this in the OpenGL specifications, so this should probably work fine.

here's the app:
https://docs.google.com/file/d/0B33Sh832pOdOaWFlVS00N040bFE/edit?usp=sharing

Putting a glFinish() after calling glDispatchCompute solved the driver crash, but I still don't get anything on screen. I can render the texture fine when not using compute shaders.

I suspect this might be a synchronization issue in the driver, meaning the compute shader tries to write to the texture, and the next shader tries to read from it at the same time.

Please take a look at this issue.

Best regards,

Yours3lf

sc4v · ‎05-22-2013

Same issue here

Using a HD 5850 with 13.4

dutta · ‎06-19-2013

You never bind anything to the texture in the Compute Shader. The texture 'texture0' is not attached to any texture object. First, as you would with textures, you get the uniform location. Then, you must call glBindImageTexture on the texture. Example code:


glUniform1i(glGetUniformLocation(program, "texture0"), 0);
glBindImageTexture(0, texture, 0, GL_FALSE, 0, GL_READ_WRITE, GL_RGBA8);

glUniform1i sets the image uniform to an image unit index of your choosing (0 here). glBindImageTexture then attaches the texture to that unit. With ordinary textures we would use glActiveTexture() and then glBindTexture(), but since images are different, we use glBindImageTexture() instead. This should solve your crash.
I know this because I had the same problem with the same example you are working on. Also, instead of using glFinish(), I would assume that glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT); does the same thing in your case, since you are writing directly to an image and nothing else. glMemoryBarrier makes sure the memory accesses of a specific feature set have completed before later commands proceed, and since you are using shader images, GL_SHADER_IMAGE_ACCESS_BARRIER_BIT should suffice.

Hope this solves the problem.

yours3lf · ‎06-20-2013

hey there,

thank you for the reply!

Shortly after writing this question I found the glBindImageTexture() function, but it still doesn't work with it, nor with the corrections you suggested.

I also found the barrier function, but it doesn't solve anything.

my hardware specs:
AMD A8-4500m APU

Can you please take a look at it again? Here's the updated project:

opengl_compute_shader.7z - Google Drive

dutta · ‎06-20-2013

Weird. You can also try setting local_size_z to 1 in the shader. I'm currently developing a middleware compiler which accepts another language and generates OpenGL code from it, and I managed to get this to work. However, I also got driver crashes whenever I called glDispatchCompute() with dimensions bigger than those defined in the shader. I don't know what the specification says happens when a local size is left undefined, so my compiler simply sets any of local_size_x/y/z to 1 if it is not defined in the middleware code. So just try changing:


layout(local_size_x = 16, local_size_y = 16) in; //local workgroup size

To:


layout(local_size_x = 16, local_size_y = 16, local_size_z = 1) in; //local workgroup size

You might also want to bind the texture uniform in your geometry rendering shader. I can see you are doing:


glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, the_texture);

But never:


glUniform1i(glGetUniformLocation(debug_shader, "texture0"), 0);

You should do something like:


glUseProgram(debug_shader);
glUniform1i(glGetUniformLocation(debug_shader, "texture0"), 0);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, the_texture);
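
On the local size question: as far as the GLSL specification goes, a local_size dimension that is left out of the layout qualifier defaults to 1, so both declarations above should describe 16x16x1 work groups. The numbers passed to glDispatchCompute() are a separate thing: they count how many such groups to launch. A minimal sketch of deriving those counts from an image size follows; group_count and invocations are hypothetical helper names, not OpenGL functions, and rounding up is an assumption about what the caller wants (it covers sizes that are not a multiple of the local size).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of OpenGL: number of work groups needed to
// cover `extent` texels along one axis when each group handles `local_size`
// of them. Rounds up so a partial group at the edge is still launched.
inline std::uint32_t group_count(std::uint32_t extent, std::uint32_t local_size) {
    assert(local_size > 0);
    return (extent + local_size - 1) / local_size;
}

// Total shader invocations along one axis: groups launched times the
// per-group size declared in the shader's layout qualifier.
inline std::uint32_t invocations(std::uint32_t groups, std::uint32_t local_size) {
    return groups * local_size;
}
```

With a 16x16 local size this would be used as glDispatchCompute(group_count(width, 16), group_count(height, 16), 1). Because rounding up can launch invocations past the image edge, a bounds check on gl_GlobalInvocationID in the shader is a cheap safeguard.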

yours3lf · ‎06-21-2013

hey there,

I set the local size and the uniform location, but still nothing. And it still crashes :S
By the way, the reason I'm not doing any glUniform1i(glGetUniformLocation(...), ...) is that the binding points were set in the shaders using layout qualifiers:

layout(binding=loc) uniform sampler2D/image2D texture0;

which lets me just call:

glActiveTexture(GL_TEXTURE0 + loc)

with no need to pass the binding via a uniform. So essentially, loc just has to be the same on both sides.

Currently I'm doing this:


    //fill the texture with the compute shader output
    glUseProgram(compute_shader);

    glUniform1f(1, float(frames) * 0.01f);

    glUniform1i(glGetUniformLocation(compute_shader, "texture0"), 0);
    glBindImageTexture(glGetUniformLocation(compute_shader, "texture0"), the_texture, 0, GL_FALSE, 0, GL_READ_WRITE, GL_RGBA8);
    //glBindImageTexture(0, the_texture, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_RGBA8);

    glDispatchCompute(screen.x / 16, screen.y / 16, 1);

    glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
    glFinish(); //still needed :S

    get_opengl_error();

    //display the texture on screen
    glUseProgram(debug_shader);

    mvm.push_matrix(cam);
    glUniformMatrix4fv(0, 1, false, &ppl.get_model_view_projection_matrix(cam)[0][0]);
    mvm.pop_matrix();

    glUniform1i(glGetUniformLocation(debug_shader, "texture0"), 0);
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, the_texture);

    glBindVertexArray(quad);
    glDrawElements( GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0 );
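
One detail worth checking in the snippet above: the first argument of glBindImageTexture() is an image unit index, not a uniform location. Passing glGetUniformLocation(...) there only lines up when the location happens to equal the unit stored in the uniform (0 in this case). The indirection can be pictured with a toy model. Nothing below is real OpenGL; ToyContext and ToyProgram are made-up names used purely for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model only, not real OpenGL: the context owns numbered image units,
// each holding one texture name (0 means "nothing bound", as in GL).
struct ToyContext {
    std::array<std::uint32_t, 8> image_units{};

    // Stands in for glBindImageTexture(unit, texture, ...).
    void bind_image_texture(std::size_t unit, std::uint32_t texture) {
        assert(unit < image_units.size());
        image_units[unit] = texture;
    }
};

// The program's image uniform stores only a unit index, set either by
// glUniform1i or by a layout(binding = N) qualifier in the shader.
struct ToyProgram {
    std::size_t image_unit = 0;

    // Which texture would the shader's image variable actually touch?
    std::uint32_t resolve(const ToyContext& ctx) const {
        assert(image_unit < ctx.image_units.size());
        return ctx.image_units[image_unit];
    }
};
```

In this picture the shader only reaches a texture when the unit stored in the uniform and the unit given to the bind call are the same number; the uniform's location plays no part in the lookup.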

dutta · ‎06-21-2013

Does your get_opengl_error() give you any errors? If so, which ones? In any case, the AMD drivers seem to be a bit broken when doing compute shading. For example, glBindImageTexture can throw GL_INVALID_OPERATION, something which isn't even in the GL specification. Also, glDispatchCompute() nukes rendering for me: if I don't call glClear() after glDispatchCompute(), rendering stops working completely. I even tried the same thing on Linux, and whenever I called glDispatchCompute() followed by a glDraw* without a glClear() in between, the glDraw* had no effect, even if I conditionally skipped the glDispatchCompute() part (so that glDispatchCompute() only runs once). And no, GL doesn't report any errors for me either.

In other words, I cannot help you. The above code is more or less exactly the same as what I have, and it barely works for me, although it doesn't cause a driver crash. However, I'm using an HD 5970, so the hardware differs, and that might be the reason.

So maybe some AMD representative should take a look at this to check if it might be a driver issue. It would be really helpful.

gsellers · ‎06-21-2013

Hi,

I have brought this to the attention of our compiler team. We'll investigate and see if we can't fix it.

Thanks,

Graham

yours3lf · ‎06-22-2013

Thank you Graham for dealing with the issue!

yours3lf · ‎07-24-2013

Hi there,

I've just tried this driver, and the issue seems to be fixed now.
AMD Catalyst OpenGL 4.3 Graphics Driver, 7 new OpenGL Extensions - 3D Tech News and Pixel Hacking - ...

A couple of things:

- In the compute shader, layout(location=...) is needed to specify an image2D binding point (for glBindImageTexture), not layout(binding=...).

- It seems the memory barrier is not needed at all after the compute shader call (so maybe the driver is figuring it out properly?). I suppose if I wanted to run another compute shader that writes the same memory, I'd have to place barriers.

Thank you for fixing this!

best regards,
Yours3lf

maizensh · ‎08-07-2013

That's strange, the spec explicitly says to use "binding" and not "location" in 4.4.6.2 (which points to 4.4.5).

Also, the GL_ARB_shader_image_load_store specification says:

- Data written to image variables in one rendering pass and read by the
  shader in a later pass need not use coherent variables or
  memoryBarrier(). Calling MemoryBarrier() with the
  SHADER_IMAGE_ACCESS_BARRIER_BIT set in <barriers> between passes is
  necessary.

As I understand it, if you keep accessing the data through image variables (UAVs, in Direct3D terms), then you must call MemoryBarrier; if you access the data in a different manner, you don't need to sync.

Please correct me if I'm wrong.

yours3lf · ‎08-09-2013

Well, I used imageStore in the compute pass but a simple texture() in the displaying pass, so I guess no barriers are required. I'll check whether it crashes with imageLoad.
As for binding vs location, the driver seems to be a really early beta/alpha version, so anything is possible. Hopefully the new Khronos conformance tests will eliminate these problems, but that will only be for GL 4.4+.
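
A note on the barrier question: as I read the glMemoryBarrier() documentation, each barrier bit names how the data will be accessed after the barrier, not how it was written. Under that reading, an imageStore pass followed by a draw that samples with texture() would call for GL_TEXTURE_FETCH_BARRIER_BIT rather than no barrier at all; a driver may happen to work without it, but that is not something to rely on. The sketch below just encodes that mapping. NextAccess and barrier_bits_for are hypothetical names, and the constants repeat the values from the OpenGL headers only so the snippet compiles without a GL loader.

```cpp
#include <cassert>
#include <cstdint>

// Bit values as published in the OpenGL headers, repeated here only so the
// sketch stands alone; real code should use the names from the GL header.
constexpr std::uint32_t kTextureFetchBarrierBit      = 0x00000008;  // GL_TEXTURE_FETCH_BARRIER_BIT
constexpr std::uint32_t kShaderImageAccessBarrierBit = 0x00000020;  // GL_SHADER_IMAGE_ACCESS_BARRIER_BIT
constexpr std::uint32_t kTextureUpdateBarrierBit     = 0x00000100;  // GL_TEXTURE_UPDATE_BARRIER_BIT

// Hypothetical enum: how the NEXT pass will touch what the shader wrote.
enum class NextAccess {
    SampledTexture,   // texture() / texelFetch() in a later shader
    ImageLoadStore,   // imageLoad() / imageStore() in a later shader
    TextureDownload   // glGetTexImage() or glTexSubImage2D() on the CPU side
};

// Picks the glMemoryBarrier bit that matches the consumer's access type.
inline std::uint32_t barrier_bits_for(NextAccess next) {
    switch (next) {
        case NextAccess::SampledTexture:  return kTextureFetchBarrierBit;
        case NextAccess::ImageLoadStore:  return kShaderImageAccessBarrierBit;
        case NextAccess::TextureDownload: return kTextureUpdateBarrierBit;
    }
    assert(false && "unhandled NextAccess value");
    return 0;
}
```

For the case discussed here, the call between the dispatch and the draw would then be glMemoryBarrier(barrier_bits_for(NextAccess::SampledTexture)).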

maizensh · ‎08-09-2013

I tried it out on NVIDIA hardware (imageStore followed by a texture sample) and it worked without the barrier. But be warned, it's different hardware and in my pipeline there are multiple stages, so it takes a little time between the compute stage and the draw stage. By "little" I mean something like 2-3ms.

yours3lf · ‎06-22-2013

no errors reported. Otherwise I would have mentioned it.

This is hilarious.

Thank you for the help!

OpenGL Compute Shader 13.4 Driver Crash/Restart Win7 64bit