Hi there,
I've been trying to get OpenGL compute shaders to work using the new driver.
I got mixed results: I can run an application that writes to a Texture Buffer Object, but I get a driver crash/restart when I run an application that writes to a regular texture. I found nothing in the OpenGL specification suggesting this is disallowed, so it should probably work.
Here's the app:
https://docs.google.com/file/d/0B33Sh832pOdOaWFlVS00N040bFE/edit?usp=sharing
Putting a glFinish() after the glDispatchCompute() call solved the driver crash, but I still don't get anything on screen. I can render the texture fine when not using compute shaders.
I suspect this might be a synchronization issue in the driver, i.e. the compute shader is still writing to the texture while the next shader tries to read from it.
Please take a look at this issue.
Best regards,
Yours3lf
Same issue here.
Using an HD 5850 with 13.4.
You never bind anything to the image in the compute shader: the image uniform 'texture0' is not attached to any texture object. First, as you would with a sampler, get the uniform location and set its unit. Then call glBindImageTexture to attach the texture to that unit. Example code:
glUniform1i(glGetUniformLocation(program, "texture0"), 0);
glBindImageTexture(0, texture, 0, GL_FALSE, 0, GL_READ_WRITE, GL_RGBA8);
glUniform1i sets the variable to an arbitrary unit number (0 here). Then you bind the texture to that unit. With a sampler you would use glActiveTexture() and then glBindTexture(), but since images are bound differently, you use glBindImageTexture() instead. This should solve your crash.
I know this because I had the same problem with the same example you are working on. Also, instead of glFinish(), I would assume glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT); does the same thing in your case, since you are writing directly to an image and nothing else. glMemoryBarrier ensures that accesses of a specific kind are complete before continuing, and since you are using shader images, GL_SHADER_IMAGE_ACCESS_BARRIER_BIT should suffice.
Hope this solves the problem.
Hey there,
thank you for the reply!
Shortly after posting this question I found glBindImageTexture(), but it still doesn't work with it, nor with the corrections you suggested.
I also found the barrier function, but it doesn't help either.
my hardware specs:
AMD A8-4500m APU
Can you please take a look at it again? Here's the updated project:
Weird. You could also try setting local_size_z to 1 explicitly in the shader. I'm currently developing a middleware compiler that accepts another language and generates OpenGL code from it, and I managed to get this working; however, I also got driver crashes whenever I called glDispatchCompute() with dimensions bigger than those defined in the shader. I don't know what the specification says about a local size dimension that is left undefined, so my compiler simply sets local_size_x/y/z to 1 whenever they are not defined in the middleware code. So just try changing:
layout(local_size_x = 16, local_size_y = 16) in; //local workgroup size
To:
layout(local_size_x = 16, local_size_y = 16, local_size_z = 1) in; //local workgroup size
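For what it's worth, as far as I can tell the GLSL 4.30 specification describes layout(local_size_x = 32, local_size_y = 32) in; as equivalent to a three-dimensional local size whose third dimension is one, so the two declarations above should mean the same thing. The sketch below mirrors that rule in plain C++ and checks a local size against the minimum limits GL 4.3 guarantees (1024 x 1024 x 64 per dimension, 1024 invocations per work group). The LocalSize struct and fits_guaranteed_limits() are hypothetical helpers; a real application should query GL_MAX_COMPUTE_WORK_GROUP_SIZE and GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS from the driver instead.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of a compute shader's layout(local_size_x/y/z) declaration.
// Dimensions left out of the GLSL declaration default to 1.
struct LocalSize {
    std::uint32_t x = 1;
    std::uint32_t y = 1;
    std::uint32_t z = 1;

    // Invocations per work group = x * y * z.
    std::uint64_t invocations() const {
        return std::uint64_t(x) * y * z;
    }
};

// Minimum limits guaranteed by GL 4.3 (assumption: query the real values at runtime).
constexpr std::uint32_t kMinMaxSize[3] = {1024, 1024, 64};
constexpr std::uint64_t kMinMaxInvocations = 1024;

// True if the local size fits within the guaranteed minimum limits.
bool fits_guaranteed_limits(const LocalSize& s) {
    return s.x >= 1 && s.y >= 1 && s.z >= 1 &&
           s.x <= kMinMaxSize[0] && s.y <= kMinMaxSize[1] && s.z <= kMinMaxSize[2] &&
           s.invocations() <= kMinMaxInvocations;
}
```

With the 16 x 16 local size used in this thread, each work group has 256 invocations, well within the guaranteed limit.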
You might also want to bind the texture uniform in your geometry rendering shader. I can see you are doing:
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, the_texture);
But never:
glUniform1i(glGetUniformLocation(debug_shader, "texture0"), 0);
You should do something like:
glUseProgram(debug_shader);
glUniform1i(glGetUniformLocation(debug_shader, "texture0"), 0);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, the_texture);
Hey there,
I set the local size and the uniform, but still nothing, and it still crashes :S
By the way, the reason I'm not calling glUniform1i(glGetUniformLocation(...), ...) is that the bindings were set in the shaders using layout qualifiers:
layout(binding = loc) uniform sampler2D texture0; //display shader
layout(binding = loc, rgba8) uniform image2D texture0; //compute shader
which enables me to only say:
glActiveTexture(GL_TEXTURE0 + loc);
so there is no need to set the unit via a uniform.
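A note on the numbers involved: with layout(binding = N), N is the texture unit for a sampler and the image unit for an image variable, so the same N goes to glActiveTexture (as an offset from GL_TEXTURE0) and to the first argument of glBindImageTexture. The value returned by glGetUniformLocation is a uniform location, which is a separate number and only matches the unit by coincidence. A minimal sketch, assuming a hypothetical shared constant kTexture0Binding that equals N in both shaders, with GL_TEXTURE0 hard-coded to its header value (0x84C0) so it compiles without a GL loader:

```cpp
#include <cassert>
#include <cstdint>

using GLenum = std::uint32_t;  // stand-ins for the typedefs in the GL headers
using GLuint = std::uint32_t;

constexpr GLenum kGlTexture0 = 0x84C0;  // value of GL_TEXTURE0

// Hypothetical constant; must equal N in layout(binding = N) in both shaders.
constexpr GLuint kTexture0Binding = 0;

// Enum to pass to glActiveTexture for a sampler declared with layout(binding = unit).
constexpr GLenum active_texture_enum(GLuint unit) {
    return kGlTexture0 + unit;
}

// In the real program (not compiled here):
//   glActiveTexture(active_texture_enum(kTexture0Binding));
//   glBindTexture(GL_TEXTURE_2D, the_texture);
//   glBindImageTexture(kTexture0Binding, the_texture, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_RGBA8);
```

Keeping the binding number in one constant makes it harder for the C++ side and the shaders to drift apart.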
So essentially, loc should be the same in both shaders.
Currently I'm doing this:
//fill the texture with the compute shader output
glUseProgram(compute_shader);
glUniform1f(1, float(frames) * 0.01f);
glUniform1i(glGetUniformLocation(compute_shader, "texture0"), 0);
glBindImageTexture(glGetUniformLocation(compute_shader, "texture0"), the_texture, 0, GL_FALSE, 0, GL_READ_WRITE, GL_RGBA8);
//glBindImageTexture(0, the_texture, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_RGBA8);
glDispatchCompute(screen.x / 16, screen.y / 16, 1);
glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
glFinish(); //still needed :S
get_opengl_error();
//display the texture on screen
glUseProgram(debug_shader);
mvm.push_matrix(cam);
glUniformMatrix4fv(0, 1, false, &ppl.get_model_view_projection_matrix(cam)[0][0]);
mvm.pop_matrix();
glUniform1i(glGetUniformLocation(debug_shader, "texture0"), 0);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, the_texture);
glBindVertexArray(quad);
glDrawElements( GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0 );
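One detail in the snippet above: glDispatchCompute(screen.x / 16, screen.y / 16, 1) truncates, so if the window size is not a multiple of 16, the last partial tile of pixels is never written (a 1366-pixel-wide window gives 85 groups, covering only 1360 columns). A common approach is to round the group count up and have the shader skip invocations that fall outside the image. A small sketch of the rounding, with a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>

// Number of work groups needed so that groups * local_size >= extent.
// Plain integer division (extent / local_size) rounds down and drops the remainder.
constexpr std::uint32_t work_groups_for(std::uint32_t extent, std::uint32_t local_size) {
    return (extent + local_size - 1) / local_size;
}

// In the real program (not compiled here), with a 16x16 local size:
//   glDispatchCompute(work_groups_for(screen.x, 16), work_groups_for(screen.y, 16), 1);
// and in the compute shader, before imageStore:
//   if (any(greaterThanEqual(ivec2(gl_GlobalInvocationID.xy), imageSize(texture0)))) return;
```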
Does your get_opengl_error() give you any errors? If so, which ones? In any case, the AMD drivers seem to be a bit broken when it comes to compute shading. For example, glBindImageTexture can raise GL_INVALID_OPERATION, an error the GL specification doesn't even list for it. Also, glDispatchCompute() nukes rendering for me: if I don't call glClear() after glDispatchCompute(), rendering stops working completely. I even tried the same thing on Linux, and whenever I called glDispatchCompute() followed by a glDraw* without a glClear() in between, the glDraw* had no effect, even if I conditionally removed the glDispatchCompute() part (so that glDispatchCompute() only runs once). And no, I didn't get any errors either; GL doesn't report anything.
In other words, I cannot help you. The above code is more or less exactly the same as mine, and it barely works for me, although it doesn't cause a driver crash. However, I'm using an HD 5970, so the hardware differs, and that might be the reason.
So maybe an AMD representative should take a look at this to check whether it is a driver issue. That would be really helpful.
Hi,
I have brought this to the attention of our compiler team. We'll investigate and see if we can't fix it.
Thanks,
Graham
Thank you Graham for dealing with the issue!
Hi there,
I've just tried this driver, and the issue seems to be fixed now:
AMD Catalyst OpenGL 4.3 Graphics Driver, 7 new OpenGL Extensions - 3D Tech News and Pixel Hacking - ...
A couple of things:
- In the compute shader, layout(location=...) is needed to specify an image2D binding point (glBindImageTexture), not layout(binding=...).
- It seems the memory barrier is not needed at all after the compute shader call (maybe the driver is figuring it out on its own?). I suppose if I wanted to run another compute shader that writes the same memory, I'd have to place barriers.
Thank you for fixing this!
best regards,
Yours3lf
That's strange; the spec explicitly says to use "binding", not "location" (section 4.4.6.2, which points to 4.4.5).
Also, the GL_ARB_shader_image_load_store specification says:
- Data written to image variables in one rendering pass and read by the
shader in a later pass need not use coherent variables or
memoryBarrier(). Calling MemoryBarrier() with the
SHADER_IMAGE_ACCESS_BARRIER_BIT set in <barriers> between passes is
necessary.
As I understand it, if you keep accessing the data through image variables (UAVs), you must call glMemoryBarrier; if you access it in a different manner, you don't need to sync.
Please correct me if I'm wrong.
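For comparison, my reading of the glMemoryBarrier reference page (GL 4.2+) is that the bits describe how the data will be accessed after the barrier: data written with imageStore() and later read with imageLoad() calls for GL_SHADER_IMAGE_ACCESS_BARRIER_BIT, while sampling it with texture() calls for GL_TEXTURE_FETCH_BARRIER_BIT. If that reading is right, the display pass in this thread is a texture-fetch case, and skipping the barrier only happens to work. A hedged sketch of that mapping; NextReads and barrier_bits_for() are hypothetical names, and the bit values are copied from the GL headers so it compiles without them:

```cpp
#include <cassert>
#include <cstdint>

using GLbitfield = std::uint32_t;  // stand-in for the GL typedef

// Values of the corresponding constants in the GL headers.
constexpr GLbitfield kTextureFetchBarrierBit      = 0x00000008;  // GL_TEXTURE_FETCH_BARRIER_BIT
constexpr GLbitfield kShaderImageAccessBarrierBit = 0x00000020;  // GL_SHADER_IMAGE_ACCESS_BARRIER_BIT

// Hypothetical description of how the image written by the compute pass is accessed next.
struct NextReads {
    bool sampled_with_texture = false;  // texture()/texelFetch() through a sampler
    bool image_load_or_store = false;   // imageLoad()/imageStore()/image atomics
};

// Bits to pass to glMemoryBarrier between the writing dispatch and the next pass.
constexpr GLbitfield barrier_bits_for(NextReads reads) {
    return (reads.sampled_with_texture ? kTextureFetchBarrierBit : 0u) |
           (reads.image_load_or_store ? kShaderImageAccessBarrierBit : 0u);
}

// In the real program (not compiled here), for a display pass that samples the result:
//   glDispatchCompute(...);
//   glMemoryBarrier(GL_TEXTURE_FETCH_BARRIER_BIT);
```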
Well, I used imageStore() in the compute pass but a simple texture() in the display pass, so I guess no barriers are required, but I'll check whether it crashes with imageLoad().
As for binding vs. location, the driver seems to be of beta/alpha quality, so anything is possible. Hopefully the new Khronos conformance tests will eliminate issues like these, but that will only be for GL 4.4+.
I tried it out on NVIDIA hardware (imageStore() and a texture sample afterwards) and it worked without the barrier. But be warned: it's different hardware, and in my pipeline there are multiple stages, so some time passes between the compute stage and the draw stage. By "some" I mean something like 2-3 ms.
No errors were reported; otherwise I would have mentioned it.
This is hilarious.
Thank you for the help!