Hi there!
I have written a GLSL fragment shader and pretty quickly found out that if (performance_needed) branching_is_something_to_avoid
So I wanted to be smart. My code is a flow simulation that reads its inputs from sampler2Ds. Among other data stored in those textures, there is a float value named "obst" that is 0.0 for "no obstacle" and 1.0 for "obstacle":
read = texture2D(Texture3, gl_TexCoord[0].st);
float obst = read.x;
Now, further down in main() there is an if-clause:
fOut[k] = fIn[k] - omega * (fIn[k] - fEq); // works perfectly,
if (obst == 1.0) fOut[k] = fIn[opp[k]];     // but is slow!
So, to avoid that if clause, I turned it into a sum, abusing the "obst" value as a switch that turns off the term to be neglected:
fOut[k] = (1.0-obst)*(fIn[k] - omega * (fIn[k]-fEq)) + fIn[opp[k]]*obst;
And here comes the catch: the above line does _not_ work. I had a hard time figuring out that I have to swap the operand order of "obst":
... + fIn[opp[k]]*obst; // original code, won't work, behaves like ... + 0.0
... + obst*fIn[opp[k]]; // works very well
Is this by design or is it a feature?
I am on Ubuntu 10.04, Intel Core 2 Duo, ATI Mobility HD 2600
Could you please tell me what fIn and fOut are? A vec? An array? It's strange that the following are both valid.
fOut = fIn;
fOut = fIn[Opp];
Phew... somehow those brackets were taken out? That may explain why it's going italic from that position on...
Anyway, they're all arrays of floats:
fOut[k] = fIn[k] - omega * (fIn[k] - fEq);
if (obst == 1.0) fOut[k] = fIn[opp[k]];
Uh... let me see if this works. I changed the index from "i" to "k". Apparently "i" in square brackets is the forum's markup tag for italic.
Alright, k seems to work as an index. Here is the rest of the first message's code:
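fOut[k] = (1.0-obst)*(fIn[k] - omega * (fIn[k]-fEq)) + fIn[opp[k]]*obst; // won't work
fOut[k] = (1.0-obst)*(fIn[k] - omega * (fIn[k]-fEq)) + obst*fIn[opp[k]]; // works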
To be precise: fEq is a local float (inside the for loop), while fIn, fOut, opp and omega are declared globally:
float[9] fIn, fOut;
int[9] opp;
const float omega = ...
Hope that makes more sense now.
When I compile a simple test shader, I can't see any difference between the two forms. It would be best to paste the whole shader here or send it to me at frank.li@amd.com.
You should also retry with the latest driver, Catalyst 10.12. We had a bug with embedded indices just like "fIn[opp
OK. Not today, though. I'm going to try to boil this down a little: the shader is rather lengthy and doesn't set up the textures (for example, the texture that contains the "obst" values).
Thanks for looking at it so far.
OK, I upgraded to 10.12 and the bug is gone... shame on me, I should've done that earlier.
I have another rather boring question: how can I make my shader faster? I know you get loads of those, so I'd be fully satisfied with some plausible advice rather than an in-depth analysis.
My shader takes 5 textures as inputs (texture units 0-4) and renders into 5 textures via an FBO (color attachments 0-4). After each pass I swap the outputs and the inputs, which some call "ping-ponging".
I noticed a significant speedup from disabling GL_BLEND before running the shader. That makes sense to me, because it avoids a costly blend operation, which has to read the existing texel back, when writing into the textures.
Are there any other switches that _may_ speed up reading from or writing to textures? Disabling GL_BLEND gave me a 30% speedup, and looking for more I disabled and enabled switches fairly randomly, without success, though.
Thx in advance,
fetti
It would be helpful if you could provide your sample code to us for investigation.