Raistmer
Adept II

Comparing an array with a threshold: which way will be faster?

I need to check whether any value in an array exceeds a threshold.
Exceeding the threshold can be considered a rare event; most values will be below it.
Which approach should be faster?
1) Many if operations (remember, exceeding the threshold is rare, so little divergent branching is expected here).
2) Using the fmax function to find the maximum of the array, then comparing that maximum with the threshold.

Example implementations of both approaches are listed below.
Or maybe something else would be even better?

1) Many if instructions:

    if (was_pulse.x == 0) { // R: if a pulse is already detected there is no need to check other elements; it will be done on the CPU anyway
        for (int i = 0; i < 128; i++) {
            if ((d[i].x > t.x) || (d[i].y > t.x) || (d[i].z > t.x) || (d[i].w > t.x)) {
                was_pulse.x = 1;
                break;
            }
        }
    }

2) Using fmax:

    if (was_pulse.x == 0) { // R: if a pulse is already detected there is no need to check other elements; it will be done on the CPU anyway
        pmax = fmax(d[0], d[1]);
        for (int i = 2; i < 128; i++) {
            pmax = fmax(pmax, d[i]);
        }
        pmax.xy = fmax(pmax.xy, pmax.zw);  // fold four lanes into two
        pmax.x = fmax(pmax.x, pmax.y);     // and two into one
        if (pmax.x > t.x) {
            was_pulse.x = 1;
        }
    }
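For checking that the two strategies agree outside a kernel, here is a minimal host-side C sketch of the same logic. It assumes 128 four-component vectors, a single scalar threshold and NaN-free data; `vec4`, `max2`, `exceeds_early_exit` and `exceeds_max_reduce` are hypothetical names, and `vec4` only stands in for OpenCL's `float4`. It says nothing about GPU performance.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for OpenCL's float4. */
typedef struct { float x, y, z, w; } vec4;

/* Plain compare-and-select max; assumes no NaN inputs. */
static float max2(float a, float b) { return a > b ? a : b; }

/* Approach 1: test every component and stop at the first value above t. */
bool exceeds_early_exit(const vec4 *d, size_t n, float t)
{
    for (size_t i = 0; i < n; i++) {
        if (d[i].x > t || d[i].y > t || d[i].z > t || d[i].w > t)
            return true;
    }
    return false;
}

/* Approach 2: reduce all lanes to one maximum, then compare once.
   Assumes n >= 1. */
bool exceeds_max_reduce(const vec4 *d, size_t n, float t)
{
    vec4 m = d[0];
    for (size_t i = 1; i < n; i++) {
        m.x = max2(m.x, d[i].x);
        m.y = max2(m.y, d[i].y);
        m.z = max2(m.z, d[i].z);
        m.w = max2(m.w, d[i].w);
    }
    /* Fold the four lanes into one, as the kernel does with pmax.xy / pmax.zw. */
    float best = max2(max2(m.x, m.z), max2(m.y, m.w));
    return best > t;
}
```

The structural trade-off is visible here: the first version can stop at the first hit but evaluates a compound condition per element, while the second always visits every element but has no data-dependent branch until the single final comparison.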


Raistmer,
Unless you need to handle floating-point corner cases, I'd recommend using your own fmax function. The OpenCL fmax functions must handle denormals, NaNs, etc., and are therefore slower than an implementation that does not require these checks.
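A minimal C sketch of what such a home-made max could look like, assuming the data never contains NaN. `fast_max` and `nan_aware_max` are hypothetical names; the second only mimics the NaN rule that C and OpenCL `fmax` follow (if exactly one argument is NaN, the other is returned), so the difference is easy to see. Whether a GPU compiler turns the compare-and-select into a single MAX instruction is not something this sketch shows.

```c
#include <math.h>

/* Hypothetical fmax() replacement: a bare compare-and-select with no NaN
   special cases. With a NaN operand the result depends on argument order. */
float fast_max(float a, float b)
{
    return a > b ? a : b;
}

/* Reference with fmax()-style NaN rules: a single NaN operand is ignored. */
float nan_aware_max(float a, float b)
{
    if (isnan(a)) return b;
    if (isnan(b)) return a;
    return a > b ? a : b;
}
```

For example, `fast_max(1.0f, NAN)` returns NaN while `fast_max(NAN, 1.0f)` returns 1.0f; the NaN-aware version returns 1.0f for both. If the data is known to be NaN-free, that asymmetry never comes into play.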

I made the same replacement (if statements replaced by max()) in one of my Brook kernels and looked at what KernelAnalyser reports (I can't do the same for the OpenCL kernel; KernelAnalyser just crashes when I paste an OpenCL kernel into it).
It shows that the max-based kernel executes faster, uses fewer ALU and CF instructions and (which is important for me too) uses fewer GPRs.
The mean, min and max execution times are smaller too.
In the object code I see the MAX instruction being used, so in Brook the max() function maps almost directly onto a GPU instruction, right?

Is it possible to do the same in OpenCL? I don't need denormal and NaN handling at all. Moreover, all my values are non-negative floats!
What can be done in this case to speed up OpenCL's fmax()?

That is, how can I use the MAX assembler instruction in an OpenCL kernel? As I understand it, fmax() will not be mapped directly onto it. Is there some intrinsic available?
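Since all the values are non-negative, one more option can be sketched under that assumption: for non-negative IEEE-754 single-precision floats (sign bit clear, so no -0.0 and no NaN), the bit patterns read as unsigned 32-bit integers are ordered the same way as the float values, so both the maximum and the threshold test can be done with integer operations. The C below uses `memcpy` for the reinterpretation; `float_bits` and `any_above_threshold` are hypothetical names. In an OpenCL kernel the analogous calls would be `as_uint()` and the integer `max()`. OpenCL C's common-function `max()` for floats is also specified more loosely than `fmax()` (its result is undefined for infinite or NaN inputs) and may be worth measuring. Whether any of these compiles to a single hardware MAX instruction is not established here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bits as uint32_t (like as_uint() in OpenCL C). */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Assumption: every value is a non-negative float with the sign bit
   clear (no -0.0, no NaN). Under that assumption the unsigned ordering
   of the bit patterns matches the float ordering. */
int any_above_threshold(const float *v, size_t n, float threshold)
{
    uint32_t m = 0;                 /* bit pattern of +0.0f */
    for (size_t i = 0; i < n; i++) {
        uint32_t b = float_bits(v[i]);
        m = b > m ? b : m;          /* integer max of the bit patterns */
    }
    return m > float_bits(threshold);
}
```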

Originally posted by: Raistmer
That is, how to use MAX assembler instruction in OpenCL kernel? As I understood fmax() will not be directly mapped onto it. Some intrinsic available ?


I'm very interested in this topic too. I had already noticed on my own that using fmax() was totally killing the performance of my kernel.

Is defining and using my own max() function the fastest solution?

Raistmer
Adept II

But how can I define my own max() function so that it uses the available MAX assembler instruction instead of an if statement?
Any ideas?

Raistmer,
I'll bring this up as a feature request for our next release: exposing native versions of all hardware instructions.
Raistmer
Adept II

That would be very nice, thanks a lot!