1) Many if instructions:
if(was_pulse.x==0){ //R: if a pulse was already detected there is no need to check the other elements; that will be done on the CPU anyway
    for(int i=0;i<128;i++){
        if( (d[i].x>t.x)||(d[i].y>t.x)||(d[i].z>t.x)||(d[i].w>t.x) ){ was_pulse.x=1; break; }
    }
}
2) Using fmax:
if(was_pulse.x==0){ //R: if a pulse was already detected there is no need to check the other elements; that will be done on the CPU anyway
    pmax=fmax(d[0],d[1]);
    for(int i=2;i<128;i++){ pmax=fmax(pmax,d[i]); }
    pmax.xy=fmax(pmax.xy,pmax.zw);
    pmax.x=fmax(pmax.x,pmax.y);
    if(pmax.x>t.x){ was_pulse.x=1; }
}
Originally posted by: Raistmer
That is, how can the MAX assembler instruction be used in an OpenCL kernel? As I understand it, fmax() will not be mapped directly onto it. Is some intrinsic available?
I'm very interested in this topic too. I had already noticed on my own that using fmax() was totally killing the performance of my kernel.
Is defining and using my own max() function the fastest solution?