Hi, I was wondering how to correctly write this statement:
foo[bar] = select(foo[bar], other_values, some_condition);
when foo is an array of vectors (so foo[bar] is a vector). If it were a scalar, I could write
if (some_condition) foo[bar] = other_value;
but repeating this for each vector component is probably not the best choice. My concern is that I don't actually need to load the contents of foo[bar] just to store them again when the condition does not hold; I only want to conditionally overwrite it.
Is there something like vstore4_cond(other_values, bar, foo, some_condition) or any hack that would work this way?
Thanks for your suggestions.
You can use the select() built-in for vector types, as described in table 6.14 of the OpenCL spec.
If that doesn't fit your case, using the ternary operator instead of if/else is another option.
I wouldn't worry too much about loading the contents of foo even when you don't need to write them back. Memory accesses cost practically the same whether they are vector or scalar loads/stores. In fact, HD69xx cards introduced hardware acceleration for scalar memory operations, since AMD hardware has traditionally been better suited to vector operations.
Both operands of select() are always evaluated. That is the trade-off for avoiding if/else blocks and the 40 cycles it costs to enter one. Since you don't want to modify foo in one case, you have to use if() branching, but because you want to do it per vector component, you would need four if statements: an if cannot take different paths for different components of a vector condition. In my opinion, leaving the code as it is is the fastest solution.
I'm not so sure. Doesn't a memory access have higher latency than the 40 cycles of a clause switch?
Well, blindly following all the case studies that say "don't use ifs!" and "vectorize!" probably isn't the best approach... I rewrote the code using four if blocks and, surprisingly, it's much faster (I'm testing the Floyd-Warshall algorithm, and the major bottleneck is global memory fetches).
Yes, these ifs were at the end of the kernel.