
flavius
Journeyman III

Conditional store

Any workaround for that?

Hi, I was wondering how I can correctly write this statement

foo[bar] = select(foo[bar], other_values, some_condition);

when foo is an array of vectors. If it were scalar, I could write

if (some_condition) foo[bar] = other_value;

but repeating this for each component of a vector variable is probably not the best choice. My concern is that I do not actually need to load the contents of foo[bar] just to store them back when the condition does not hold; I only want to overwrite them conditionally.

Is there something like vstore4_cond(other_values, bar, foo, some_condition) or any hack that would work this way?

Thanks for your suggestions.

5 Replies
himanshu_gautam
Grandmaster

You can use the select function for vector types, as listed in table 6.14 of the OpenCL spec.

If that doesn't fit your case, using the ternary operator instead of if/else can be another option.
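To make the select suggestion concrete, here is a minimal sketch of what the vector form does, written as plain C so it can be compiled and checked on the host. The types float4_t and int4_t and the functions select4_model and store4_via_select are hypothetical stand-ins for illustration only; in a real kernel you would use the built-in float4, int4 and select. Note that for vector types select looks at the most significant bit of each condition component, which is why vector comparisons (which yield -1 for true) work as the condition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for OpenCL's built-in float4 / int4 types. */
typedef struct { float s[4]; } float4_t;
typedef struct { int32_t s[4]; } int4_t;

/* Component-wise model of select(a, b, c) for vector types:
   component i takes b when the most significant bit of c.s[i] is set,
   otherwise it takes a. */
static float4_t select4_model(float4_t a, float4_t b, int4_t c)
{
    float4_t r;
    for (int i = 0; i < 4; ++i)
        r.s[i] = (c.s[i] < 0) ? b.s[i] : a.s[i]; /* MSB set <=> negative int32 */
    return r;
}

/* Model of: foo[bar] = select(foo[bar], other_values, some_condition);
   The old value is always loaded and a full vector is always stored. */
static void store4_via_select(float4_t *foo, size_t bar,
                              float4_t other_values, int4_t some_condition)
{
    assert(foo != NULL);
    float4_t old = foo[bar];                                   /* unconditional load  */
    foo[bar] = select4_model(old, other_values, some_condition); /* unconditional store */
}
```

One thing to watch: a condition component of 1 does not have its most significant bit set, so in the vector form it keeps the old value. The load of foo[bar] is also exactly the part the original question wanted to avoid.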

Meteorhead
Challenger

I wouldn't worry too much about loading the contents of foo even if you do not wish to write it. Memory accesses are practically the same speed whether they are vector or scalar loads/writes. In fact, HD69xx cards introduce HW acceleration for scalar memory operations, because AMD hardware is otherwise better suited to vector operations.

Both branches of a select statement are evaluated at all times. This is the trade-off for avoiding if/else statements and the 40 cycles it costs to enter such blocks. Since you do not wish to modify the contents of foo in one case, you have to use if() branching, but since you want to do it vector-wise, you would need 4 if statements. A single if will not take different paths for different components of a vector condition. In my opinion, leaving the code as it is is the fastest solution.

 


I am not so sure. Doesn't a memory access have higher latency than the 40 cycles of a clause switch?


Well, blindly following all the case studies saying "don't use ifs!" and "vectorize!" probably isn't the best approach... I have tried rewriting the code using four if blocks and, surprisingly, it's much faster (I am testing the Floyd-Warshall algorithm, and the major bottleneck is global memory fetches).

Yes, these ifs were at the end of the kernel.
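For anyone reading later, the four-if version could look roughly like the sketch below. It is plain C so it can be compiled and checked on the host; float4_t and int4_t are hypothetical stand-ins for the built-in float4 and int4, and store4_cond is a made-up helper named after the vstore4_cond idea in the question, not an OpenCL built-in. The return value (number of components written) is only there to make the sketch easy to check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for OpenCL's built-in float4 / int4 types. */
typedef struct { float s[4]; } float4_t;
typedef struct { int32_t s[4]; } int4_t;

/* Hypothetical helper: overwrite only those components of foo[bar]
   whose condition component is non-zero. foo[bar] is never read.
   Returns how many components were written. */
static int store4_cond(float4_t other_values, size_t bar,
                       float4_t *foo, int4_t some_condition)
{
    assert(foo != NULL);
    int written = 0;
    /* One scalar if per component, as in the four-if rewrite. */
    if (some_condition.s[0]) { foo[bar].s[0] = other_values.s[0]; ++written; }
    if (some_condition.s[1]) { foo[bar].s[1] = other_values.s[1]; ++written; }
    if (some_condition.s[2]) { foo[bar].s[2] = other_values.s[2]; ++written; }
    if (some_condition.s[3]) { foo[bar].s[3] = other_values.s[3]; ++written; }
    return written;
}
```

Unlike the select form, nothing is loaded from foo[bar] here, and components whose condition is false produce no store at all. Whether that is actually faster will depend on the kernel and the hardware, as the rest of the thread discusses.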


If the write is at the end of the program, the latency of the write can be hidden by launching more work-groups, whereas the latency of a clause switch can only be hidden by executing more work-groups in parallel. So which approach is more beneficial depends on whether you can launch more work-groups to cover the clause latency.