5 Replies Latest reply on Mar 28, 2011 7:15 PM by flavius

    Conditional store

      Any workaround for that?

      Hi, I was wondering how I can correctly write this statement

      foo[bar] = select(foo[bar], other_values, some_condition);

      if foo is an array of vectors. If it were scalar, I could write

      if (some_condition) foo[bar] = other_value;

      but repeating this for every component of a vector variable is probably not the best choice. My concern is that I do not actually need to load the contents of foo[bar] just to store them again when the condition does not hold; I only want to conditionally overwrite them.

      Is there something like vstore4_cond(other_values, bar, foo, some_condition) or any hack that would work this way?

      Thanks for your suggestions.

        • Conditional store

          You can use the select function for vector types, as given in table 6.14 of the OpenCL spec.

          If that doesn't fit your case, using the ternary operator instead of if/else can be another option.
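
For readers unfamiliar with the vector form of select, the following is a rough plain-C model of what the statement from the question does per component. It is only a sketch: `int4_model`, `select_model` and `store_via_select` are hypothetical names made up for illustration, and the struct stands in for OpenCL's int4. In OpenCL C, the vector select(a, b, c) takes component i from b when the most significant bit of c's component i is set, and from a otherwise. Note that the old value of foo[bar] is read before the store.

```c
#include <stdint.h>

/* Hypothetical stand-in for OpenCL's int4; plain C has no built-in vectors. */
typedef struct { int32_t s[4]; } int4_model;

/* Models the vector form of select(a, b, c): component i comes from b when
   the most significant bit of c.s[i] is set (i.e. c.s[i] is negative),
   otherwise from a. */
int4_model select_model(int4_model a, int4_model b, int4_model c)
{
    int4_model r;
    for (int i = 0; i < 4; ++i)
        r.s[i] = (c.s[i] < 0) ? b.s[i] : a.s[i];
    return r;
}

/* Models: foo[bar] = select(foo[bar], other_values, some_condition);
   This is a read-modify-write: foo[bar] is loaded, merged, and stored back
   as a whole vector even when no component changes. */
void store_via_select(int4_model *foo, int bar,
                      int4_model other_values, int4_model some_condition)
{
    foo[bar] = select_model(foo[bar], other_values, some_condition);
}
```

Vector comparisons in OpenCL C yield -1 (all bits set) for true and 0 for false, which is why a comparison result can be passed directly as the third argument.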

          • Conditional store

            I wouldn't worry too much about loading the contents of foo even if you do not wish to write it. Memory operations run at practically the same speed whether they are vector or scalar loads or writes. In fact, HD69xx cards introduce hardware acceleration for scalar memory operations, because AMD hardware is otherwise better suited to vector operations.

            Both value operands of a select call are always evaluated. This is the trade-off for avoiding if/else statements and the 40 cycles it costs to enter such blocks. Since you do not wish to modify the contents of foo in one case, you have to use if() branching, but since you want to do it vector-wise, you would need 4 if statements. A single if cannot take different paths for different components of a vector condition. In my opinion, leaving the code as it is will be the fastest solution.
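
The "4 if statements" variant mentioned above could look roughly like the plain-C sketch below. `store4_cond` is a hypothetical helper, not an OpenCL built-in; its argument order simply mirrors the vstore4_cond wish from the question. It assumes the array of 4-wide vectors can be viewed as a flat array of scalars, so vector bar starts at scalar index 4 * bar.

```c
#include <stdint.h>

/* Hypothetical helper: overwrites only the components whose condition is
   non-zero and never reads the destination. The vector array is treated as
   a flat scalar array (an assumption of this sketch). */
void store4_cond(const int32_t other_values[4], int bar, int32_t *foo,
                 const int32_t some_condition[4])
{
    int32_t *dst = foo + 4 * bar;  /* first scalar of vector number bar */

    /* One scalar branch per component; an untaken branch issues no store. */
    if (some_condition[0]) dst[0] = other_values[0];
    if (some_condition[1]) dst[1] = other_values[1];
    if (some_condition[2]) dst[2] = other_values[2];
    if (some_condition[3]) dst[3] = other_values[3];
}
```

Whether this beats the select version depends on the hardware and on how often the conditions hold, so it is worth measuring both.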


              • Conditional store

                I am not so sure. Doesn't a memory access have higher latency than the 40 cycles of a clause switch?

                • Conditional store

                  Well, blindly following all the case studies saying "don't use ifs!" and "vectorize!" probably isn't the best approach... I tried rewriting the code using four if blocks and, surprisingly, it is much faster (I am testing the Floyd-Warshall algorithm, and the major bottleneck is global memory fetches).

                  Yes, these ifs were at the end of the kernel.

                • Conditional store
                  If the write is at the end of the program, the latency of the write can be hidden by launching more work-groups, whereas the latency of a clause switch can only be hidden by executing more work-groups in parallel. So which approach is more beneficial depends on whether you can launch enough work-groups to cover the clause latency.