15 Replies Latest reply on Jan 12, 2011 4:06 PM by Barsik107

    IF in OpenCL kernel

    Barsik107

      I have a vector of 40000 elements. I want to process it, but I need the following condition (IF):

       

      uint tid = get_global_id(0);
      uint tid1 = get_global_id(1);

      const uint range=200;
      index = tid*range+tid1;

      if(index>200)
      {
          output_Tay[index]=index;
      }

      But this kernel processes only 7979 elements instead of 39799, and I can't understand why. Is there something special about using IF in kernels?

       

       

       

        • IF in OpenCL kernel
          nou

          What is your global range?

            • IF in OpenCL kernel
              Barsik107

              The global range is {200;200}. I don't think that's the problem, because if I use the kernel without the condition, it processes all 40000 elements. That seems very strange to me.

               

                • IF in OpenCL kernel
                  rick.weber

                  If you remove the if, does the output equal 0, 1, 2, ... 39999? If not, your indexing is messed up. There is nothing magical about if statements with respect to correctness. Branch divergence only affects performance. Can you try using a 1D global range of size 40,000?
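To make the expected result concrete, here is a minimal host-side sketch in plain C (not OpenCL) that emulates the original 200x200 launch and counts how many work-items should pass the `index>200` test. The function names `work_item` and `count_writes` are hypothetical and exist only for this illustration; the loop order is an assumption that does not affect the count.

```c
#include <assert.h>
#include <stddef.h>

/* Emulates the kernel body for a single work-item.
 * Returns 1 if the work-item wrote to output, 0 otherwise. */
static int work_item(unsigned tid, unsigned tid1, unsigned range, unsigned *output)
{
    unsigned index = tid * range + tid1;
    if (index > 200) {
        output[index] = index;
        return 1;
    }
    return 0;
}

/* Emulates a range x range global launch and counts the writes.
 * output must hold at least range * range elements. */
static size_t count_writes(unsigned range, unsigned *output)
{
    assert(output != NULL);
    size_t written = 0;
    for (unsigned tid = 0; tid < range; ++tid)
        for (unsigned tid1 = 0; tid1 < range; ++tid1)
            written += (size_t)work_item(tid, tid1, range, output);
    return written;
}
```

For a range of 200 the count comes out as 39799 (indices 201 through 39999), so a result of 7979 points at something other than the if itself.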

                    • IF in OpenCL kernel
                      Barsik107

                       

                      Originally posted by: rick.weber If you remove the if, does the output equal 0, 1, 2, ... 39999? If not, your indexing is messed up.


                      Yes, it does. If I remove the IF then everything is OK and all 40000 values are processed. I can tell from this simple operation:

                       output_Tay[index]=index;

                       

                      index = tid*get_global_size([0|1])+tid1; 


                      Still doesn't help. I use it like this:

                      index = tid*get_global_size(1)+tid1; 

                      And still only 7979 elements were calculated.

                        • IF in OpenCL kernel
                          himanshu.gautam

                          I think you are messing up the indexing.

                          AFAIK it should be tid1*range + tid

                          or get_global_id(1) * get_global_size(0) + get_global_id(0)

                            • IF in OpenCL kernel
                              jeff_golds

                               

                              Originally posted by: himanshu.gautam I think you are messing up the indexing.

                              AFAIK it should be tid1*range + tid

                              or get_global_id(1) * get_global_size(0) + get_global_id(0)

                               

                              You're just transposing from column-major to row-major (or vice versa).  If the global dimensions are 200x200, then you should get the same results up to transposition.

                              Jeff
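The transposition point can be checked on the host with a small sketch in plain C (not OpenCL). It assumes a 200x200 global range, as stated in the thread; `row_major`, `col_major` and `covers_all` are hypothetical names used only here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DIM 200  /* assumed global size in each dimension */

/* The two index formulas discussed in the thread. */
static size_t row_major(size_t id0, size_t id1) { return id0 * DIM + id1; }
static size_t col_major(size_t id0, size_t id1) { return id1 * DIM + id0; }

/* Returns true if the formula maps the DIM x DIM work-items onto
 * every index in [0, DIM*DIM) exactly once. */
static bool covers_all(size_t (*index_of)(size_t, size_t))
{
    static unsigned char hits[DIM * DIM];
    assert(index_of != NULL);
    memset(hits, 0, sizeof hits);
    for (size_t id0 = 0; id0 < DIM; ++id0) {
        for (size_t id1 = 0; id1 < DIM; ++id1) {
            size_t idx = index_of(id0, id1);
            if (idx >= (size_t)DIM * DIM || hits[idx] != 0)
                return false;  /* out of range or written twice */
            hits[idx] = 1;
        }
    }
    return true;
}
```

Both formulas touch each of the 40000 elements exactly once; they differ only in which work-item writes which element, which is why swapping them cannot explain missing writes on a square range.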

                              • IF in OpenCL kernel
                                Barsik107

                                I KNOW that I'm messing up the indexes, but I can't find where. I just can't explain the fact that if I use a kernel like:

                                 

                                uint tid = get_global_id(0);
                                uint tid1 = get_global_id(1);
                                uint index;
                                index = tid*get_global_size(0)+tid1; 
                                output_Tay[index]=index;

                                barrier(CLK_LOCAL_MEM_FENCE);

                                 

                                Then it works and processes all 40000 elements, but if I just add a simple IF:

                                 

                                 

                                uint tid = get_global_id(0);
                                uint tid1 = get_global_id(1);
                                uint index;
                                index = tid*get_global_size(0)+tid1; 
                                if (index>200)
                                {
                                    output_Tay[index]=index;

                                    barrier(CLK_LOCAL_MEM_FENCE);
                                }

                                Then it doesn't work. In that case only one of every five elements is recorded in output_Tay, and recording does begin from number 200, but why only one in five? The element numbers are: 205, 210, 215, 200 and so on. Something really, really strange is going on.



                                 

                                  • IF in OpenCL kernel
                                    jeff_golds

                                    output_Tay is a global buffer?  If so, then the barrier isn't needed (and is the wrong barrier since you are writing to global memory, not local) because no threads write to the same address.

                                    Are you running this on the CPU or GPU?  What SDK are you using?  What driver do you have installed?

                                    Jeff

                                      • IF in OpenCL kernel
                                        Barsik107

                                        Yeah, output_Tay is a global buffer. I'm running it on the CPU (simulation), with the ATI Stream SDK.

                                          • IF in OpenCL kernel
                                            nou

                                            That barrier is bad. A barrier must be hit by all work-items in the work-group, or it will lead to undefined behaviour.

                                              • IF in OpenCL kernel
                                                himanshu.gautam

                                                Yeah, nou seems to be right.

                                                A barrier should not be placed inside an if block unless every work-item in the work-group takes the same branch. If the work-items do not all take the same route at the if (which is the case here), you can expect undefined behaviour.

                                                  • IF in OpenCL kernel
                                                    Barsik107

                                                    All right! The barrier is removed and I've got a good result! Now I have more questions about barriers. I have a simple kernel like this:

                                                     

                                                    __kernel void KernelTay(__global  float * input_Tay,
                                                                            __global  float * output_Tay)
                                                    {
                                                        uint tid = get_global_id(0);
                                                        uint tid1 = get_global_id(1);
                                                        uint index;
                                                        index = tid*get_global_size(0)+tid1; 

                                                        //saving to local memory
                                                        float Tay= input_Tay[index];
                                                        barrier(CLK_LOCAL_MEM_FENCE);

                                                        if(index>get_global_size(0))
                                                        {
                                                            //some operations with Tay
                                                        }
                                                        barrier(CLK_LOCAL_MEM_FENCE);
                                                        output_Tay[index]=Tay;
                                                    }

                                                    Am I using barriers correctly, or should I just go and read the specification one more time?

                                                     

                                                     

                                                      • IF in OpenCL kernel
                                                        nou

                                                        Yes, every work-item in the work-group must hit the SAME barrier.

                                                        Even this is bad if work-items diverge and take different paths:

                                                        if(some_condition)
                                                        {
                                                        barrier(CLK_LOCAL_MEM_FENCE);
                                                        }
                                                        else
                                                        {
                                                        barrier(CLK_LOCAL_MEM_FENCE);
                                                        }

                                                        And your example is wrong. First, you use a local barrier when you work with global memory (it will work, but it is meaningless).

                                                        A barrier is needed only if you modify global/local memory and then use the modified value in another work-item of the same work-group. Otherwise the barrier is unnecessary. For example:

                                                        __kernel void kkk(__global int *buf)
                                                        {
                                                            size_t lid = get_local_id(0);
                                                            size_t gid = get_global_id(0);
                                                            __local int mem[64]; // assumes a work-group size of 64
                                                            mem[lid] = buf[gid];
                                                            barrier(CLK_LOCAL_MEM_FENCE); // barrier to ensure that every work-item sees the values written by the others
                                                            int s=0;
                                                            for(int i=0;i<64;i++) s+=mem[i];
                                                            buf[gid] = s;
                                                            //barrier(CLK_GLOBAL_MEM_FENCE); another barrier if you want to use the modified buf[gid] value in another work-item.
                                                            //working with buf[gid+-local_range]
                                                        }
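What the barrier buys in the kernel above can be sketched on the host in plain C (not OpenCL) by splitting each work-group into two phases, with the phase boundary standing in for `barrier(CLK_LOCAL_MEM_FENCE)`. This is only an emulation under assumptions: a work-group size of 64 (matching `mem[64]`), a global size that is a multiple of 64, and a hypothetical function name `emulate_group_sum`.

```c
#include <assert.h>
#include <stddef.h>

#define LOCAL_SIZE 64  /* assumed work-group size, matching mem[64] */

/* Emulates one launch of the kernel over global_size work-items.
 * global_size must be a multiple of LOCAL_SIZE. */
static void emulate_group_sum(int *buf, size_t global_size)
{
    assert(buf != NULL);
    assert(global_size % LOCAL_SIZE == 0);

    for (size_t group = 0; group < global_size / LOCAL_SIZE; ++group) {
        int mem[LOCAL_SIZE];  /* stands in for the __local array */
        size_t base = group * LOCAL_SIZE;

        /* Phase 1: every work-item copies its element into local memory. */
        for (size_t lid = 0; lid < LOCAL_SIZE; ++lid)
            mem[lid] = buf[base + lid];

        /* Phase boundary: plays the role of the barrier, so phase 2
         * only starts once all of mem[] has been written. */

        /* Phase 2: every work-item sums the whole local array. */
        for (size_t lid = 0; lid < LOCAL_SIZE; ++lid) {
            int s = 0;
            for (size_t i = 0; i < LOCAL_SIZE; ++i)
                s += mem[i];
            buf[base + lid] = s;
        }
    }
}
```

Without the phase boundary, a work-item could start summing before its neighbours have stored their values. That is the only situation in which the barrier is needed, and it is also why every work-item in the group has to reach it.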

                                  • IF in OpenCL kernel
                                    MicahVillmow
                                    It looks like your indexing might be messed up. Please use this calculation instead.
                                    index = tid*get_global_size([0|1])+tid1;

                                    This will give you the value for the x dimension of your launch size instead of a hard coded number.