3 Replies Latest reply on May 7, 2014 1:45 AM by obara

    With how many work-items does the A10-7850A's GPU run fastest?


      To speed up my calculation, I am trying different values for the

      "local_work_size" parameter of the clEnqueueNDRangeKernel API.


      I expected that a larger "local_work_size" would give faster processing.

      But in practice the processing speed saturates at

        local_work_size = 8 to 16,

      and gradually slows down above local_work_size = 20.

      This seems strange to me, because the A10-7850A has 512 stream processors.

      What is wrong?
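
      For anyone reproducing the sweep, here is a minimal sketch of the arithmetic involved. It is not the code used above. It assumes the GPU schedules work-items in 64-wide wavefronts (the usual width on AMD GCN hardware, not something stated in this post), and the names ASSUMED_WAVEFRONT_SIZE, round_up_global and lane_utilization_percent are hypothetical helpers. On a real device the preferred multiple can be queried with clGetKernelWorkGroupInfo and CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE rather than hard-coded.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: 64 work-items per wavefront (typical for AMD GCN GPUs).
   Query the device instead of trusting this constant. */
#define ASSUMED_WAVEFRONT_SIZE 64

/* Round the problem size n up to the next multiple of local_size, so the
   result can be passed as global_work_size together with that
   local_work_size (the global size must be divisible by the local size). */
size_t round_up_global(size_t n, size_t local_size)
{
    assert(local_size > 0);
    return ((n + local_size - 1) / local_size) * local_size;
}

/* Percentage of wavefront lanes that carry a work-item when one work-group
   of local_size work-items is split into whole wavefronts. */
double lane_utilization_percent(size_t local_size)
{
    assert(local_size > 0);
    size_t wavefronts = (local_size + ASSUMED_WAVEFRONT_SIZE - 1)
                        / ASSUMED_WAVEFRONT_SIZE;
    return 100.0 * (double)local_size
           / (double)(wavefronts * ASSUMED_WAVEFRONT_SIZE);
}
```

      Under that assumption, a work-group of 16 fills only a quarter of one wavefront, while 64 fills it completely, so it may be worth including multiples of 64 (64, 128, 256) in the timing sweep and padding the global size with round_up_global, with a bounds check inside the kernel for the padded work-items.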