
spectral
Adept II

Recursive computation - minimum size of the set ?

Hi,

I'm currently working on an application that uses a "recursive" scheme, something like

N(i) = f( N(i-1) )

This computation cannot be parallelized, but in my context I can use several 'N', i.e. I can start P threads, each computing its own sequence of N.

The goal is to minimize the number of N, and I would like to know how many computations I can run in parallel.

In other words, what is the number of threads (work-items) I can launch together before reusing the same set of N?
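
For concreteness, the setup described above can be sketched in plain C as follows. This is only an illustration: `step` is a hypothetical placeholder for f (the real f is not given in the post), and `advance_chains` is a made-up name. Each chain is serial internally, but the P chains are independent of each other.

```c
#include <stddef.h>

/* Hypothetical stand-in for f; the actual recurrence is not given in the post. */
static unsigned step(unsigned n)
{
    return 3u * n + 1u;  /* arbitrary placeholder */
}

/* Advance P independent sequences by `steps` iterations each.
 * Iteration p of the outer loop touches only state[p], so on a device
 * each p could be one work-item; the inner loop has to stay serial
 * because N(i) depends on N(i-1). */
void advance_chains(unsigned *state, size_t P, size_t steps)
{
    for (size_t p = 0; p < P; ++p) {
        unsigned n = state[p];
        for (size_t i = 0; i < steps; ++i)
            n = step(n);
        state[p] = n;
    }
}
```

The question is then how small P can be while still keeping the device busy.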

nou
Exemplar

The simple rule is: as many work-items as you can. For older cards, the absolute minimum number of work-items is the number of compute units * 64. For the new GCN architecture (e.g. the 7xxx series) it is the number of compute units * 64 * 4.
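
As a worked example of this rule, assume a hypothetical device with 32 compute units (the count is an assumption for illustration, not a figure from this thread):

```latex
\begin{align*}
\text{older cards:} \quad & 32 \times 64 = 2048 \text{ work-items} \\
\text{GCN:} \quad & 32 \times 64 \times 4 = 8192 \text{ work-items}
\end{align*}
```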

Thanks nou,

But the goal is to have the minimum number (this greatly boosts the performance of my application)! If possible, something that can be computed for both CPU and GPU!

I don't want to hard-code this number! But I would already be happy with something like:

int GetMinWorkSize()
{
  int unitsCount = clGetInfo(Units); // Is this information available?
  int rule = 64 * 4;
  return rule * unitsCount;
}

Maybe I also have to take the kind of device (CPU or GPU) into account, and maybe other information can help?

Thx


Look at clGetDeviceInfo() with CL_DEVICE_MAX_COMPUTE_UNITS, and at CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE from clGetKernelWorkGroupInfo(). But it still lacks that factor of 4, since GCN launches 4 work-groups on one CU at the same time (or 4 wavefronts from the same work-group if the work-group size is 256).
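
A minimal sketch of how these two queries could be combined, not a definitive implementation. The names `min_work_items`, `query_min_work_items` and the `USE_OPENCL` build flag are hypothetical. The `waves_per_cu` factor (1 for older cards, 4 for GCN, going by the replies in this thread) is not reported by the runtime, so the caller has to supply it.

```c
#include <stddef.h>

/* Minimum global work size: enough work-items to put waves_per_cu
 * wavefronts on every compute unit. waves_per_cu is an assumption
 * supplied by the caller, since OpenCL does not report it. */
size_t min_work_items(unsigned compute_units, size_t wave_multiple,
                      unsigned waves_per_cu)
{
    return (size_t)compute_units * wave_multiple * waves_per_cu;
}

#ifdef USE_OPENCL  /* hypothetical flag: define it when OpenCL headers are installed */
#include <CL/cl.h> /* <OpenCL/opencl.h> on macOS */

/* Query the device and kernel, then apply the rule. Returns 0 on failure. */
size_t query_min_work_items(cl_device_id device, cl_kernel kernel,
                            unsigned waves_per_cu)
{
    cl_uint units = 0;
    size_t multiple = 0;

    if (clGetDeviceInfo(device, CL_DEVICE_MAX_COMPUTE_UNITS,
                        sizeof units, &units, NULL) != CL_SUCCESS)
        return 0;
    if (clGetKernelWorkGroupInfo(kernel, device,
                                 CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE,
                                 sizeof multiple, &multiple, NULL) != CL_SUCCESS)
        return 0;

    return min_work_items(units, multiple, waves_per_cu);
}
#endif
```

Because the preferred multiple is queried per kernel and per device, the same function can be called for a CPU device as well; only `waves_per_cu` stays a hand-chosen value.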
