

licoah
Journeyman III

execution domain

I have a question about execution domain.

According to CLInfo, my GPU supports only 256 work-items in one dimension. But with Brook+ we can solve a problem with 2^23 elements in one dimension. For example, if I want to transpose a matrix in one kernel, does that mean the matrix cannot have more than 256 elements in one dimension? If so, the execution domain in OpenCL is too small.

 

3 Replies
nou
Exemplar

No, 256 is the size of a local work-group. The global work size has no such limit.

Fr4nz
Journeyman III

Originally posted by: licoah I have a question about execution domain.

 

According to CLInfo, my GPU supports only 256 work-items in one dimension. But with Brook+ we can solve a problem with 2^23 elements in one dimension. For example, if I want to transpose a matrix in one kernel, does that mean the matrix cannot have more than 256 elements in one dimension? If so, the execution domain in OpenCL is too small.



You are confusing the total number of work-items with the maximum number of work-items in a work-group. The first is unlimited; the second is limited by the architecture of the video card (256 work-items per work-group on ATI video cards and 512 on NVIDIA video cards).
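To make the distinction concrete, here is a minimal sketch in plain C of how a large problem size maps onto work-groups of 256 work-items. The helper names `round_up_global` and `num_work_groups` are hypothetical and not part of OpenCL. The sketch assumes the OpenCL 1.x rule that the global work size must be evenly divisible by the local work size, so the problem size is rounded up and the kernel is expected to skip the padding work-items.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round the problem size n up to the next multiple
   of the work-group size, so it can be passed as the global work size. */
size_t round_up_global(size_t n, size_t local)
{
    assert(local > 0);
    return ((n + local - 1) / local) * local;
}

/* Hypothetical helper: number of work-groups launched along one dimension
   for a global size that is already a multiple of the local size. */
size_t num_work_groups(size_t global, size_t local)
{
    assert(local > 0 && global % local == 0);
    return global / local;
}
```

With 2^23 elements and a work-group size of 256, `round_up_global(8388608, 256)` leaves the size unchanged and `num_work_groups` gives 32768 work-groups, so the 256 limit only bounds each group, not the whole execution domain.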


Well, the global size is limited: to 2^32-1 or 2^64-1, depending on the device. You can find out which one applies by querying the device with CL_DEVICE_ADDRESS_BITS; it returns 32 or 64.

On my system it returns 64 for the CPU and 32 for the GPU. I think on a 32-bit system it will be 32 for the CPU too.
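As a rough sketch of the limit described above, the following plain C function turns an address-bits value into the corresponding maximum global size. The function name `max_global_work_size` is hypothetical. On a real device the `address_bits` value would come from `clGetDeviceInfo` with `CL_DEVICE_ADDRESS_BITS`; it is passed in as a plain argument here so the example does not need an OpenCL runtime.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest global work size per dimension for a device
   that reports the given CL_DEVICE_ADDRESS_BITS value (32 or 64).
   In real host code, address_bits would be filled in by
   clGetDeviceInfo(device, CL_DEVICE_ADDRESS_BITS, sizeof(cl_uint), &bits, NULL). */
uint64_t max_global_work_size(unsigned address_bits)
{
    assert(address_bits == 32 || address_bits == 64);
    /* 2^64 - 1 for 64-bit devices, 2^32 - 1 for 32-bit devices. */
    return address_bits == 64 ? UINT64_MAX : (uint64_t)UINT32_MAX;
}
```

Even the 32-bit limit of 4294967295 is far above the 2^23 elements mentioned in the question.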
