I have a question about the execution domain in OpenCL.
According to CLInfo, my GPU allows only 256 work-items in one dimension. With Brook+, however, we can solve a problem with 2^23 elements in one dimension. For example, if I want to transpose a matrix in a single kernel, does that mean the matrix cannot have more than 256 elements in one dimension? If so, the execution domain in OpenCL is too small.
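
To make the numbers concrete, here is a small arithmetic sketch of the two ways the 256 figure could be read. It is only an illustration: the helper names are made up, and the first reading assumes (unconfirmed) that 256 is a per-work-group limit rather than a limit on the whole execution domain.

```python
# Hypothetical helpers for illustration only; not part of any OpenCL API.
MAX_ITEMS_PER_DIM = 256     # value reported by CLInfo on my GPU
BROOK_DOMAIN = 2 ** 23      # 1D domain size that works with Brook+


def groups_needed(global_size: int, local_size: int) -> int:
    """Work-groups needed to cover global_size items.

    Assumption: the reported limit applies to one work-group only.
    """
    return (global_size + local_size - 1) // local_size  # ceiling division


def max_square_matrix_elements(limit: int) -> int:
    """Largest square matrix if `limit` capped the whole domain per dimension."""
    return limit * limit


# Reading 1: 256 is a per-work-group limit, so 2^23 items need many groups.
print(groups_needed(BROOK_DOMAIN, MAX_ITEMS_PER_DIM))      # 32768

# Reading 2: 256 caps the whole domain, so a matrix is at most 256 x 256.
print(max_square_matrix_elements(MAX_ITEMS_PER_DIM))       # 65536
```

Under the second reading, a 2D domain would hold only 65536 elements in total, far below the 2^23 elements that Brook+ handles in one dimension, which is why I am asking which reading is correct.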