Originally posted by: d.a.a. Why don't you just increase the granularity of each work-item, so each one would process three elements (768/256) instead of only one?
Thanks for the reply. I'm interested in this question in general, regardless of whether a work-item can process multiple elements. Also, the code would become much more complex: we have a bunch of different kernels that require more than 256 work-items, and some of them would need multiple conditionals to ensure correct execution.
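
For concreteness, here is a minimal sketch of the coarsening suggested in the quote. It is plain C that simulates the work-items serially, not an actual OpenCL kernel. The names `scale_work_item` and `run_work_group` and the doubling operation are made up for illustration. In a real kernel the id would come from something like `get_local_id(0)` instead of a loop counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-work-item body. Work-item `id` handles elements
   id, id + num_items, id + 2 * num_items, ... so 256 work-items
   cover 768 elements in three passes each. */
static void scale_work_item(size_t id, size_t num_items,
                            const float *in, float *out, size_t n)
{
    /* The bound check is the extra conditional: it keeps the last
       pass in range when n is not a multiple of num_items. */
    for (size_t i = id; i < n; i += num_items) {
        out[i] = 2.0f * in[i];
    }
}

/* Host-side stand-in for one work-group (an assumption for this sketch):
   runs every work-item one after another instead of in parallel. */
void run_work_group(size_t num_items, const float *in, float *out, size_t n)
{
    assert(num_items > 0);
    for (size_t id = 0; id < num_items; ++id) {
        scale_work_item(id, num_items, in, out, n);
    }
}
```

Calling `run_work_group(256, in, out, 768)` touches each of the 768 elements exactly once. For an element-wise kernel like this the only addition is the loop with its bound check. Kernels that use barriers or local memory would likely need more changes than this, which is the complexity I'm worried about.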