
Global work-offset implies performance hit?

Question asked by znakeeye on Sep 5, 2012
Latest reply on Sep 6, 2012 by dmeiser

I have a kernel that processes a large image (OpenCL 1.1, data type is image2d_t). Sometimes I only want to process a region of this image. The obvious solution is to use a global work-offset. I would expect this to be faster, since fewer work-items are launched, but so far execution time only gets worse with non-zero offsets!

Example

Image is 4096x4096 pixels. Local work size is 8x8.

A: Entire image processed, no offset:

globalWorkSize = { 4096, 4096 };
globalWorkOffset = { 0, 0 };
Execution time is 38 seconds


B: Sub-image processed using offsets:

globalWorkSize = { 3296, 3296 };
globalWorkOffset = { 400, 400 };
Execution time is 58 seconds

C: Cropped image at 3296x3296 pixels, no offset:
globalWorkSize = { 3296, 3296 };
globalWorkOffset = { 0, 0 };
Execution time is 28 seconds
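
For reference, the three configurations can be checked against the OpenCL 1.1 NDRange rules with a small host-side sketch. It is plain C and makes no OpenCL calls; `nd_range_1d`, `is_valid_range` and `describe_range` are hypothetical helper names, not part of the OpenCL API. It relies only on two facts from the 1.1 spec: the global size must be a multiple of the local size, and `get_global_id()` includes the global offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type (not an OpenCL type): summary of one NDRange dimension. */
typedef struct {
    size_t first;  /* smallest get_global_id() value in this dimension */
    size_t last;   /* largest get_global_id() value in this dimension */
    size_t groups; /* number of work-groups in this dimension */
} nd_range_1d;

/* OpenCL 1.1 requires the global size to be a multiple of the local size. */
static int is_valid_range(size_t global_size, size_t local_size)
{
    return local_size > 0 && global_size > 0 && global_size % local_size == 0;
}

/* In OpenCL 1.1, get_global_id(d) runs from offset to offset + global_size - 1. */
static nd_range_1d describe_range(size_t offset, size_t global_size, size_t local_size)
{
    nd_range_1d r;
    assert(is_valid_range(global_size, local_size));
    r.first = offset;
    r.last = offset + global_size - 1;
    r.groups = global_size / local_size;
    return r;
}
```

Under these rules, `describe_range(400, 3296, 8)` for case B covers pixels 400 through 3695 in 412 work-groups per dimension, while case C covers pixels 0 through 3295 with the same 412 work-groups. So B and C launch exactly the same number of work-items and differ only in which pixels are addressed.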


Can somebody please explain why I get these results? They make no sense to me!
