Imagine I want to apply an image filter to a 640x480 image using 128x128 local work groups. Since the height (480) is not divisible by 128, a problem is going to occur!
Do I need to do something special to deal with this, or is the OpenCL implementation clever enough to skip the padding/pitch pixels automatically?
And... is 640x480 with 128x128 work groups optimal? Rounding the height up, I'd get 5x4 = 20 blocks in total, and my 5750 has 9 compute units, so I think there will be enough work to keep them all busy (unless each compute unit can process several work groups in parallel!).
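
To make the arithmetic concrete, here is a minimal sketch in plain C of how I imagine the padding would be computed on the host side. The helper names `round_up` and `group_count` are my own, not part of the OpenCL API, and I am assuming the global size has to be a multiple of the local size in each dimension. If that is right, the kernel would presumably also need to return early for coordinates that fall outside the real 640x480 image.

```c
#include <assert.h>
#include <stddef.h>

/* Round `size` up to the next multiple of `multiple`.
 * Hypothetical helper for illustration, not an OpenCL function. */
static size_t round_up(size_t size, size_t multiple)
{
    assert(multiple > 0);
    return ((size + multiple - 1) / multiple) * multiple;
}

/* Number of work groups needed to cover `size` items along one
 * dimension when each group spans `local` items. */
static size_t group_count(size_t size, size_t local)
{
    return round_up(size, local) / local;
}
```

With these, `round_up(480, 128)` gives 512 for the padded height, and `group_count(640, 128) * group_count(480, 128)` gives the 5x4 = 20 blocks mentioned above.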