Read the documentation about samplers. If you don't use image objects (i.e. you work on plain buffers), you must clamp the image coordinates manually.
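A minimal sketch of manual clamping, written as plain C so it can be compiled on the host; the same logic would go into a kernel that reads from a buffer. The names `clamp_coord` and `read_pixel_clamped` are hypothetical, and a row-major single-channel float image is assumed. With image objects, a sampler using `CLK_ADDRESS_CLAMP_TO_EDGE` gives the same edge behaviour without this code.

```c
#include <assert.h>

/* Clamp a coordinate into the valid range [0, size - 1]. */
int clamp_coord(int v, int size)
{
    assert(size > 0);
    if (v < 0) return 0;
    if (v >= size) return size - 1;
    return v;
}

/* Hypothetical helper: read a pixel from a row-major float buffer,
 * repeating the edge pixel for out-of-range coordinates. */
float read_pixel_clamped(const float *src, int width, int height, int x, int y)
{
    x = clamp_coord(x, width);
    y = clamp_coord(y, height);
    return src[y * width + x];
}
```

Inside an OpenCL kernel the built-in `clamp(x, 0, width - 1)` can replace `clamp_coord`.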
The work-group size must be the same across the whole NDRange, so the global size has to be a multiple of the work-group size in each dimension. For example, say you want to process a 100x100 image with a work-group size of 16x16. You then use a global size of 112x112 (100 rounded up to the next multiple of 16). The output image will be 112x112 too, and after you run your kernel you just crop it to the original size. You must of course clamp the coordinates so that you do not read outside of the source image.
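The rounding step can be sketched in host-side C as follows. The helper name `round_up_to_multiple` is an assumption for illustration, not part of the OpenCL API; its result would be what you pass per dimension as the global work size to `clEnqueueNDRangeKernel`.

```c
#include <assert.h>
#include <stddef.h>

/* Round value up to the nearest multiple of `multiple`
 * (hypothetical helper; e.g. image width -> global work size). */
size_t round_up_to_multiple(size_t value, size_t multiple)
{
    assert(multiple > 0);
    return ((value + multiple - 1) / multiple) * multiple;
}
```

For the example above, `round_up_to_multiple(100, 16)` gives 112, so the global size is 112x112 with a local size of 16x16.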