Hi,
A few tips to help you out:
1. Make sure every OpenCL API call in your host code checks its return value and prints the error code if it fails. Then look up the meaning of that error code in the cl.h header.
2. It looks like your bubble sort code is written incorrectly. Bubble sort, as far as I know, is done in steps. In the first step, the largest element is moved to the last place in the array. In the second step, the second-largest element moves to the second-to-last spot in the array. These two steps cannot be done in parallel, because the second depends on the result of the first, but you are assigning this work to two work-items which the GPU will run in parallel. IMHO bubble sort is not easily parallelizable. Check out some other parallel sorting algorithms, like radix sort.
3. Doing b[gid] = a[gid]; inside that kernel does not make sense anyway. I guess you were just experimenting in the kernel, though. darkhmz already pointed out the specific issues in the kernels, so there is no point repeating them.
4. The attached PNG image seems to show a kernel compilation error, so now you need to find out which API call is returning an error and what that error is. Check the OpenCL 1.2 spec for help on those errors. You may also want to look at the APP SDK samples related to sorting.
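
To illustrate the dependency problem from tip 2, here is a minimal plain-C sketch of odd-even transposition sort. Note that this is a different algorithm from both your bubble sort and the radix sort I suggested. I am only using it because it is the closest relative of bubble sort whose compare-and-swap operations within one phase touch disjoint pairs, so they could be handed to separate work-items. The function names are my own, and the inner loop runs serially here; in a real kernel you would need a barrier or a separate kernel launch between phases, since each phase depends on the previous one.

```c
#include <assert.h>
#include <stddef.h>

/* Compare one adjacent pair and swap it if out of order.
   In a kernel, this is the work a single work-item would do in one phase. */
static void compare_swap(int *a, size_t i)
{
    if (a[i] > a[i + 1]) {
        int tmp = a[i];
        a[i] = a[i + 1];
        a[i + 1] = tmp;
    }
}

/* Odd-even transposition sort (hypothetical helper name).
   Even phases handle pairs (0,1), (2,3), ...; odd phases handle (1,2), (3,4), ...
   Pairs inside one phase never overlap, so they are independent of each other.
   The phases themselves must still run one after another. */
void odd_even_transposition_sort(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t phase = 0; phase < n; ++phase) {
        /* This inner loop is the part that could run in parallel. */
        for (size_t i = phase % 2; i + 1 < n; i += 2) {
            compare_swap(a, i);
        }
    }
}
```

Calling odd_even_transposition_sort(v, 5) on an array holding 5, 1, 4, 2, 3 leaves it in ascending order. Compare that with plain bubble sort, where each swap in a pass can depend on the swap just before it, which is why splitting the passes across work-items gives wrong results.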