I don't know whether this is supposed to be supported by the OpenCL language, but
some samples from the NVIDIA OpenCL SDK that work there don't work with your SDK, mainly because they use something like:
__kernel void filter( __global const uchar4* src, __global unsigned int* dest,
__local uchar4 local)
The problem lies in the __local uchar4 local parameter, which your SDK fails to compile.
A workaround I have found to make their examples build with your SDK is to change two things:
1. Replace __local uchar4 local with __local uchar4 *local in the kernel definition
2. Update the references to the array:
change local[x][y] to local[y+(32)*x]
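
To double-check that the second change addresses the same elements, here is a minimal sketch in plain host-side C (not OpenCL C, so it can be compiled without any SDK). It assumes the tile is 32 elements wide and stored row-major, which is what the constant 32 in the flattened index suggests. TILE_DIM, flat_index and count_mismatches are names made up for this illustration and do not come from either SDK.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed tile width: 32 is taken from the flattened index local[y+(32)*x]. */
#define TILE_DIM 32

/* Flattened offset equivalent to tile[x][y] in a row-major TILE_DIM-wide tile. */
size_t flat_index(size_t x, size_t y)
{
    assert(x < TILE_DIM && y < TILE_DIM);
    return y + (size_t)TILE_DIM * x;
}

/* Counts the (x, y) pairs where the element offset the compiler uses for
   tile[x][y] differs from flat_index(x, y). Zero means the rewrite is safe. */
int count_mismatches(void)
{
    static int tile[TILE_DIM][TILE_DIM];
    int mismatches = 0;

    for (size_t x = 0; x < TILE_DIM; ++x) {
        for (size_t y = 0; y < TILE_DIM; ++y) {
            /* Byte distance from the start of the array, in units of one element. */
            size_t offset = (size_t)((char *)&tile[x][y] - (char *)tile) / sizeof(int);
            if (offset != flat_index(x, y)) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}
```

If count_mismatches() returns 0, then indexing the pointer with y+(32)*x touches the same elements that the two-dimensional form did, provided the tile in the real kernel really is 32 wide.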
Are you going to fix this, or are the NVIDIA guys using non-standard features?