I've noticed that once a kernel becomes large enough, performance drops significantly. My feeling is that the reason is that the GPU keeps reloading the kernel's instructions into the "code cache" (or whatever it is called on a GPU), replacing old instructions with new ones, and that this causes the performance problem.
Does anyone know the exact size of this cache? My guess is 64 KB for RV770, but maybe this information is already available somewhere?
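To illustrate the suspected effect, here is a toy simulation. It assumes a simple fully associative LRU instruction cache and a kernel fetched as straight-line code on every invocation. Real GPU instruction caches (line size, associativity, replacement policy, and the actual RV770 capacity) may behave differently. The function name `lru_miss_rate` and the sizes used are hypothetical, in arbitrary "cache line" units, not measured values.

```python
from collections import OrderedDict


def lru_miss_rate(kernel_lines: int, cache_lines: int, iterations: int = 10) -> float:
    """Toy model (assumption): fetch a straight-line kernel `iterations` times
    through a fully associative LRU cache holding `cache_lines` lines, and
    return the fraction of fetches that miss."""
    cache: OrderedDict[int, bool] = OrderedDict()
    misses = 0
    fetches = 0
    for _ in range(iterations):
        for line in range(kernel_lines):
            fetches += 1
            if line in cache:
                # Hit: mark this line as most recently used.
                cache.move_to_end(line)
            else:
                # Miss: load the line, evicting the least recently used one if full.
                misses += 1
                cache[line] = True
                if len(cache) > cache_lines:
                    cache.popitem(last=False)
    return misses / fetches


if __name__ == "__main__":
    for size in (32, 64, 65, 128):
        print(size, lru_miss_rate(size, cache_lines=64))
```

In this model the miss rate stays at the cold-start level while the kernel fits in the cache, then jumps to 100% as soon as it exceeds the cache by even one line, because LRU always evicts exactly the line that will be needed next. That kind of cliff would match a sudden slowdown past some kernel size, but it does not prove that this is what the hardware does.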