I've developed several kernels that solve the same problem and tested them on different GPUs. The results really surprised me.
The ATI 5450 is the slowest card. The Nvidia 9400M is about one third faster on average.
The 8800 is by far the best, although it is fairly old. The 4850 also performed very well;
since it doesn't support local memory natively, its results are acceptable.
On the 5750, however, performance is only slightly (at most 5%) better than on the Nvidia 9400M.
As the 9400M has just 16 CUDA cores, I'm surprised that the 5750 (720 streaming processors) is only slightly faster.
Comparing the 5750 with the 4850, the 4850 is faster regardless of whether local memory is used.
I know the 4850 has 80 more streaming processors, but shouldn't the 5750 perform better, especially with local memory optimizations?
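To put the numbers above in perspective, here is a rough back-of-the-envelope sketch of the implied per-unit throughput. It uses only the figures quoted in this post (16 CUDA cores, 720 and 720 + 80 streaming processors, at most 5% speedup); the helper name `per_unit_ratio` is made up for illustration. It also assumes that a "CUDA core" and a "streaming processor" are comparable units, which is probably not true, since the two vendors count processing elements differently.

```python
# Processing-element counts as quoted in the post; the 4850 figure is 720 + 80.
# Assumption: one CUDA core and one ATI streaming processor are treated as
# comparable units, which is a simplification.
units = {"9400M": 16, "5750": 720, "4850": 720 + 80}

# Measured upper bound from the post: the 5750 is at most 5% faster than the 9400M.
speedup_5750_vs_9400m = 1.05


def per_unit_ratio(speedup, units_a, units_b):
    """Throughput per unit of device A relative to device B,
    given that A is `speedup` times as fast as B overall."""
    return speedup * units_b / units_a


ratio = per_unit_ratio(speedup_5750_vs_9400m, units["5750"], units["9400M"])
print(f"5750 per-unit throughput relative to 9400M: {ratio:.4f}")
print(f"One 9400M unit does the work of roughly {1 / ratio:.0f} 5750 units")
```

If the unit counts were directly comparable, each 5750 unit would be delivering only a few percent of what a 9400M core delivers in these kernels, which is why the result looks so odd to me.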
The ATI 5450 and 5750 were tested on the same system (Win 7 64-bit, with drivers 10.2 and 10.3 from both the ATI and Sapphire websites).
The ATI 4850 was tested on another Win 7 64-bit system, as were the Nvidia cards.