Kernel takes more time on the discrete GPU than on the integrated GPU
I am getting some strange results from running my code. As a sanity check, I also tried a couple of benchmarks from the SDK (e.g. matrix multiplication, Black-Scholes), but in each case the kernel run time on the discrete GPU is longer than on the integrated GPU. The time on the integrated GPU also doesn't change much when I change the problem size. Has anyone else experienced the same issue with the integrated GPU? Any thoughts or pointers will be appreciated.
My system is an A3850 APU with an integrated GPU (HD6550, Beaver Creek) and a discrete HD7970 GPU (Tahiti). It runs Fedora 14 with Catalyst driver 12.x.
I tried upgrading the Catalyst driver to 13.x, but this leads to a segfault when starting CodeXL.