I've developed several kernels that solve the same problem and tested them on different GPUs. The results really surprised me.
Test GPUs:
ATI HD4850
ATI HD5450
ATI HD5750
Nvidia 8800GTS
Nvidia 9400M
The ATI 5450 is the slowest card. The Nvidia 9400M is about 1/3 faster on average.
The 8800 is by far the best, although it is fairly old. The 4850 also performed very well,
but since it doesn't support local memory natively, its results are acceptable.
On the 5750, however, performance is only slightly (max. 5%) better than on the Nvidia 9400M.
As the 9400M has just 16 CUDA cores, I'm surprised that the 5750 (720 stream processors) is only slightly faster.
Comparing the 5750 with the 4850, the 4850 is faster whether local memory is used or not.
I know the 4850 has 80 more stream processors, but shouldn't the 5750 perform better, especially when there are local memory optimizations?
The ATI 5450 and 5750 were tested on the same system (Win 7 64-bit, with drivers 10.2 and 10.3 from both the ATI and the Sapphire websites).
The ATI 4850 was tested on another Win 7 64-bit system, as were the Nvidia cards.
I ran cpu-z on my system:
Name: Radeon HD 5750
Codename: RV840
Technology: 40 nm
Memory size: 1024 MB
Memory bus width: 128 bits
GPU ref clock: 27000
PCI device: bus 1 (0x1), device 0 (0x0), function 0 (0x0)
Vendor ID: 0x1002 (0x174B)
Model ID: 0x68BE (0xE138)
Performance Level: 0
Core clock: 157.0 MHz
Memory clock: 300.0 MHz
I'm wondering about the core clock and memory clock, which are far too low. When I query the clock frequency through OpenCL, it returns 700, which is the value it should be.
GPU-z shows the current clock, which is lowered to save power.
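If the card really does sit at a power-saving clock while idle, one way to keep that out of the numbers might be to run a few untimed warm-up iterations before measuring. Below is a minimal sketch in Python, not the benchmark used in this thread. `average_ms`, `run_once` and the `warmup` count are hypothetical names and values; `run_once` is assumed to launch the kernel and block until it has finished. Whether the 5750 needs a warm-up at all is an assumption.

```python
import time

def average_ms(run_once, iterations=1000, warmup=50):
    """Average wall-clock time of run_once() in milliseconds.

    run_once is a hypothetical callable that launches the kernel and
    blocks until it has completed.
    """
    # Warm-up runs are not timed; they give the driver a chance to
    # leave a power-saving clock state before measurement starts.
    for _ in range(warmup):
        run_once()

    # Time all iterations together and divide by the iteration count.
    start = time.perf_counter()
    for _ in range(iterations):
        run_once()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / iterations
```

Calling `average_ms(launch_kernel)` with your own blocking launch function would give an average comparable to the 1000-iteration figures quoted in this thread, assuming those were measured the same way.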
This might be a case of you writing your kernel for Nvidia cards and then running it on ATI cards. Not all GPUs are designed equally.
That's the reason why I've tested several kernels and not just one. But the result is always the same. As mentioned, the ATI 4850 performs quite well (far better than the 9400M), especially when no local memory is used.
Could this be a driver issue on Win 7 64-bit concerning 5xxx GPUs?
I have already tried Catalyst drivers 9.12, 10.2 and 10.3.
Are there OpenCL benchmark programs available for Win 7 64-bit?
You can try out as many kernels as you want; if they are all programmed for Nvidia GPUs, it isn't going to matter.
As an example, I have one kernel with local memory optimizations. The average execution time (1000 iterations) on the different GPUs:
HD5750: 3.12 ms
HD4850: 2.60 ms
NV9400M: 3.21 ms
NV8800GTS: 0.71 ms
This kernel might be optimized based on Nvidia papers, but these optimizations concern bank conflicts and coalesced accesses to global memory. This is much the same on ATI and Nvidia cards. At the very least, the 5750 should be faster than the 4850.
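To put the four averages above on one scale, here is a small Python sketch that expresses each GPU as a speedup over the 9400M. The timings are copied from the list above; the helper name `speedup_over` is just for illustration.

```python
# Average kernel times in ms, copied from the measurements above.
times_ms = {
    "HD5750": 3.12,
    "HD4850": 2.60,
    "NV9400M": 3.21,
    "NV8800GTS": 0.71,
}

def speedup_over(baseline, times):
    """Return how many times faster each GPU is than the baseline GPU."""
    base = times[baseline]
    return {gpu: base / t for gpu, t in times.items()}

# Print the GPUs from fastest to slowest relative to the 9400M.
ranked = sorted(speedup_over("NV9400M", times_ms).items(),
                key=lambda item: item[1], reverse=True)
for gpu, factor in ranked:
    print(f"{gpu:10s} {factor:.2f}x")
```

With these numbers the 8800GTS comes out at about 4.5x the 9400M, the 4850 at about 1.23x and the 5750 at only about 1.03x, which matches the "max. 5%" gap mentioned earlier.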
Another example is a kernel optimized for 4xxx cards. As before, the average over 1000 iterations:
HD4850: 1.53 ms
HD5750: 11.30 ms
@Micah Has this issue concerned 4xxx or 5xxx cards?
Originally posted by: noxnet
Another example is a kernel optimized for 4xxx cards. As before, the average over 1000 iterations:
HD4850: 1.53 ms
HD5750: 11.30 ms
@Micah Has this issue concerned 4xxx or 5xxx cards?
Something is wrong there. As I recall, this issue came from running a 32-bit application on a 64-bit system. Here is that thread: http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=124758
Try running it on a 32-bit system, or compile it as 64-bit.
Originally posted by: noxnet
Another example is a kernel optimized for 4xxx cards. As before, the average over 1000 iterations:
HD4850: 1.53 ms
HD5750: 11.30 ms
@Micah Has this issue concerned 4xxx or 5xxx cards?
Wow! I never saw this, care to post the code?
Sorry, I can't post the kernels.
I ran the samples from the thread referenced by nou and posted my results there.
http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=124758
Originally posted by: noxnet
As an example, I have one kernel with local memory optimizations. The average execution time (1000 iterations) on the different GPUs:
HD5750: 3.12 ms
HD4850: 2.60 ms
NV9400M: 3.21 ms
NV8800GTS: 0.71 ms
This kernel might be optimized based on Nvidia papers, but these optimizations concern bank conflicts and coalesced accesses to global memory. This is much the same on ATI and Nvidia cards. At the very least, the 5750 should be faster than the 4850.
Maybe if you post the kernels. Like I've said before, not all GPUs are designed the same, even if they have similar features (it's very hard to convince even "experts" of this for some reason).
How does the speed of the 4850 compare with the 5750? It should be roughly the same, or better on the 5750 side.