noxnet
Journeyman III

juniper vs cedar vs nvidia 8800gts vs nvidia 9400m

bad performance on ATI

I've developed several kernels that solve the same problem and tested them on different GPUs. The results really surprised me.

Test GPUs:

ATI HD4850
ATI HD5450
ATI HD5750
Nvidia 8800GTS
Nvidia 9400M

The ATI 5450 is the slowest card. The Nvidia 9400M is about one third faster on average.

The 8800 is by far the best, although it is fairly old. The 4850 also performed very well;
since it doesn't support local memory natively, its results are acceptable.

On the 5750, however, performance is only slightly better (5% at most) than on the Nvidia 9400M.
As the 9400M has just 16 CUDA cores, I'm surprised that the 5750 (720 streaming processors) is only slightly faster.

Comparing the 5750 with the 4850, the 4850 is faster whether or not local memory is used.
I know the 4850 has 80 more streaming processors, but shouldn't the 5750 perform better, especially with local memory optimizations?

The ATI 5450 and 5750 were tested on the same system (Win 7 64-bit, with the 10.2 and 10.3 drivers from the ATI and Sapphire websites).

The ATI 4850 and the Nvidia cards were tested on another Win 7 64-bit system.

0 Likes
14 Replies
noxnet
Journeyman III

I ran CPU-Z on my system.

Name: Radeon HD 5750
Codename: RV840
Technology: 40 nm
Memory size: 1024 MB
Memory bus width: 128 bits
GPU ref clock: 27000
PCI device: bus 1 (0x1), device 0 (0x0), function 0 (0x0)
Vendor ID: 0x1002 (0x174B)
Model ID: 0x68BE (0xE138)
Performance Level: 0
Core clock: 157.0 MHz
Memory clock: 300.0 MHz

I'm wondering about the core clock and memory clock, which are far too low. Querying the clock frequency through OpenCL returns 700, which is the value it should be.

0 Likes

GPU-Z shows the current clock, which is lowered to save power.

0 Likes
ryta1203
Journeyman III

This might be a case of writing your kernels for Nvidia cards and then running them on ATI cards. Not all GPUs are designed equally.

0 Likes

That's the reason why I've tested different kernels and not only one. But the result is always the same. As mentioned, the ATI 4850 performs quite well (far better than the 9400M), especially when no local memory is used.

Can this be a driver issue on Win 7 64-Bit concerning 5xxx gpus?

I have already tried Catalyst drivers 9.12, 10.2 and 10.3.

Are there OpenCL benchmark programs available for Win 7 64-bit?

0 Likes

You can try as many kernels as you want; if they are all written for Nvidia GPUs, it isn't going to matter.

0 Likes

As an example, I have one kernel with local memory optimizations. The average execution time (1000 iterations) on the different GPUs:

HD5750: 3.12 ms

HD4850: 2.60 ms

NV9400M: 3.21 ms

NV8800GTS: 0.71 ms

This kernel might be optimized based on Nvidia papers, but these optimizations concern bank conflicts and coalesced accesses to global memory. This is much the same on ATI and Nvidia cards. At the very least, the 5750 should be faster than the 4850.
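
To put the gaps between these timings in relative terms, here is a minimal Python sketch. The dictionary holds the averages from this post; the `avg_ms` and `speedup` names are hypothetical and chosen only for illustration.

```python
# Average kernel times in milliseconds, copied from the timings in this post.
avg_ms = {
    "HD5750": 3.12,
    "HD4850": 2.60,
    "NV9400M": 3.21,
    "NV8800GTS": 0.71,
}


def speedup(slower: str, faster: str) -> float:
    """Return the time ratio slower/faster, i.e. how many times faster `faster` ran."""
    return avg_ms[slower] / avg_ms[faster]


if __name__ == "__main__":
    fastest = min(avg_ms, key=avg_ms.get)
    # List the cards from fastest to slowest, relative to the fastest one.
    for gpu, ms in sorted(avg_ms.items(), key=lambda kv: kv[1]):
        ratio = ms / avg_ms[fastest]
        print(f"{gpu:10s} {ms:5.2f} ms  {ratio:.2f}x the time of {fastest}")
```

With these numbers, `speedup("NV9400M", "HD5750")` is about 1.03, so the 5750 is roughly 3% faster than the 9400M, and `speedup("HD5750", "HD4850")` is about 1.2.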

 

0 Likes

Another example is a kernel optimized for 4xxx cards. As before, the average execution time (1000 iterations):

HD4850: 1.53 ms  

HD5750: 11.30 ms

@Micah, did this issue concern 4xxx or 5xxx cards?

0 Likes

Originally posted by: noxnet
Another example is a kernel optimized for 4xxx cards. As before, the average execution time (1000 iterations):

HD4850: 1.53 ms

HD5750: 11.30 ms

@Micah, did this issue concern 4xxx or 5xxx cards?

Something is wrong here. As I recall, this issue occurred when running a 32-bit application on a 64-bit system. Here is that thread: http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=124758

Try running it on a 32-bit system, or compile it as 64-bit.
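
To check which case applies, the host process can report its own pointer width. The sketch below assumes the host program is driven from Python, which this thread does not state; a C or C++ host could print `sizeof(void*) * 8` instead. The `process_bits` name is hypothetical.

```python
import struct
import sys


def process_bits() -> int:
    """Pointer width of the running interpreter in bits (32 or 64)."""
    # "P" is the struct format code for a native pointer (void *).
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print(f"host process is {process_bits()}-bit")
    # sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
    print("64-bit build" if sys.maxsize > 2**32 else "32-bit build")
```

A result of 32 on a 64-bit Windows installation would mean the application is running as a 32-bit process, which is the situation described above.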

0 Likes

Originally posted by: noxnet
Another example is a kernel optimized for 4xxx cards. As before, the average execution time (1000 iterations):

HD4850: 1.53 ms

HD5750: 11.30 ms

@Micah, did this issue concern 4xxx or 5xxx cards?

Wow! I've never seen this. Care to post the code?

 

0 Likes

Sorry, I can't post the kernels.

I ran the samples from the thread referenced by nou and posted my results there.

http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=124758

0 Likes

Originally posted by: noxnet
As an example, I have one kernel with local memory optimizations. The average execution time (1000 iterations) on the different GPUs:

HD5750: 3.12 ms

HD4850: 2.60 ms

NV9400M: 3.21 ms

NV8800GTS: 0.71 ms

This kernel might be optimized based on Nvidia papers, but these optimizations concern bank conflicts and coalesced accesses to global memory. This is much the same on ATI and Nvidia cards. At the very least, the 5750 should be faster than the 4850.

It might help if you posted the kernels. Like I've said before, not all GPUs are designed the same, even if they have similar features (it's very hard to convince even "experts" of this for some reason).

0 Likes

What is the speed comparison between the 4850 and the 5750? It should be roughly the same, or better on the 5750 side.

0 Likes

48XX and 57XX should be equivalent in performance on kernels written for the 4XXX architecture (no local memory or atomics), but the inverse does not hold.

On a side note, BarnacleJunior reported a performance issue on 64-bit systems back in January. You might be hitting this issue, as I do not believe it was fixed in time for 2.01.
0 Likes

noxnet,
Local memory optimizations on 4XXX and 5XXX cards are vastly different; the two cannot be programmed the same way with the same performance expectations, even at the IL level. At the CL level on 4XXX, local memory is emulated in global memory, so using local memory is equivalent to using global memory.
0 Likes