The performance of various algorithms on ATI graphics cards can vary drastically depending on the quality of the implementation. A generic problem many people have in seeing good performance on ATI cards stems from many of the implementations being ports of CUDA-optimized code rather than code optimized for ATI GPUs. In Section 4 of the OpenCL Programming Guide, we give advice on how to extract high performance from ATI GPUs, including tips for CUDA programmers. The raw peak GFLOPS performance of a 5760 is ~4x that of a 9600GT, so the algorithm in question clearly was not taking full advantage of ATI's architecture.
Your question is the same one I had six months ago: ATI or NVIDIA? I had no experience with GPGPU and no legacy investment to constrain my choice. After a few months working with OpenCL on an HD 5870, and now after just purchasing a GTX 480, I have some opinions.
If you have a legacy investment in CUDA (or need to use legacy libraries), then NVIDIA makes a lot of sense. Also, most of the published research I've seen is done on NVIDIA hardware. This can be helpful, as designing optimized kernels appears to be very different for ATI compared with NVIDIA. What works well on one does not on the other. For example, ATI likes vectorized kernels (float4). NVIDIA likes scalar kernels (float).
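To make the float4-versus-float point concrete, here is a minimal SAXPY pair in OpenCL C. This is only an illustrative sketch with hypothetical kernel names; the vectorized version would be launched with a global work size of N/4.

```c
// Scalar kernel: one float per work-item.
// Maps naturally onto NVIDIA's scalar cores.
__kernel void saxpy_scalar(const float a,
                           __global const float *x,
                           __global float *y)
{
    size_t i = get_global_id(0);
    y[i] = a * x[i] + y[i];
}

// Vectorized kernel: one float4 per work-item.
// Maps well onto ATI's VLIW ALUs, which want several independent
// operations per clock to stay busy. float4 arithmetic is
// component-wise in OpenCL C, so the body is unchanged.
__kernel void saxpy_vec4(const float a,
                         __global const float4 *x,
                         __global float4 *y)
{
    size_t i = get_global_id(0);
    y[i] = a * x[i] + y[i];
}
```

The source is identical apart from the element type, but on the VLIW hardware the float4 version keeps far more ALU slots filled per work-item.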
I haven't done any real work with the GTX 480 yet, so these are only first impressions. I am underwhelmed. First, the mechanical packaging is noticeably better on the HD 5870; the GTX 480 seems rushed, almost like it is still a prototype. Also, if you need double precision, you need a professional Tesla C2050 or C2070 instead of the consumer GTX models. I believe nVidia locked double precision to 1/4 of full performance, so you won't get the full power unless you use Tesla GPUs (which are much more expensive; then again, only nVidia has ECC memory...).
In single precision, the HD 5870 has roughly 2x the peak of the GTX 480. In my experience, I achieve around 35% to 40% utilization on the 5870 doing matrix multiplication. Even if higher utilization is possible on the 480, the lower absolute performance will hurt. The 5870 wins because it simply has much more brute power in single precision. And it wins in double precision because the consumer GTX GPUs are locked to lower performance than the architecture is capable of.
So those are ATI's strong points.
There are weak points too. Full use of device memory may not be fully sorted out in ATI's drivers yet. That matters for sustained performance, which takes into account PCIe bus data transfers, not only kernel performance. This is nVidia's big advantage, in my opinion: I believe you can use the full device memory there (I have not done it myself yet; it's just what I have read).
So I think it's a mixed bag. It really depends on what you want to do. I am very happy that I started with ATI. I think I had a lot more fun and learned much more going this route. On a technical level, if you have problems with high arithmetic intensity, then the ATI architecture will tend to win. ATI seems to have made design choices favoring lots of ALUs and high peak performance. If your problems are more memory constrained, then nVidia may tend to win (I only write "may" as I do not have personal experience yet; I am repeating hearsay).
Originally posted by: MicahVillmow dnorric, The performance of various algorithms on ATI graphic cards can vary drastically depending on the quality of the implementation. [...]
Sadly, though, even the HPC community seems reluctant to write architecture-specific optimized code (which quite befuddles me) and mostly sticks to the less powerful CUDA cards.
Wow, so I guess OpenCL is not heterogeneous in any way when it comes to performance. Do you think this will change in the future?
I think I might end up going for the nvidia 480. For one, I absolutely need 3D vision. Yes, ATI can do this via iZ3D, but that means I'm locked into that brand. I would also like to make use of multiple monitors in 3D, and nvidia seems to be the only one on the market that can do that.
It's a bit annoying, as I have been an AMD/ATI fan for yonks. 3D really is a must for my project though.
Is there any way I can achieve what I have mentioned using ATI and AMD paired?
"Wow so i guess opencl is not heterogenous in any way when it comes to performance. Do you think this will change in the future?"
I doubt it. So far, I believe that OpenCL has three kinds of devices: CPUs, GPUs (ATI, NVIDIA) and ACCELERATORs (STI Cell, Larrabee?). These are just very different. I very much doubt that compiler technology will advance to the point that it can hide the underlying architecture completely. To make an analogy, this is like asking a compiler to take a single-threaded program and generate code that efficiently uses multiple cores. To my knowledge, such technology does not exist.
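For reference, those three device classes are visible directly in the OpenCL API. Here is a minimal enumeration sketch (error checking mostly omitted; it requires an OpenCL SDK and a platform/driver to build and run):

```c
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;
    clGetPlatformIDs(8, platforms, &num_platforms);

    /* The three device classes the standard defines. */
    const cl_device_type types[] = { CL_DEVICE_TYPE_CPU,
                                     CL_DEVICE_TYPE_GPU,
                                     CL_DEVICE_TYPE_ACCELERATOR };
    const char *names[] = { "CPU", "GPU", "ACCELERATOR" };

    for (cl_uint p = 0; p < num_platforms; ++p)
        for (int t = 0; t < 3; ++t) {
            cl_uint n = 0;
            /* Query only the device count for this class. */
            if (clGetDeviceIDs(platforms[p], types[t], 0, NULL, &n) == CL_SUCCESS)
                printf("platform %u: %u %s device(s)\n", p, n, names[t]);
        }
    return 0;
}
```

The API makes the classes enumerable, but as argued above, it does nothing to make one kernel run efficiently on all of them.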
If you do go the NVIDIA route, I wouldn't give up on OpenCL. I think that OpenCL is sometimes identified with ATI. But Apple started it and then it was a group effort with ATI, IBM, Intel and NVIDIA before becoming a standard. It is a step in the right direction.
I guess you're right, cjang; it would be a mammoth task. I have been looking around, and I think ATI with iZ3D and Zalman monitors might possibly allow for 3 monitors using stereo. The problem is that, from what I understand, each Zalman 3D monitor requires 2 ports, which drives up the price of the system. Not sure what card(s) you would need to pull that off.