Archives Discussions

Hill_Groove · ‎12-11-2009

speed

Hello

Are there any CAL performance advantages over OpenCL?

hazeman · ‎12-12-2009

The quick answer is yes (OpenCL is written on top of CAL, so it can't be faster). The full answer is a little longer.

On the 4xxx family, CAL can get you almost the full power of the card. But be warned - it will be rather painful. Documentation on optimization is really bad or missing, and the compiler sometimes does strange things (so you need to trick it to get quality code). On the other hand, OpenCL for 4xxx is really bad (it lacks cached memory access and LDS) - it's about 3x slower than Brook+.

With the 5xxx family it's hard to say. Some results (search the streamsdk forum) suggest there is a problem with memory transfer speed (we will see if the new CAL version corrects it). So with the exception of memory transfer, you can get almost the full power of 5xxx with CAL.

OpenCL on 5xxx is again a problem. In theory OpenCL on 5xxx should work like a charm (it isn't missing LDS or the new memory access instructions), but the results don't support this (maybe memory problems again - who knows). At the moment, performance for some applications is comparable to OpenCL on an 8800GT.

Hill_Groove · ‎12-12-2009

hazeman,

Thank you for such a detailed review. What do you conclude? Is CUDA better today? I mean, CAL needs IL kernels, which are hard to write, and OpenCL is not ready yet =/

hazeman · ‎12-12-2009

Originally posted by: Hill_Groove hazeman,

Thank you for such a detailed review. What do you conclude? Is CUDA better today? I mean, CAL needs IL kernels, which are hard to write, and OpenCL is not ready yet =/

CUDA/OpenCL from NVidia is much more mature than ATI's solutions. It has better documentation, and CUDA is much easier to use than CAL (of course CAL is almost assembler, so it is bound to be harder). But NVidia has slower hardware.

So I really don't know what you should do. I can tell you what I'm doing. At the moment I'm using CAL to write some code that must be done ASAP and is targeted at the 48xx family. Fortunately it's not too complicated.

For more advanced coding I'm waiting a few months. By then Fermi will be available, and it might be the best choice.

Also, I don't see the possibility of coding something bigger in CAL. So the choice will probably be OCL on Fermi or OCL on 58xx ( if ATI will fix it by then ).

Hill_Groove · ‎12-13-2009

hazeman,

Speaking of ATI, their cards are also cheaper. I think Fermi will cost at least $1500. So, it would be much better "if ATI will fix it by then". Thank you for the discussion.

hazeman · ‎12-14-2009

Originally posted by: Hill_Groove

I think Fermi will cost at least $1500.

The Tesla version of Fermi (with ECC and such) will probably cost that much. But the normal version (for gamers) will be priced close to the 5870.

So, it would be much better "if ATI will fix it by then".

If you are interested in the best value for money, then the 48xx series is the winner here (best GFLOPS/price). Unfortunately, ATI has already said that they are not going to fix OpenCL for 4xxx.

New cards like the 58xx and Fermi won't be in the best-value-for-money spot for a long time (certainly more than half a year). In that case I think it's better to switch to NVidia's solutions (with regard to OCL).

riza_guntur · ‎12-14-2009

I wonder if anyone is using DX11 DirectCompute on HD 5xxx rather than talking about OpenCL...

nou · ‎12-15-2009

DC vs OCL: http://www.ngohq.com/news/16861-directcompute-benchmark-0-35-a.html

The author of the benchmark states that both versions run the same code.

Here is a newer version: http://www.ngohq.com/graphic-cards/16920-directcompute-and-opencl-benchmark.html

I wonder if OCL still scores 2x better, as it did in 0.35 when testing on Radeon 5xxx. On NVidia cards OCL is about 20% better than DC.

Can someone with a 5xxx try running it?

ryta1203 · ‎12-15-2009

Originally posted by: hazeman The quick answer is yes (OpenCL is written on top of CAL, so it can't be faster). The full answer is a little longer.

On the 4xxx family, CAL can get you almost the full power of the card. But be warned - it will be rather painful. Documentation on optimization is really bad or missing, and the compiler sometimes does strange things (so you need to trick it to get quality code). On the other hand, OpenCL for 4xxx is really bad (it lacks cached memory access and LDS) - it's about 3x slower than Brook+.

With the 5xxx family it's hard to say. Some results (search the streamsdk forum) suggest there is a problem with memory transfer speed (we will see if the new CAL version corrects it). So with the exception of memory transfer, you can get almost the full power of 5xxx with CAL.

OpenCL on 5xxx is again a problem. In theory OpenCL on 5xxx should work like a charm (it isn't missing LDS or the new memory access instructions), but the results don't support this (maybe memory problems again - who knows). At the moment, performance for some applications is comparable to OpenCL on an 8800GT.

The problem here is that AMD/ATI has already said that it's going to start exposing NEW features in OpenCL FIRST and then, MAYBE, CAL/IL....

...so in the future you might find that your OpenCL code runs faster than your CAL/IL code. It's possible.

kunzjacq2 · ‎12-27-2009

It seems there were some bugs in Stream 2.0 beta 4 causing performance problems when transferring data between host and device:

http://www.sisoftware.net/index.html?dir=qa&location=gpu_opencl&langx=en&a=

Does anyone know if this is fixed in the final version?

n0thing · ‎12-27-2009

I don't know about 7xx but on 5870 the internal bandwidth is now close to 100GB/s in OpenCL (vec4 write performance) and PCIE bus transfer speeds are around 2.3 GB/s.

kunzjacq2 · ‎12-27-2009

Originally posted by: n0thing I don't know about 7xx but on 5870 the internal bandwidth is now close to 100GB/s in OpenCL (vec4 write performance) and PCIE bus transfer speeds are around 2.3 GB/s.

Thanks for the info. According to the Sisoft results, that would mean that OpenCL PCIE transfers are still way behind CAL ones (2.3 vs 4.4GB/s).

Some other figures I got: on an NVidia GTX 285, OpenCL gets 4.2GB/s over PCIe. On a 5770 with beta 4 I got 1.5GB/s; I don't currently have the figure for the final version.
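
To put the PCIe figures quoted in this thread side by side, here is a rough Python sketch that turns each bandwidth into an ideal transfer time for a single buffer. The 256 MiB buffer size and the helper name transfer_ms are assumptions made for illustration, not something measured or used in the thread, and the calculation ignores latency and driver overhead.

```python
# PCIe host <-> device figures quoted in this thread, in GB/s.
PCIE_GBPS = {
    "OpenCL, HD 5870": 2.3,
    "CAL (per the Sisoft results)": 4.4,
    "OpenCL, GTX 285": 4.2,
    "OpenCL, HD 5770 (beta 4)": 1.5,
}

BUFFER_BYTES = 256 * 1024**2  # assumed buffer size: 256 MiB


def transfer_ms(num_bytes, gb_per_s):
    """Ideal transfer time in milliseconds (1 GB = 1e9 bytes, no latency)."""
    return num_bytes / (gb_per_s * 1e9) * 1e3


for name, bw in PCIE_GBPS.items():
    ms = transfer_ms(BUFFER_BYTES, bw)
    print(f"{name:30s} {bw:4.1f} GB/s -> {ms:6.1f} ms")
```

Under these assumptions the 2.3 vs 4.4 GB/s gap means roughly 117 ms versus 61 ms per 256 MiB buffer, so a transfer-bound application would spend nearly twice as long on copies through OpenCL.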

n0thing · ‎12-28-2009

I think the CAL version uses pinned memory to transfer data over PCIE.

I have tried using the CL_MEM_ALLOC_HOST_PTR flag, but it doesn't use pinned memory on ATI GPUs, as the transfer bandwidth remains the same, i.e. 2.3GB/s.
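
For anyone wanting to reproduce this kind of comparison, a bandwidth figure is just bytes moved divided by elapsed time, taken over a few repeats. The sketch below is a minimal, hypothetical timing helper (measure_gbps is not part of any SDK); it uses a plain host memory copy as a stand-in, so the number it prints is not a PCIe figure. In a real test the callable would wrap a blocking host-to-device write, once with an ordinary buffer and once with one created with CL_MEM_ALLOC_HOST_PTR.

```python
import time


def measure_gbps(copy_fn, num_bytes, repeats=5):
    """Run copy_fn() several times and return the best rate in GB/s."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        copy_fn()  # must block until the copy has finished
        best = min(best, time.perf_counter() - start)
    return num_bytes / best / 1e9


# Stand-in for a real transfer: copy 64 MiB of host memory (assumed size).
src = bytearray(64 * 1024**2)
rate = measure_gbps(lambda: bytes(src), len(src))
print(f"{rate:.1f} GB/s (host memory copy stand-in, not a PCIe figure)")
```

Taking the best of several runs reduces noise from the first, often slower, transfer.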


OpenCL vs CAL