12 Replies Latest reply on Dec 28, 2009 2:02 AM by n0thing

    OpenCL vs CAL

    Hill_Groove
      speed

      Hello

      Are there any CAL performance advantages over OpenCL?

        • OpenCL vs CAL
          hazeman

          The quick answer is yes (OpenCL is written on top of CAL, so it can't be faster). The full answer is a little longer.

          On the 4xxx family, CAL lets you get almost the full power of the card. But be warned: it will be rather painful. The documentation is really bad or missing (with regard to optimization), and the compiler sometimes does strange things (so you need to trick it to get quality code). On the other hand, OpenCL on 4xxx is really bad (it lacks cached memory access and LDS): it's about 3x slower than Brook+.

          With the 5xxx family it's hard to say. Some results (search the streamsdk forum) suggest there is a problem with memory transfer speed (we will see if a new CAL version corrects it). So, with the exception of memory transfer, you can get almost the full power of 5xxx with CAL.

          OpenCL on 5xxx is again a problem. In theory, OpenCL on 5xxx should work like a charm (it has LDS and the new memory access instructions), but the results don't support that (maybe memory problems again, who knows). At the moment, performance for some applications is comparable to OpenCL on an 8800GT.

           

            • OpenCL vs CAL
              Hill_Groove

              hazeman,

              thank you for such a detailed review. And what do you conclude? Is CUDA better today? I mean, CAL needs IL kernels, which are hard to write, and OpenCL is not ready yet =/

               

                • OpenCL vs CAL
                  hazeman

                   

                  Originally posted by: Hill_Groove

                  hazeman,

                  thank you for such a detailed review. And what do you conclude? Is CUDA better today? I mean, CAL needs IL kernels, which are hard to write, and OpenCL is not ready yet =/

                   

                   



                  CUDA/OpenCL from NVidia is much more mature than ATI's solutions. It has better documentation, and CUDA is much easier to use than CAL (of course CAL is almost assembler, so it is bound to be harder). But NVidia has slower hardware.

                  So I really don't know what you should do. I can tell you what I'm doing. At the moment I'm using CAL to write some code that must be done ASAP and targets the 48xx family. Fortunately it's not too complicated.

                  For more advanced coding I'm waiting a few months. By then Fermi will be available, and it might be the best choice.

                  Also, I don't see how to code anything bigger in CAL. So the choice will probably be OCL on Fermi or OCL on 58xx (if ATI fixes it by then).

                   

                   

                • OpenCL vs CAL
                  ryta1203

                   

                  Originally posted by: hazeman The quick answer is yes (OpenCL is written on top of CAL, so it can't be faster). The full answer is a little longer.

                  On the 4xxx family, CAL lets you get almost the full power of the card. But be warned: it will be rather painful. The documentation is really bad or missing (with regard to optimization), and the compiler sometimes does strange things (so you need to trick it to get quality code). On the other hand, OpenCL on 4xxx is really bad (it lacks cached memory access and LDS): it's about 3x slower than Brook+.

                  With the 5xxx family it's hard to say. Some results (search the streamsdk forum) suggest there is a problem with memory transfer speed (we will see if a new CAL version corrects it). So, with the exception of memory transfer, you can get almost the full power of 5xxx with CAL.

                  OpenCL on 5xxx is again a problem. In theory, OpenCL on 5xxx should work like a charm (it has LDS and the new memory access instructions), but the results don't support that (maybe memory problems again, who knows). At the moment, performance for some applications is comparable to OpenCL on an 8800GT.

                   

                   

                  The problem here is that AMD/ATI has already said that it's going to start exposing NEW features in OpenCL FIRST and then, MAYBE, CAL/IL....

                  ...so in the future you might find that your OpenCL code runs faster than your CAL/IL code. It's possible.

                    • OpenCL vs CAL
                      kunzjacq2

                      It seems there were some bugs in Stream 2.0 beta 4 that caused performance problems when transferring data between host and device:

                       

                      http://www.sisoftware.net/index.html?dir=qa&location=gpu_opencl&langx=en&a=

                      Does anyone know if this is fixed in the final version?

                       

                        • OpenCL vs CAL
                          n0thing

                          I don't know about 7xx, but on the 5870 the internal bandwidth is now close to 100 GB/s in OpenCL (vec4 write performance), and PCIe bus transfer speeds are around 2.3 GB/s.

                            • OpenCL vs CAL
                              kunzjacq2

                               

                              Originally posted by: n0thing I don't know about 7xx, but on the 5870 the internal bandwidth is now close to 100 GB/s in OpenCL (vec4 write performance), and PCIe bus transfer speeds are around 2.3 GB/s.

                               

                               

                              Thanks for the info. According to the Sisoft results, that would mean OpenCL PCIe transfers are still way behind CAL ones (2.3 vs 4.4 GB/s).

                              Some other figures I got: on an NVidia GTX 285, one gets 4.2 GB/s PCIe speed in OpenCL. On a 5770 with beta 4 I got 1.5 GB/s; I don't currently have the figure for the final version.
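
To put those rates in perspective, here is a rough sketch of how long a single host-to-device copy would take at each figure quoted in this thread. The 256 MiB buffer size is a made-up example, and 1 GB is assumed to mean 1e9 bytes; real transfers also have setup overhead that this ignores.

```python
# Sustained PCIe rates quoted in this thread, in GB/s (1 GB = 1e9 bytes assumed).
RATES_GB_S = {
    "HD 5870, OpenCL (n0thing)": 2.3,
    "CAL (Sisoft figure)": 4.4,
    "GTX 285, OpenCL": 4.2,
    "HD 5770, OpenCL beta 4": 1.5,
}


def transfer_ms(n_bytes, rate_gb_s):
    """Milliseconds needed to move n_bytes at a sustained rate given in GB/s."""
    return n_bytes / (rate_gb_s * 1e9) * 1e3


BUFFER_BYTES = 256 * 1024 * 1024  # hypothetical 256 MiB buffer, not from the thread

for name, rate in RATES_GB_S.items():
    print(f"{name:28s} {transfer_ms(BUFFER_BYTES, rate):7.1f} ms")
```

Since transfer time is inversely proportional to bandwidth, the 2.3 vs 4.4 GB/s gap means the OpenCL copy takes a bit under twice as long as the CAL one for the same buffer.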

                                • OpenCL vs CAL
                                  n0thing

                                  I think the CAL version uses pinned memory to transfer data over PCIe.

                                  I have tried the CL_MEM_ALLOC_HOST_PTR flag, but it doesn't use pinned memory on ATI GPUs, as the transfer bandwidth remains the same, i.e. 2.3 GB/s.
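
For anyone who wants to repeat that kind of comparison, a minimal timing harness might look like the sketch below. It only shows the measurement pattern: `measure_gb_per_s` and `host_copy` are hypothetical names, and the workload here is a plain host-side copy used as a stand-in, not a PCIe transfer. In a real test you would pass a function that performs the OpenCL buffer write and waits for it to finish, once with and once without the flag.

```python
import time


def measure_gb_per_s(copy_fn, n_bytes, repeats=5):
    """Best-of-N throughput of copy_fn() in GB/s (1 GB = 1e9 bytes assumed)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        copy_fn()  # must block until the copy has actually completed
        best = min(best, time.perf_counter() - start)
    return n_bytes / best / 1e9


# Stand-in workload: a host memory copy, NOT a host-to-device transfer.
N_BYTES = 64 * 1024 * 1024
src = bytearray(N_BYTES)
dst = bytearray(N_BYTES)


def host_copy():
    dst[:] = src  # in-place slice assignment copies src into dst


print(f"host copy: {measure_gb_per_s(host_copy, N_BYTES):.1f} GB/s")
```

Taking the best of several runs filters out one-off stalls; if the rate is the same with and without the flag, as reported above, the allocation is probably not being treated as pinned.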