7 Replies Latest reply on Jul 30, 2010 6:01 AM by pedela

    poor transfer

    stefan_w

      If I run NVIDIA's oclBandwidthTest with my 5970, I get very poor results compared to an NVIDIA GTX 260:

       

      5970:

      oclBandwidthTest.exe Starting...

      WARNING: NVIDIA OpenCL platform not found - defaulting to first platform!

      Running on...

      Device Cypress
      Quick Mode

      Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
      Transfer Size (Bytes) Bandwidth(MB/s)
      33554432 1575.1

      Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
      Transfer Size (Bytes) Bandwidth(MB/s)
      33554432 2454.8

      Device to Device Bandwidth, 1 Device(s)
      Transfer Size (Bytes) Bandwidth(MB/s)
      33554432 49974.0

       

      NVIDIA GTX 260
      ./oclBandwidthTest Starting...

      Running on...

      Device GeForce GTX 260
      Quick Mode

      Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
      Transfer Size (Bytes) Bandwidth(MB/s)
      33554432 5251.1

      Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
      Transfer Size (Bytes) Bandwidth(MB/s)
      33554432 5256.2

      Device to Device Bandwidth, 1 Device(s)
      Transfer Size (Bytes) Bandwidth(MB/s)
      33554432 91350.3


      TEST PASSED


        • poor transfer
          genaganna

           

          Originally posted by: stefan_w If I execute NVIDIA's oclBandWidth Test with my 5970 I achieve very poor results compared to a NVIDIA GTX 260: [...]

          These are known issues; they will be addressed in upcoming releases.

            • poor transfer
              stefan_w

              Are there any plans to support page locked/pinned memory (like NVIDIA does)?

                • poor transfer
                  nou

                  I think clEnqueueMapBuffer() uses pinned memory.

                    • poor transfer
                      stefan_w

                      I use

                      /* Allocate a buffer whose backing store the driver may pin. */
                      host_mem = clCreateBuffer(context,
                                                  CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_WRITE,
                                                  size, NULL, &ocl_err);
                      /* Map it so the host gets a pointer into that allocation. */
                      *ptr = (void *)clEnqueueMapBuffer(cmd_queue, host_mem,
                                                         CL_TRUE, CL_MAP_READ | CL_MAP_WRITE,
                                                         0, size, 0, NULL, &evt, &ocl_err);

                      to create page-locked memory. With the NVIDIA driver this works fine; on my AMD card, however, it performs no differently than malloc'ed memory.

                        • poor transfer
                          genaganna

                           

                          Originally posted by: stefan_w I use [...] to create page locked memory using the NVIDIA driver, where it works fine. However on my AMD card this makes no difference to malloced memory.

                          Use of pinned memory is not implemented yet; it is expected to be introduced in upcoming releases.

                    • poor transfer
                      exihea

                      I still get a low PCIe transfer rate. With the same OpenCL code I get 5-6 GB/s on a GTX 280 but less than 2 GB/s on a 5870. How can I improve the numbers on Cypress?

                      $ ./oclBandwidthTestGeneric
                      Using device 1: GeForce GTX 280
                      D2H Bandwidth =5.52 GB/s
                      H2D Bandwidth =5.34 GB/s

                      $ ./oclBandwidthTestGeneric
                      Using device 1: Cypress
                      D2H Bandwidth =1.35 GB/s
                      H2D Bandwidth =0.49 GB/s

                      I am using Stream SDK 2.1.

                       

                        • poor transfer
                          pedela

                          If you set oclBandwidthTest to use pinned and mapped memory, it achieves near-theoretical results on an ATI 4870 and 5670. On an NVIDIA 260, I get near-theoretical results with the pinned and direct memory settings.