2 Replies Latest reply on Oct 2, 2012 3:19 PM by thejascr

    TransferOverlap not working on FirePro V4800, APP SDK 2.6, Linux 3.0

    thejascr

      Hi,

       

      I have an AMD FirePro V4800 and AMD APP SDK 2.6 on Linux kernel 3.0.

      I am experimenting with the TransferOverlap program of the APP SDK.

      However, the overlap does not seem to take effect for me: I see the same running time with overlap on and off.

      I have also run the APP Profiler, and it does not show any overlap either.

       

      Here is the program output:

      =======================================================================================

       

      thejascr@amd-fusion:/opt/bk-AMDAPP/samples/opencl/bin/x86_64$ ./TransferOverlap -d 1 -x 1000 -k 100000 -I 3               // (with overlap)

      Platform 0 : Advanced Micro Devices, Inc.

      Platform found : Advanced Micro Devices, Inc.

      Selected Platform Vendor : Advanced Micro Devices, Inc.

      Device 0 : BeaverCreek Device ID is 0x2652660

      Device 1 : Redwood Device ID is 0x2811450

      Device 2 : Redwood Device ID is 0x2a573a0

      Build:               DEBUG

      GPU work items:        64

      Buffer size:           1024

      Timing loops:          50

      Kernel loops:          100000

      Wavefronts/SIMD:       7

      memset/kernel overlap: yes

      inputBuffer:           CL_MEM_READ_ONLYCL_MEM_ALLOC_HOST_PTR

       

       

      AVERAGES (over loops3 - 49, use -l to show complete log)

      ------------------------------------------------------------------------------------------

      Acquire and fill: inputBuffer1

            clWaitForEvents() + memset()  0.000004 s     0.27 GB/s

             clEnqueueUnmapMemObject()  0.154325 s     0.00 GB/s

        Loop time 0.309886 s

        Launch map: inputBuffer2

           clEnqueueMapBuffer(MAP_WRITE)  0.000025 s     0.04 GB/s

        Verify: resultBuffer2

           clEnqueueMapBuffer(MAP_WRITE)  0.000296 s     0.00 GB/s

                         CPU reduction  0.000001 s

                       verification ok

             clEnqueueUnmapMemObject()  0.000147 s     0.00 GB/s

        Launch GPU kernel: inputBuffer1

                clEnqueueNDRangeKernel()  0.000029 s

        Acquire and fill: inputBuffer2

            clWaitForEvents() + memset()  0.000004 s     0.27 GB/s

             clEnqueueUnmapMemObject()  0.154495 s     0.00 GB/s

        Launch map: inputBuffer1

           clEnqueueMapBuffer(MAP_WRITE)  0.000026 s     0.04 GB/s

        Verify: resultBuffer1

           clEnqueueMapBuffer(MAP_WRITE)  0.000301 s     0.00 GB/s

                         CPU reduction  0.000001 s

                       verification ok

             clEnqueueUnmapMemObject()  0.000145 s     0.00 GB/s

        Launch GPU kernel: inputBuffer2

                clEnqueueNDRangeKernel()  0.000028 s

       

      Complete test time:15.5218 s

       

      ====================================================================================


      thejascr@amd-fusion:/opt/bk-AMDAPP/samples/opencl/bin/x86_64$ ./TransferOverlap -d 1 -x 1000 -k 100000 -I 3 -n             // (without overlap)

      Platform 0 : Advanced Micro Devices, Inc.

      Platform found : Advanced Micro Devices, Inc.

      Selected Platform Vendor : Advanced Micro Devices, Inc.

      Device 0 : BeaverCreek Device ID is 0x1b20660

      Device 1 : Redwood Device ID is 0x1bdca50

      Device 2 : Redwood Device ID is 0x1bdbeb0

      Build:               DEBUG

      GPU work items:        64

      Buffer size:           1024

      Timing loops:          50

      Kernel loops:          100000

      Wavefronts/SIMD:       7

      memset/kernel overlap: no

      inputBuffer:           CL_MEM_READ_ONLYCL_MEM_ALLOC_HOST_PTR

       

      AVERAGES (over loops3 - 49, use -l to show complete log)

      --------------------------------------------------------------------------------------

      Acquire and fill: inputBuffer1

            clWaitForEvents() + memset()  0.000004 s     0.27 GB/s

             clEnqueueUnmapMemObject()  0.000282 s     0.00 GB/s

        Loop time 0.310473 s

        Launch map: inputBuffer2

           clEnqueueMapBuffer(MAP_WRITE)  0.000022 s     0.05 GB/s

        Verify: resultBuffer2

           clEnqueueMapBuffer(MAP_WRITE)  0.000278 s     0.00 GB/s

                         CPU reduction  0.000001 s

                       verification ok

             clEnqueueUnmapMemObject()  0.000140 s     0.00 GB/s

        Launch GPU kernel: inputBuffer1

               clEnqueueNDRangeKernel()  0.154493 s

        Acquire and fill: inputBuffer2

            clWaitForEvents() + memset()  0.000004 s     0.25 GB/s

             clEnqueueUnmapMemObject()  0.000286 s     0.00 GB/s

        Launch map: inputBuffer1

           clEnqueueMapBuffer(MAP_WRITE)  0.000023 s     0.05 GB/s

        Verify: resultBuffer1

           clEnqueueMapBuffer(MAP_WRITE)  0.000284 s     0.00 GB/s

                         CPU reduction  0.000001 s

                       verification ok

             clEnqueueUnmapMemObject()  0.000140 s     0.00 GB/s

        Launch GPU kernel: inputBuffer2

               clEnqueueNDRangeKernel()  0.154432 s

          Complete test time:15.5523 s
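
      To make the comparison concrete, here is a small Python sketch that adds up the two calls that dominate each run. It is not part of the SDK; the dictionary and function names are my own, and the values are copied by hand from the averages in the two logs above. With overlap on, the host appears to spend about 0.154 s in each clEnqueueUnmapMemObject() on an input buffer; with overlap off, the same time shows up in each clEnqueueNDRangeKernel() instead, so the loop time barely changes.

```python
# Per-stage averages in seconds, copied by hand from the two logs above.
# Each list holds the value for inputBuffer1 and inputBuffer2.
overlap = {
    "unmap_input": [0.154325, 0.154495],    # clEnqueueUnmapMemObject()
    "kernel_launch": [0.000029, 0.000028],  # clEnqueueNDRangeKernel()
    "loop_time": 0.309886,
}
no_overlap = {
    "unmap_input": [0.000282, 0.000286],
    "kernel_launch": [0.154493, 0.154432],
    "loop_time": 0.310473,
}


def blocking_time(run):
    """Sum of the unmap and kernel-launch averages for one loop."""
    return sum(run["unmap_input"]) + sum(run["kernel_launch"])


def loop_delta(a, b):
    """Relative difference between the loop times of two runs."""
    return abs(a["loop_time"] - b["loop_time"]) / max(a["loop_time"], b["loop_time"])


for name, run in (("overlap", overlap), ("no overlap", no_overlap)):
    blocked = blocking_time(run)
    share = blocked / run["loop_time"]
    print(f"{name:>10}: unmap + launch = {blocked:.6f} s "
          f"({share:.1%} of loop time {run['loop_time']:.6f} s)")

print(f"relative loop time difference: {loop_delta(overlap, no_overlap):.2%}")
```

      In both runs these two calls account for nearly the whole loop time, and the two loop times differ by well under 1%, which is what I mean by "no overlap".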