4 Replies Latest reply on Sep 4, 2015 10:38 AM by ekondis

    SVM physically shared or just virtualy shared?

    ekondis

      I run the BufferBandwidth example from the AMD APP SDK 3.0 on a A6-1450 APU. The results follow:

      Platform 0 : Advanced Micro Devices, Inc.

      Platform found : Advanced Micro Devices, Inc.

       

       

      Selected Platform Vendor : Advanced Micro Devices, Inc.

      Device 0 : Kalindi Device ID is 0x232d320

      Build Options are : -cl-std=CL2.0

      Build:               release

      GPU work items:      2048

      Buffer size:         33554432

      CPU workers:         1

      Timing loops:        20

      Repeats:             1

      Kernel loops:        20

      inputBuffer:         CL_MEM_READ_ONLY

      outputBuffer:        CL_MEM_WRITE_ONLY

      inputSVMBuffer:      CL_MEM_READ_ONLY

      outputSVMBuffer:     CL_MEM_WRITE_ONLY

       

       

      Host baseline (naive):

       

       

      Timer resolution     1004.72 ns

      Page fault           3564.09 ns

      CPU read             2.83 GB/s

      memcpy()             1.44 GB/s

      memset(,1,)          1.30 GB/s

      memset(,0,)          1.29 GB/s

       

       

       

       

      AVERAGES (over loops 2 - 19, use -l for complete log)

      --------

       

       

       

       

      1. Host mapped write to inputBuffer

      ---------------------------------------|---------------

      clEnqueueMapBuffer -- WRITE (GBPS)     | 1024.390

      ---------------------------------------|---------------

      memset() (GBPS)                        | 2.401

      ---------------------------------------|---------------

      clEnqueueUnmapMemObject() (GBPS)       | 2.967

       

       

       

       

      2. GPU kernel read of inputBuffer

      ---------------------------------------|---------------

      clEnqueueNDRangeKernel() (secs)        | 0.092

       

       

      Verification Passed!

       

       

       

       

      3. GPU kernel write to outputBuffer

      ---------------------------------------|---------------

      clEnqueueNDRangeKernel() (secs)        | 0.146

       

       

       

       

      4. Host mapped read of outputBuffer

      ---------------------------------------|---------------

      clEnqueueMapBuffer -- READ (GBPS)      | 2.555

      ---------------------------------------|---------------

      CPU read (GBPS)                        | 3.290

      ---------------------------------------|---------------

      clEnqueueUnmapMemObject() (GBPS)       | 461.033

       

       

      Verification Passed!

       

       

       

       

      5. Host mapped write to inputSVMBuffer

      ---------------------------------------|---------------

      clEnqueueSVMMap -- WRITE (GBPS)        | 1935.760

      ---------------------------------------|---------------

      memset() (GBPS)                        | 2.435

      ---------------------------------------|---------------

      clEnqueueSVMUnmap() (GBPS)             | 2.967

       

       

       

       

      6. GPU kernel execution using inputSVMBuffer

      ---------------------------------------|---------------

      clEnqueueNDRangeKernel() (secs)        | 0.092

       

       

      Verification Passed!

       

       

       

       

      7. GPU kernel write to outputSVMBuffer

      ---------------------------------------|---------------

      clEnqueueNDRangeKernel() (secs)        | 0.146

       

       

       

       

      8. Host mapped read of outputSVMBuffer

      ---------------------------------------|---------------

      clEnqueueSVMMap -- READ (GBPS)         | 2.547

      ---------------------------------------|---------------

      CPU read (GBPS)                        | 3.427

      ---------------------------------------|---------------

      clEnqueueSVMUnmap() (GBPS)             | 479.160

       

       

      Verification Passed!

       

       

       

       

      Passed!

       

      I have a question regarding the SVM. I noticed that the memory mapping operations tend to copy data from device to host memory or vice versa when mapped for reading or when upmapped for writing respectively. The numbers in GB/sec of the map and unmap operations lead to this conclusion. While this seems reasonable I find it inappropriate for a memory buffer that it is considered as shared virtual memory (SVM). It meets the standard but when one uses an SVM buffer he expects that the physical place of memory is either residing in device or in host memory. I really don't like the fact that memory mapping or unmapping of SVM costs so much. For instance, consider a case where one has to perform a binary search on a tree that it is stored in an SVM buffer. The map operation would copy the entire buffer where only a tiny part of it (some nodes) would actually be needed. And you cannot know in advance which addresses would be needed to be accessed. This would be very inefficient.