
SVM: physically shared or just virtually shared?

Question asked by ekondis on Aug 29, 2015
Latest reply on Sep 4, 2015 by ekondis

I ran the BufferBandwidth example from the AMD APP SDK 3.0 on an A6-1450 APU. The results follow:

Platform 0 : Advanced Micro Devices, Inc.

Platform found : Advanced Micro Devices, Inc.

 

 

Selected Platform Vendor : Advanced Micro Devices, Inc.

Device 0 : Kalindi Device ID is 0x232d320

Build Options are : -cl-std=CL2.0

Build:               release

GPU work items:      2048

Buffer size:         33554432

CPU workers:         1

Timing loops:        20

Repeats:             1

Kernel loops:        20

inputBuffer:         CL_MEM_READ_ONLY

outputBuffer:        CL_MEM_WRITE_ONLY

inputSVMBuffer:      CL_MEM_READ_ONLY

outputSVMBuffer:     CL_MEM_WRITE_ONLY

 

 

Host baseline (naive):

 

 

Timer resolution     1004.72 ns

Page fault           3564.09 ns

CPU read             2.83 GB/s

memcpy()             1.44 GB/s

memset(,1,)          1.30 GB/s

memset(,0,)          1.29 GB/s

 

 

 

 

AVERAGES (over loops 2 - 19, use -l for complete log)

--------

 

 

 

 

1. Host mapped write to inputBuffer

---------------------------------------|---------------

clEnqueueMapBuffer -- WRITE (GBPS)     | 1024.390

---------------------------------------|---------------

memset() (GBPS)                        | 2.401

---------------------------------------|---------------

clEnqueueUnmapMemObject() (GBPS)       | 2.967

 

 

 

 

2. GPU kernel read of inputBuffer

---------------------------------------|---------------

clEnqueueNDRangeKernel() (secs)        | 0.092

 

 

Verification Passed!

 

 

 

 

3. GPU kernel write to outputBuffer

---------------------------------------|---------------

clEnqueueNDRangeKernel() (secs)        | 0.146

 

 

 

 

4. Host mapped read of outputBuffer

---------------------------------------|---------------

clEnqueueMapBuffer -- READ (GBPS)      | 2.555

---------------------------------------|---------------

CPU read (GBPS)                        | 3.290

---------------------------------------|---------------

clEnqueueUnmapMemObject() (GBPS)       | 461.033

 

 

Verification Passed!

 

 

 

 

5. Host mapped write to inputSVMBuffer

---------------------------------------|---------------

clEnqueueSVMMap -- WRITE (GBPS)        | 1935.760

---------------------------------------|---------------

memset() (GBPS)                        | 2.435

---------------------------------------|---------------

clEnqueueSVMUnmap() (GBPS)             | 2.967

 

 

 

 

6. GPU kernel execution using inputSVMBuffer

---------------------------------------|---------------

clEnqueueNDRangeKernel() (secs)        | 0.092

 

 

Verification Passed!

 

 

 

 

7. GPU kernel write to outputSVMBuffer

---------------------------------------|---------------

clEnqueueNDRangeKernel() (secs)        | 0.146

 

 

 

 

8. Host mapped read of outputSVMBuffer

---------------------------------------|---------------

clEnqueueSVMMap -- READ (GBPS)         | 2.547

---------------------------------------|---------------

CPU read (GBPS)                        | 3.427

---------------------------------------|---------------

clEnqueueSVMUnmap() (GBPS)             | 479.160

 

 

Verification Passed!

 

 

 

 

Passed!

 

I have a question regarding SVM. I noticed that the mapping operations appear to copy data: from device to host memory when a buffer is mapped for reading, and from host to device memory when a buffer that was mapped for writing is unmapped. The GB/s figures lead to this conclusion. In the write case the unmap runs at about 2.97 GB/s, and in the read case the map runs at about 2.55 GB/s, both close to the host memory bandwidth, while the opposite operation in each case is reported at hundreds of GB/s or more, i.e. it is practically free.

While this behavior seems reasonable for ordinary buffers, I find it inappropriate for a buffer that is considered shared virtual memory (SVM). It meets the standard, but when one uses an SVM buffer, one expects the memory to reside in a single physical location, either in device or in host memory. I really don't like the fact that mapping or unmapping SVM costs so much.

For instance, consider a case where one has to perform a binary search on a tree that is stored in an SVM buffer. The map operation would copy the entire buffer, although only a tiny part of it (some nodes) would actually be needed, and one cannot know in advance which addresses will be accessed. This would be very inefficient.
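To make the reasoning from the numbers explicit, here is a minimal sketch in C that converts a reported rate into the time the operation must have taken for the 33554432-byte buffer. The helper names `implied_ms` and `looks_like_copy` are hypothetical, 1 GB is assumed to mean 1e9 bytes (the benchmark may define it differently), and the "within a small factor of memcpy()" threshold is only a heuristic, not something the SDK or the OpenCL standard specifies.

```c
#include <stddef.h>

/* Time in milliseconds needed to process `bytes` bytes at a reported
 * rate of `gb_per_s`. Assumption: 1 GB = 1e9 bytes. */
static double implied_ms(size_t bytes, double gb_per_s)
{
    return (double)bytes / (gb_per_s * 1e9) * 1e3;
}

/* Heuristic (an assumption, not a rule from any specification):
 * a map/unmap whose rate is at most `factor` times the host memcpy()
 * rate probably touched every byte, i.e. it behaved like a copy.
 * Returns 1 if the operation looks like a copy, 0 otherwise. */
static int looks_like_copy(double op_gb_per_s, double memcpy_gb_per_s,
                           double factor)
{
    return op_gb_per_s <= memcpy_gb_per_s * factor;
}
```

With the figures from the log, `implied_ms(33554432, 2.967)` (the clEnqueueSVMUnmap after a write) comes out on the order of 10 ms, whereas `implied_ms(33554432, 1935.760)` (the clEnqueueSVMMap for writing) is in the tens of microseconds. Against the measured memcpy() rate of 1.44 GB/s and a factor of, say, 4, the first is flagged as a likely copy and the second is not.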
