
Using CL_MEM_ALLOC_HOST_PTR on buffer for writing output (BufferBandwidth SDK sample)

Question asked by ekondis on Apr 19, 2014
Latest reply on Apr 23, 2014 by ekondis

Hello,

I'm experimenting with host-allocated memory buffers for reading and writing data. Specifically, I'm running the BufferBandwidth sample provided with the SDK on 64-bit Linux with an HD7750 GPU. With small enough buffers the program behaves as expected. With somewhat larger buffers, however, the output buffer no longer seems to be host allocated.

Here is the output for an array size of 130000000, where the results look normal:

./BufferBandwidth -if 0 -if 5 -of 1 -of 5 -db -nb 130000000
Platform found : Advanced Micro Devices, Inc.




Device  0            Capeverde
Build:               release
GPU work items:      290176
Buffer size:         129998848
CPU workers:         1
Timing loops:        20
Repeats:             1
Kernel loops:        20
inputBuffer:         CL_MEM_READ_ONLY CL_MEM_ALLOC_HOST_PTR 
outputBuffer:        CL_MEM_WRITE_ONLY CL_MEM_ALLOC_HOST_PTR 




AVERAGES (over loops 2 - 19, use -l for complete log)
--------




1. Host mapped write to inputBuffer
 ---------------------------------------|---------------
 clEnqueueMapBuffer -- WRITE (GBPS)     | 7.27e+03
 ---------------------------------------|---------------
 memset() (GBPS)                        | 4.29
 ---------------------------------------|---------------
 clEnqueueUnmapMemObject() (GBPS)       | 6.94e+03




2. GPU kernel read of inputBuffer
 ---------------------------------------|---------------
 clEnqueueNDRangeKernel() (GBPS)        | 3.03


 Verification Passed!




3. GPU kernel write to outputBuffer
 ---------------------------------------|---------------
 clEnqueueNDRangeKernel() (GBPS)        | 1.77




4. Host mapped read of outputBuffer
 ---------------------------------------|---------------
 clEnqueueMapBuffer -- READ (GBPS)      | 7.29e+03
 ---------------------------------------|---------------
 CPU read (GBPS)                        | 4.03
 ---------------------------------------|---------------
 clEnqueueUnmapMemObject() (GBPS)       | 5.97e+03


 Verification Passed!

And here is the output for an array size of 134000000, where the output buffer seems to be allocated in device memory instead (high kernel write bandwidth, but slow to map on the host):

./BufferBandwidth -if 0 -if 5 -of 1 -of 5 -db -nb 134000000
Platform found : Advanced Micro Devices, Inc.




Device  0            Capeverde
Build:               release
GPU work items:      11648
Buffer size:         133998592
CPU workers:         1
Timing loops:        20
Repeats:             1
Kernel loops:        20
inputBuffer:         CL_MEM_READ_ONLY CL_MEM_ALLOC_HOST_PTR 
outputBuffer:        CL_MEM_WRITE_ONLY CL_MEM_ALLOC_HOST_PTR 




AVERAGES (over loops 2 - 19, use -l for complete log)
--------




1. Host mapped write to inputBuffer
 ---------------------------------------|---------------
 clEnqueueMapBuffer -- WRITE (GBPS)     | 6.25e+03
 ---------------------------------------|---------------
 memset() (GBPS)                        | 4.29
 ---------------------------------------|---------------
 clEnqueueUnmapMemObject() (GBPS)       | 7.13e+03




2. GPU kernel read of inputBuffer
 ---------------------------------------|---------------
 clEnqueueNDRangeKernel() (GBPS)        | 3.02


 Verification Passed!




3. GPU kernel write to outputBuffer
 ---------------------------------------|---------------
 clEnqueueNDRangeKernel() (GBPS)        | 31.5




4. Host mapped read of outputBuffer
 ---------------------------------------|---------------
 clEnqueueMapBuffer -- READ (GBPS)      | 2.46
 ---------------------------------------|---------------
 CPU read (GBPS)                        | 4.03
 ---------------------------------------|---------------
 clEnqueueUnmapMemObject() (GBPS)       | 5.39e+03


 Verification Passed!




Passed!
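
To put the two runs side by side, the reported bandwidths can be converted back into elapsed times. The following is only a minimal sketch: it assumes the sample computes GBPS as bytes / seconds / 1e9, which has not been checked against the sample's source, and the helper names `elapsed_seconds` and `print_elapsed` are made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Elapsed time in seconds implied by a reported bandwidth figure.
   ASSUMPTION: GBPS means bytes / seconds / 1e9 (not verified against
   the BufferBandwidth source). */
double elapsed_seconds(double bytes, double gbps)
{
    assert(bytes >= 0.0 && gbps > 0.0);
    return bytes / (gbps * 1e9);
}

/* Print the implied elapsed time in milliseconds for one table row. */
void print_elapsed(const char *label, double bytes, double gbps)
{
    printf("%-40s %12.6f ms\n", label, elapsed_seconds(bytes, gbps) * 1e3);
}
```

Called as `print_elapsed("map READ, 130000000 run", 129998848.0, 7.29e3)` and `print_elapsed("map READ, 134000000 run", 133998592.0, 2.46)`, this gives roughly 0.018 ms for the first run and roughly 54 ms for the second. Under the stated assumption, the first figure is far too short for about 130 MB to have been moved, while the second looks like the cost of actually transferring the whole buffer, which would fit the impression that the larger output buffer ends up in device memory and is copied back when it is mapped.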
