OpenCL

Adept II

Using CL_MEM_ALLOC_HOST_PTR on buffer for writing output (BufferBandwidth SDK sample)

Hello,

I'm experimenting with host-allocated memory buffers (CL_MEM_ALLOC_HOST_PTR) for reading and writing data. Specifically, I use the BufferBandwidth sample provided with the SDK under 64-bit Linux on an HD7750 GPU. With small enough buffers the program runs as expected. However, with somewhat larger arrays the output buffer no longer seems to be host-allocated.

Here is the output for array size 130000000, where everything looks normal:

./BufferBandwidth -if 0 -if 5 -of 1 -of 5 -db -nb 130000000

Platform found : Advanced Micro Devices, Inc.

Device  0            Capeverde
Build:               release
GPU work items:      290176
Buffer size:         129998848
CPU workers:         1
Timing loops:        20
Repeats:             1
Kernel loops:        20
inputBuffer:         CL_MEM_READ_ONLY CL_MEM_ALLOC_HOST_PTR
outputBuffer:        CL_MEM_WRITE_ONLY CL_MEM_ALLOC_HOST_PTR

AVERAGES (over loops 2 - 19, use -l for complete log)
--------

1. Host mapped write to inputBuffer
---------------------------------------|---------------
clEnqueueMapBuffer -- WRITE (GBPS)     | 7.27e+03
---------------------------------------|---------------
memset() (GBPS)                        | 4.29
---------------------------------------|---------------
clEnqueueUnmapMemObject() (GBPS)       | 6.94e+03

2. GPU kernel read of inputBuffer
---------------------------------------|---------------
clEnqueueNDRangeKernel() (GBPS)        | 3.03

Verification Passed!

3. GPU kernel write to outputBuffer
---------------------------------------|---------------
clEnqueueNDRangeKernel() (GBPS)        | 1.77

4. Host mapped read of outputBuffer
---------------------------------------|---------------
clEnqueueMapBuffer -- READ (GBPS)      | 7.29e+03
---------------------------------------|---------------
CPU read (GBPS)                        | 4.03
---------------------------------------|---------------
clEnqueueUnmapMemObject() (GBPS)       | 5.97e+03

Verification Passed!

And here is the output for array size 134000000, where the output buffer seems to be allocated in device memory (high kernel write bandwidth, but slow to map for reading on the CPU side):

./BufferBandwidth -if 0 -if 5 -of 1 -of 5 -db -nb 134000000

Platform found : Advanced Micro Devices, Inc.

Device  0            Capeverde
Build:               release
GPU work items:      11648
Buffer size:         133998592
CPU workers:         1
Timing loops:        20
Repeats:             1
Kernel loops:        20
inputBuffer:         CL_MEM_READ_ONLY CL_MEM_ALLOC_HOST_PTR
outputBuffer:        CL_MEM_WRITE_ONLY CL_MEM_ALLOC_HOST_PTR

AVERAGES (over loops 2 - 19, use -l for complete log)
--------

1. Host mapped write to inputBuffer
---------------------------------------|---------------
clEnqueueMapBuffer -- WRITE (GBPS)     | 6.25e+03
---------------------------------------|---------------
memset() (GBPS)                        | 4.29
---------------------------------------|---------------
clEnqueueUnmapMemObject() (GBPS)       | 7.13e+03

2. GPU kernel read of inputBuffer
---------------------------------------|---------------
clEnqueueNDRangeKernel() (GBPS)        | 3.02

Verification Passed!

3. GPU kernel write to outputBuffer
---------------------------------------|---------------
clEnqueueNDRangeKernel() (GBPS)        | 31.5

4. Host mapped read of outputBuffer
---------------------------------------|---------------
clEnqueueMapBuffer -- READ (GBPS)      | 2.46
---------------------------------------|---------------
CPU read (GBPS)                        | 4.03
---------------------------------------|---------------
clEnqueueUnmapMemObject() (GBPS)       | 5.39e+03

Verification Passed!

Passed!


Staff

Re: Using CL_MEM_ALLOC_HOST_PTR on buffer for writing output (BufferBandwidth SDK sample)

Hi,

I have gone through AMD's OpenCL memory management documentation as well as the BufferBandwidth code, and I found no apparent reason why you should be getting such results.

I ran the code on my side (Radeon 7800 series) with your input and was not able to reproduce the problem. I am attaching snapshots of the experiment (test_small.txt is for buffer size 130000000 and test_big.txt is for buffer size 134000000).

Can you rerun it with the latest drivers? Also, can you gradually increase the buffer size and find where the threshold lies?

Adept II

Re: Re: Using CL_MEM_ALLOC_HOST_PTR on buffer for writing output (BufferBandwidth SDK sample)

OK, after reinstalling a fresh Linux distribution (for other reasons), I installed the latest Catalyst 14.4 RC, and the problem persists. A binary search shows that the problem appears for buffer size 131135488 but not for 131135487. Here are the results:

./BufferBandwidth -if 0 -if 5 -of 1 -of 5 -db -nb 131135487

Platform found : Advanced Micro Devices, Inc.

Device  0            Capeverde
Build:               release
GPU work items:      12160
Buffer size:         131133440
CPU workers:         1
Timing loops:        20
Repeats:             1
Kernel loops:        20
inputBuffer:         CL_MEM_READ_ONLY CL_MEM_ALLOC_HOST_PTR
outputBuffer:        CL_MEM_WRITE_ONLY CL_MEM_ALLOC_HOST_PTR

AVERAGES (over loops 2 - 19, use -l for complete log)
--------

1. Host mapped write to inputBuffer
---------------------------------------|---------------
clEnqueueMapBuffer -- WRITE (GBPS)     | 1.18e+04
---------------------------------------|---------------
memset() (GBPS)                        | 4.3
---------------------------------------|---------------
clEnqueueUnmapMemObject() (GBPS)       | 6.91e+03

2. GPU kernel read of inputBuffer
---------------------------------------|---------------
clEnqueueNDRangeKernel() (GBPS)        | 3.03

Verification Passed!

3. GPU kernel write to outputBuffer
---------------------------------------|---------------
clEnqueueNDRangeKernel() (GBPS)        | 1.76

4. Host mapped read of outputBuffer
---------------------------------------|---------------
clEnqueueMapBuffer -- READ (GBPS)      | 1.28e+04
---------------------------------------|---------------
CPU read (GBPS)                        | 4.04
---------------------------------------|---------------
clEnqueueUnmapMemObject() (GBPS)       | 6.69e+03

Verification Passed!

Passed!


./BufferBandwidth -if 0 -if 5 -of 1 -of 5 -db -nb 131135488

Platform found : Advanced Micro Devices, Inc.

Device  0            Capeverde
Build:               release
GPU work items:      745088
Buffer size:         131135488
CPU workers:         1
Timing loops:        20
Repeats:             1
Kernel loops:        20
inputBuffer:         CL_MEM_READ_ONLY CL_MEM_ALLOC_HOST_PTR
outputBuffer:        CL_MEM_WRITE_ONLY CL_MEM_ALLOC_HOST_PTR

AVERAGES (over loops 2 - 19, use -l for complete log)
--------

1. Host mapped write to inputBuffer
---------------------------------------|---------------
clEnqueueMapBuffer -- WRITE (GBPS)     | 1.39e+04
---------------------------------------|---------------
memset() (GBPS)                        | 4.3
---------------------------------------|---------------
clEnqueueUnmapMemObject() (GBPS)       | 6.67e+03

2. GPU kernel read of inputBuffer
---------------------------------------|---------------
clEnqueueNDRangeKernel() (GBPS)        | 3.03

Verification Passed!

3. GPU kernel write to outputBuffer
---------------------------------------|---------------
clEnqueueNDRangeKernel() (GBPS)        | 32.1

4. Host mapped read of outputBuffer
---------------------------------------|---------------
clEnqueueMapBuffer -- READ (GBPS)      | 2.32
---------------------------------------|---------------
CPU read (GBPS)                        | 4.03
---------------------------------------|---------------
clEnqueueUnmapMemObject() (GBPS)       | 5.47e+03

Verification Passed!

Passed!


