
How to reduce map/unmap overhead on APUs?

Question asked by Linuxhippy on Aug 5, 2015
Latest reply on Aug 9, 2015 by Linuxhippy

Hi,

 

I would like to make use of zero-copy in an APU environment for legacy code.

I intend to use the following code for data transfer:

 

// Create Buffers, somewhere else in the application

inBuf = clCreateBuffer(context, CL_MEM_READ_ONLY, bufSize, NULL, &err); //input

outBuf = clCreateBuffer(context, CL_MEM_WRITE_ONLY | CL_MEM_ALLOC_HOST_PTR, bufSize, NULL, &err); //output

 

// get direct pointer to buffer

inPtr = (unsigned char *) clEnqueueMapBuffer(commands, inBuf, CL_TRUE, CL_MAP_WRITE, 0,  bufSize, 0, NULL, NULL, &err);

// do something with the data pointed to by inPtr

clEnqueueUnmapMemObject(commands, inBuf, inPtr, 0, NULL, NULL); //unMap inPtr

 

clEnqueueNDRangeKernel(...)

 

// access result

outPtr = (unsigned char *) clEnqueueMapBuffer(commands, outBuf, CL_TRUE, CL_MAP_READ, 0,  bufSize, 0, NULL, NULL, &err);

clEnqueueUnmapMemObject(commands, outBuf, outPtr, 0, NULL, NULL); //unMap outPtr

 

 

Is this the correct way to perform data transfer?

 

Also, for me low invocation/map overhead is more important than peak throughput on the GPU. The OpenCL kernels will be executed as part of a legacy application where there is no way to do double-buffered data transfers, so all calls to map/unmap should be fast. Do the parameters chosen for buffer creation in the code above make sense for this scenario?

 

I've created a trace using CodeXL, and map/unmap with code very similar to the snippet above (only with 3 in/out buffers) has quite high overhead compared to the actual kernel invocation:

 

[Attached screenshot of the CodeXL trace: Bildschirmfoto vom 2015-08-05 17_28_43.png]

 

As you can see, the kernel executes in ~1.5 ms (the first buffer map is slow because it has to wait for kernel execution).

However, mapping the input buffers (CL_MAP_WRITE) is horribly slow, taking 0.18-0.25 ms each.

Is there anything I can do to reduce this overhead?
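
To put those numbers in perspective, here is a rough back-of-the-envelope calculation as a small C sketch. It assumes that three buffers are mapped with CL_MAP_WRITE once per kernel invocation (an assumption about the trace, not something measured separately), and the helper names are made up for illustration:

```c
#include <stdio.h>

/* Total time (ms) spent mapping `count` buffers at `per_map_ms` each. */
static double total_map_ms(int count, double per_map_ms) {
    return count * per_map_ms;
}

/* Map time expressed as a percentage of the kernel run time. */
static double overhead_percent(double map_ms, double kernel_ms) {
    return 100.0 * map_ms / kernel_ms;
}

/* Prints the estimated range using the figures quoted from the trace. */
static void print_map_overhead(void) {
    const double kernel_ms = 1.5;   /* kernel run time from the trace */
    const double map_lo_ms = 0.18;  /* fastest observed input map */
    const double map_hi_ms = 0.25;  /* slowest observed input map */
    const int n_inputs = 3;         /* ASSUMPTION: three write-mapped buffers */

    double lo = total_map_ms(n_inputs, map_lo_ms);
    double hi = total_map_ms(n_inputs, map_hi_ms);

    printf("input maps: %.2f-%.2f ms (%.0f%%-%.0f%% of kernel time)\n",
           lo, hi,
           overhead_percent(lo, kernel_ms),
           overhead_percent(hi, kernel_ms));
}
```

Calling print_map_overhead() from main gives roughly 0.54-0.75 ms for the input maps alone, i.e. about 36-50% of the kernel time under that assumption, before the output map and the unmap calls are counted.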

 

The APU I used is an AMD A10-7800 (Spectre) running CentOS 7 with the latest Catalyst drivers.

 

Thank you in advance, Clemens
