

cni_zhd
Journeyman III

How to reduce the effects of clEnqueueReadBuffer/clEnqueueWriteBuffer on the execution of the kernel?

Hi,

   I create three command queues: one for writing buffers, one for reading buffers, and one for executing kernels. A set of kernels runs in sequence. The program has three stages: the first provides the inputs (buffer writes), the second executes the kernels, and the last reads back the kernels' results. Buffer reads, buffer writes, and kernel execution run in parallel, but while a buffer is being read or written, the kernels do not execute continuously. The gaps usually occur between the first and the last stage; CodeXL shows them as about 5 ms. If I drop the buffer reads and writes (ignoring whether the results are correct), the kernels run back to back with no gaps. I have read the optimization guide on AMD's website but could not find the reason. Is there any way to reduce the impact of clEnqueueReadBuffer/clEnqueueWriteBuffer on kernel execution without reducing performance?

   Regarding the environment: I am using a FirePro W9100 on Windows 7 64-bit. The AMD Catalyst Control Center (CCC) version for the FirePro W9100 is 2015.0113.1141.20974.

Thanks.

jtrudeau
Staff

Welcome! I have whitelisted you and moved this into the OpenCL Forum.


Thank you.

dipak
Big Boss

Hi,

Could you please be a little more explicit? Reference code and/or a detailed description of the code flow would be helpful. By the way, did you try map/unmap instead of read/write buffer? Did that change what you observe?

FYI: there is an SDK sample called "TransferOverlap" that shows how to overlap buffer transfers with kernel execution. You may want to take a look at it.

Regards,


   Thank you for your reply.

   I'm sorry my description was unclear. I am using a pipeline, the classic processing model, with three stages: an input stage, a processing stage, and an output stage.
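For a three-stage pipeline like this, the expected benefit of overlap can be estimated with a standard formula. Assume N equally sized chunks with per-chunk times t_w (write), t_k (kernel), and t_r (read), each stage on its own fully independent queue, and enough buffers that a slow read never blocks the next kernel. These are idealizing assumptions; on real hardware uploads, downloads, and kernels can contend for the bus and the device. Under them, the serial and pipelined totals are:

```latex
\begin{aligned}
T_{\text{serial}} &= N\,(t_w + t_k + t_r) \\
T_{\text{pipe}}   &= (t_w + t_k + t_r) + (N-1)\,\max(t_w,\; t_k,\; t_r)
\end{aligned}
```

In the same model, consecutive kernels run back to back only when t_k >= t_w; otherwise each kernel waits t_w - t_k for its input, so a gap between kernels is expected whenever the input transfer is the slowest stage.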

   I tried map/unmap instead of read/write buffer, but performance was worse than with read/write buffer. Buffer reads/writes and kernel execution still cannot be fully overlapped.

   What should I change or look into? Thank you.


Could you please share a simple test case that reproduces this problem?

Regards,
