Archives Discussions

cni_zhd · ‎07-05-2015

Hi,

I create three command queues: one to write buffers, one to read buffers, and one to execute kernels. A set of kernels executes in sequence. The program has three "stages": the first supplies inputs by writing buffers, the second executes the kernels, and the last reads back the kernel results. Buffer reads, buffer writes, and kernel execution run in parallel, but while a buffer is being read or written, kernel execution is not continuous. The gap usually occurs between the first and the last stage. According to CodeXL, the gap is about 5 ms. If I ignore the correctness of the results and remove the buffer reads/writes, the kernels execute without gaps. I have looked through the optimization guide on AMD's website but could not find a reason. Is there any way to reduce the effect of clEnqueueReadBuffer/clEnqueueWriteBuffer on kernel execution without reducing performance?

As for the environment, I am using a FirePro W9100 on 64-bit Windows 7. The AMD CCC version for the FirePro W9100 is 2015.0113.1141.20974.

Thanks.

jtrudeau · ‎07-06-2015

Welcome! I have whitelisted you and moved this into the OpenCL Forum.

cni_zhd · ‎07-06-2015

Thank you.

dipak · ‎07-07-2015

Hi,

Could you please be a little more explicit? Any reference code and/or a detailed description of the code flow would be helpful. BTW, did you try map/unmap instead of read/write buffer? Did it change what you observe?

FYI: there is an SDK sample called "TransferOverlap" which shows how to overlap buffer transfers with a running kernel. You may want to take a look at it.

Regards,

cni_zhd · ‎07-13-2015

Thank you for your reply.

I'm sorry about my description. I use a "pipeline", a very classic processing model. There are three stages: an input stage, a processing stage, and an output stage.
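For readers following the thread, the three-stage model described above can be sketched in plain C as a schedule: which frame each stage handles at each pipeline step, and which buffer set that frame uses. This is only an illustration under assumptions, not the poster's code. It contains no OpenCL calls, and the names `frame_for_stage`, `slot_for_frame`, and `total_steps`, as well as the idea of reusing a fixed number of buffer sets round-robin, are hypothetical.

```c
#include <assert.h>

/* The three stages named in the post; the enum itself is hypothetical. */
enum stage { STAGE_INPUT = 0, STAGE_PROCESS = 1, STAGE_OUTPUT = 2, NUM_STAGES = 3 };

/* Frame handled by `stage` during pipeline step `step`, or -1 if the stage
 * is idle (pipeline still filling, or already draining). */
int frame_for_stage(int step, enum stage stage, int num_frames)
{
    int frame = step - (int)stage;
    assert(step >= 0 && num_frames >= 0);
    return (frame >= 0 && frame < num_frames) ? frame : -1;
}

/* Buffer set used by `frame` when `num_slots` sets are reused round-robin.
 * With num_slots >= NUM_STAGES, frames that are in flight at the same step
 * never share a set. */
int slot_for_frame(int frame, int num_slots)
{
    assert(frame >= 0 && num_slots > 0);
    return frame % num_slots;
}

/* Number of steps needed to push every frame through all stages. */
int total_steps(int num_frames)
{
    return num_frames > 0 ? num_frames + NUM_STAGES - 1 : 0;
}
```

In a real host program, each (stage, frame) pair in this schedule would correspond to one enqueued write, kernel launch, or read, and the dependencies between stages for the same frame would be expressed with OpenCL events. That part is deliberately left out here.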

I tried map/unmap instead of read/write buffer, but performance is poor compared with read/write buffer. Buffer reads/writes and kernel execution still do not sufficiently overlap.
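To make "sufficiently overlap" concrete, here is a small idealized cost model in plain C. It assumes constant per-frame times for the write, kernel, and read stages, enough buffer sets that no stage waits for a free buffer, and hardware that can run all three stages concurrently. These are assumptions for illustration, not measurements from this thread, and the function names are hypothetical.

```c
/* All times are in microseconds. */

/* Largest of three values. */
static long max3(long a, long b, long c)
{
    long m = a > b ? a : b;
    return m > c ? m : c;
}

/* No overlap: every frame pays for write + kernel + read back to back. */
long serialized_time_us(long frames, long write_us, long kernel_us, long read_us)
{
    return frames > 0 ? frames * (write_us + kernel_us + read_us) : 0;
}

/* Ideal overlap: the first frame pays for all three stages; every further
 * frame costs only the slowest stage, which is the bottleneck. */
long overlapped_time_us(long frames, long write_us, long kernel_us, long read_us)
{
    if (frames <= 0)
        return 0;
    return (write_us + kernel_us + read_us)
         + (frames - 1) * max3(write_us, kernel_us, read_us);
}
```

Under this model, kernel execution is gap-free only when the kernel stage is the bottleneck and transfers are fully hidden behind it. Comparing a measured total against the two bounds gives a rough idea of how much overlap is actually being achieved.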

What could I modify or look into? Thank you.

dipak · ‎07-13-2015

Could you please share a simple test case that reproduces this problem?

Regards,

How to reduce the effects of clEnqueueReadBuffer/clEnqueueWriteBuffer on the execution of the kernel?