
lava555
Journeyman III

Kernel-side equivalent of clFinish()?

I'm using two kernel functions to solve a problem, and I have tried two ways of invoking them. In the first method, one kernel function calls the other; in the second method, the two kernel functions are invoked in order from the host.

I used mem_fence() in the first method and clFinish() in the second. However, when I debugged the program using the first method, the results seemed wrong. So my question is: is there something wrong with the first method, or is mem_fence() simply not the kernel-side equivalent of clFinish()? I also used barrier() in both methods.

8 Replies
dominik_g
Journeyman III

The first method won't work. It would require global synchronization across all work-items, which OpenCL does not provide inside a kernel. You can only synchronize work-items within a work-group (with barriers); mem_fence() only orders the memory operations of the work-item that calls it.

Global synchronization can only be achieved by splitting the work into multiple kernels and invoking them in sequence from the host side (your second method). I think you could also implement your own synchronization using atomic reads and writes to global memory, but this is non-trivial...
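
To make the point about work-group-only synchronization concrete, here is a small sketch in plain C (no OpenCL runtime involved). It simulates a two-phase computation in which phase 2 reads a value written by a different work-group. The sizes, the function names (`run_fused`, `run_split`) and the choice to run the work-groups one after another are assumptions made for illustration; a real device may schedule work-groups in any order, and this sequential order is just one schedule that OpenCL permits.

```c
#include <stddef.h>

#define N_ITEMS 8   /* total work-items (hypothetical size) */
#define GROUP   4   /* work-items per work-group (hypothetical size) */

/* Phase 1: each work-item writes its own slot of tmp. */
static void phase1(const int *in, int *tmp, size_t gid) {
    tmp[gid] = in[gid] * 2;
}

/* Phase 2: each work-item reads a slot written by the NEXT work-group. */
static void phase2(const int *tmp, int *out, size_t gid) {
    out[gid] = tmp[(gid + GROUP) % N_ITEMS];
}

/* Method 1: both phases in a single launch, separated only by a
 * work-group barrier. Work-groups are simulated one after another. */
void run_fused(const int *in, int *tmp, int *out) {
    for (size_t g = 0; g < N_ITEMS / GROUP; ++g) {
        for (size_t l = 0; l < GROUP; ++l)
            phase1(in, tmp, g * GROUP + l);
        /* barrier() would sit here: it waits for this work-group only,
         * so other groups may not have written their tmp slots yet. */
        for (size_t l = 0; l < GROUP; ++l)
            phase2(tmp, out, g * GROUP + l);
    }
}

/* Method 2: two launches. The boundary between the launches acts as
 * the global synchronization point. */
void run_split(const int *in, int *tmp, int *out) {
    for (size_t gid = 0; gid < N_ITEMS; ++gid)
        phase1(in, tmp, gid);
    /* kernel boundary: every phase-1 write has completed */
    for (size_t gid = 0; gid < N_ITEMS; ++gid)
        phase2(tmp, out, gid);
}
```

With `tmp` zero-initialized, `run_fused` leaves the first work-group reading slots that have not been written yet, while `run_split` gives every work-item the phase-1 result it expects.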


Thank you, dominik_g! Now I understand that.

With the first method I just wanted to save the time spent on data transformation between host and device. If the two kernel functions are invoked in sequence on the host, the data has to be transformed twice. If the data is large, that cost cannot be avoided. Now I'm thinking about how to hide that delay.


What do you mean by "data transformation"? The data stays on the GPU between the two kernel executions, right? So the only overhead comes from having two kernel calls instead of one.


Thanks, dominik_g!

I don't think so. Two kernel invocations need two data transmissions. (Sorry for typing "data transformation" by mistake; I meant data transmission.)

"The data stays on the GPU between the two kernel executions": I declared the data as __global, but how can you make the data stay on the GPU between the two kernel executions?


lava555,
That is a decision the runtime/driver makes based on device resources, what else is running, and the current execution environment.
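
For reference, a minimal host-side sketch of the two-kernel pattern discussed in this thread, showing that the host issues one upload and one download around both launches rather than a transfer per kernel. The helper name `run_two_kernels` and its parameters are hypothetical; the context, the in-order command queue, and both kernels are assumed to have been created elsewhere, each kernel is assumed to take the buffer as argument 0, and the header path is `<OpenCL/opencl.h>` on macOS. Where the buffer physically resides between the launches remains up to the runtime.

```c
#include <CL/cl.h>

/* Hypothetical helper: runs k1 then k2 on the same buffer.
 * Assumes `queue` is an in-order queue (the default). */
cl_int run_two_kernels(cl_context ctx, cl_command_queue queue,
                       cl_kernel k1, cl_kernel k2,
                       float *host_data, size_t n)
{
    cl_int err;
    size_t bytes = n * sizeof(float);

    /* One buffer object shared by both kernels. */
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, bytes, NULL, &err);
    if (err != CL_SUCCESS) return err;

    /* Single host-to-device transfer (blocking). */
    err = clEnqueueWriteBuffer(queue, buf, CL_TRUE, 0, bytes,
                               host_data, 0, NULL, NULL);

    /* Both kernels receive the same cl_mem; no host copy in between. */
    if (err == CL_SUCCESS) err = clSetKernelArg(k1, 0, sizeof(cl_mem), &buf);
    if (err == CL_SUCCESS) err = clSetKernelArg(k2, 0, sizeof(cl_mem), &buf);

    /* On an in-order queue, k2 starts only after k1 has finished,
     * and k1's writes to buf are visible to k2. */
    if (err == CL_SUCCESS)
        err = clEnqueueNDRangeKernel(queue, k1, 1, NULL, &n, NULL,
                                     0, NULL, NULL);
    if (err == CL_SUCCESS)
        err = clEnqueueNDRangeKernel(queue, k2, 1, NULL, &n, NULL,
                                     0, NULL, NULL);

    /* Single device-to-host transfer; the blocking read also waits
     * for both kernels to complete. */
    if (err == CL_SUCCESS)
        err = clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, bytes,
                                  host_data, 0, NULL, NULL);

    clReleaseMemObject(buf);
    return err;
}
```

Because the read is blocking, no separate clFinish() call is needed in this sketch; the extra cost compared with a single kernel is the second launch, not a second transfer.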

thanks MicahVillmow~

 


And how can one check whether the memory was the same?

Lev
Journeyman III

Really.
