How to explain different execution paths in the same warp?

Discussion created by Fuxianjun on Aug 10, 2010
Latest reply on Aug 11, 2010 by Fuxianjun

The following is quoted from http://www.cmsoft.com.br/index.php?option=com_content&view=category&layout=blog&id=92&Itemid=144

Can anyone tell me the exact reason?

Another problem:

I need to combine two algorithms into one kernel. The optimal number of work-items is 100 for one algorithm and 2 for the other. How do I find the optimal number of work-items for the combined kernel?

It may be easier to use an example. The worst thing that can happen in a kernel, concerning execution paths, is:

    kernel void myKernel() { if (condition) { do work } }

As you can see, some work-items will be launched and do nothing at all. This is not good. As a rule of thumb, keep in mind that launching a worker is expensive, so you want each worker to do useful work. Remember the vector sum kernel that almost every OpenCL tutorial posts as an example? It is not very efficient because each worker executes only one sum.

Another thing to avoid is:

    kernel void myKernel() { if (condition) { do work } else { do something completely different } }

You would prefer something like:

    kernel void myKernel() { if (condition) { do work } else { do the exact same operations in the same order on different data } }

I don't work for AMD or NVIDIA, so I don't know the implementation details and can't explain exactly why this is bad. What I do know is that it interferes with the parallel operations the hardware can handle.
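As far as I understand the vendors' programming guides, the usual explanation is that the work-items of one warp (NVIDIA) or wavefront (AMD) execute in lockstep. When they disagree on a branch, the hardware runs each side in turn with the non-participating lanes masked off, so the warp pays for every side that at least one of its lanes takes. The plain C sketch below simulates that on the host. It is only a toy cost model: `warp_cost`, `WARP_SIZE` and the cost units are hypothetical names chosen for illustration, not part of OpenCL or of any vendor API.

```c
#include <assert.h>
#include <stdint.h>

#define WARP_SIZE 32  /* assumed lockstep width; real hardware may use 32 or 64 */

/*
 * Toy cost model (an assumption, not vendor documentation):
 * a warp issues every branch side that at least one lane takes.
 * Bit `lane` of takes_if is 1 when that lane takes the `if` side.
 * Costs are in arbitrary units.
 */
int warp_cost(uint32_t takes_if, int cost_if, int cost_else)
{
    assert(cost_if >= 0 && cost_else >= 0);

    int any_if = 0, any_else = 0;
    for (int lane = 0; lane < WARP_SIZE; ++lane) {
        if ((takes_if >> lane) & 1u)
            any_if = 1;    /* at least one lane needs the `if` side */
        else
            any_else = 1;  /* at least one lane needs the `else` side */
    }
    /* Sides are serialized, so their costs add up. */
    return any_if * cost_if + any_else * cost_else;
}
```

Under this model, `warp_cost(0xFFFFFFFFu, 10, 10)` is 10 because all lanes agree, while `warp_cost(0x55555555u, 10, 10)` is 20 because the lanes alternate and both sides run one after the other. The first kernel in the quote corresponds to `cost_else = 0`: `warp_cost(0x1u, 10, 0)` is still 10, so the warp pays the full cost of the work while 31 of its lanes sit idle.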