Why is writing the GPU result to global memory and transferring it back to the CPU so expensive?

Discussion created by zhuzxy on Aug 15, 2011
Latest reply on Aug 17, 2011 by zhuzxy


   I have run into a problem: my final kernel takes about 2 ms to execute if I do not write the result back to global memory. But if I do write it (of course I have to, because I need the calculation result), the kernel time rises to about 7.5 ms. The result is about 40 bytes per work item, with 640 work items in total. My question is: why is it so expensive to write the result back to global memory? Is there a better way to get the result back to the CPU?
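
For scale, here is a rough back-of-the-envelope calculation using the numbers above. It is only a sketch: the variable and function names are made up for illustration, the timings are the approximate values I measured, and it assumes the whole time difference is attributable to the write, which may not be the case.

```python
# Figures taken from the post; all names here are illustrative only.
BYTES_PER_WORK_ITEM = 40       # "about 40 bytes" per work item
WORK_ITEMS = 640
TIME_WITHOUT_WRITE_MS = 2.0    # approximate kernel time, no global write
TIME_WITH_WRITE_MS = 7.5       # approximate kernel time, with global write


def total_result_bytes(bytes_per_item, items):
    """Total size of the result buffer in bytes."""
    return bytes_per_item * items


def extra_time_ms(with_write_ms, without_write_ms):
    """Additional kernel time observed when the result is written."""
    return with_write_ms - without_write_ms


def implied_throughput_mb_per_s(total_bytes, extra_ms):
    """Write rate implied IF the whole difference were due to the write."""
    return (total_bytes / 1e6) / (extra_ms / 1e3)


total = total_result_bytes(BYTES_PER_WORK_ITEM, WORK_ITEMS)
extra = extra_time_ms(TIME_WITH_WRITE_MS, TIME_WITHOUT_WRITE_MS)
rate = implied_throughput_mb_per_s(total, extra)

print(f"result size: {total} bytes")
print(f"extra time:  {extra} ms")
print(f"implied write rate: {rate:.2f} MB/s")
```

So the output is only about 25 KB, and the extra 5.5 ms or so looks far out of proportion to that amount of data.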