Trying to get asynchronous transfer from SDK 2.3

Discussion created by omion on Jan 2, 2011
Latest reply on Jan 6, 2011 by oscarbarenys1
I heard that SDK 2.3 supports DMA transfers, which I gather should allow data to be transferred to/from the GPU while it is executing a kernel (right?).

I am now trying to confirm that this works, but nothing I do shows any overlap.

[edit: I have an HD 5850, running with Catalyst 10.12 and of course Stream SDK 2.3]

My basic idea is to have 2 buffers (buf1,buf2) with a kernel for each (kern1,kern2) and do a loop with something like this:
...
EnqueueWrite(buf1)
EnqueueKernel(kern2)
EnqueueWrite(buf2)
EnqueueKernel(kern1)
...

This way buf1 can be transferred while kern2 is running, and buf2 can be transferred while kern1 is running. However, when I instead make the writes blocking and wait for each kernel to finish right after enqueueing it, the run takes almost exactly the same time as the asynchronous version with no waiting.
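As a rough illustration of what the overlap should buy, here is a minimal C sketch that models the ping-pong schedule above on paper rather than on the GPU. It assumes an idealized device with one copy engine and one compute engine, fixed per-step costs, and two buffers used alternately; the function names and any timings fed to it are hypothetical, not measurements from the HD 5850.

```c
#include <stddef.h>

/* Hypothetical model, not OpenCL: every write costs t_xfer and every
 * kernel costs t_kern (same arbitrary time unit). */

static double max2(double a, double b) { return a > b ? a : b; }

/* No overlap: each write and each kernel runs back to back. */
double serial_time(size_t steps, double t_xfer, double t_kern)
{
    return (double)steps * (t_xfer + t_kern);
}

/* Ideal overlap with two buffers used alternately.
 * Step i writes a buffer, then runs the kernel that reads it.
 *  - write i waits for write i-1 (one copy engine) and for kernel i-2
 *    (the last kernel that used the same buffer);
 *  - kernel i waits for write i and for kernel i-1 (one compute engine). */
double overlapped_time(size_t steps, double t_xfer, double t_kern)
{
    double write_prev = 0.0;  /* end time of write i-1 */
    double kern_prev  = 0.0;  /* end time of kernel i-1 */
    double kern_prev2 = 0.0;  /* end time of kernel i-2 */

    for (size_t i = 0; i < steps; i++) {
        double write_end = max2(write_prev, kern_prev2) + t_xfer;
        double kern_end  = max2(write_end, kern_prev) + t_kern;
        write_prev = write_end;
        kern_prev2 = kern_prev;
        kern_prev  = kern_end;
    }
    return kern_prev;
}
```

With 4 steps, a transfer cost of 2 and a kernel cost of 3, `serial_time(4, 2.0, 3.0)` gives 20 and `overlapped_time(4, 2.0, 3.0)` gives 14. If the measured asynchronous run lands near the serial figure, the transfers are probably not overlapping with the kernels.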

The stream timeline in the Stream Profiler shows the GPU switching between transfer and execution, with no overlap.

The exact program is a bit hairy, but I have attached the part that actually does the execution. (In the real code, I check the return value of every call. LEN is the number of floats in each buffer.) Am I doing anything wrong?

clEnqueueWriteBuffer(q, buf2, CL_FALSE, 0, LEN * sizeof(cl_float), ptr2, 0, NULL, &e2write);
clEnqueueWriteBuffer(q, buf1, CL_FALSE, 0, LEN * sizeof(cl_float), ptr1, 0, NULL, &e1write);
clEnqueueNDRangeKernel(q, kern2, sizeof(global_dims) / sizeof(size_t), NULL, global_dims, local_dims, 1, &e2write, &e2kern);

for(i = 0; i < 32; i++) {
    clEnqueueWriteBuffer(q, buf2, CL_FALSE, 0, LEN * sizeof(cl_float), ptr2, 1, &e2kern, &e2write);
    clEnqueueNDRangeKernel(q, kern1, sizeof(global_dims) / sizeof(size_t), NULL, global_dims, local_dims, 1, &e1write, &e1kern);
    clEnqueueWriteBuffer(q, buf1, CL_FALSE, 0, LEN * sizeof(cl_float), ptr1, 1, &e1kern, &e1write);
    clEnqueueNDRangeKernel(q, kern2, sizeof(global_dims) / sizeof(size_t), NULL, global_dims, local_dims, 1, &e2write, &e2kern);
}

clWaitForEvents(1, &e1kern);
clWaitForEvents(1, &e2kern);
