Archives Discussions

omer · 08-16-2015

Hi All,

I wrote an OpenCL application which enqueues a series of non-blocking buffer writes, kernel executions, and buffer reads in a loop. At the end of the loop, I call clFinish. When profiling my program in CodeXL, the program spends much of its time in the clFinish call (which is good), but for almost all of that time my GPU (a Fury X) is idle. I've attached a screenshot of a sample run. If I add up all the time spent on transfers and kernels, only around 2 seconds out of 10 are spent doing real work. I have also tried adding more queues. With computation and transfers interleaved, the theoretical runtime should then be about 1 second, but I get runtimes ranging from 3 to 10 seconds (examples also attached). Any ideas about what is going on here? I've seen a couple of threads from ~3 years ago pointing to profiling bugs, but hopefully those have been fixed.

Thanks!

Omer

pinform · ‎08-17-2015

Hi Omer,

Welcome!

I've white-listed you, so you should be able to post in the appropriate forum. Happy posting.

--Prasad

omer · 08-18-2015

I think I tracked down the issue. I was allocating buffers inside the loop, but through the C++ interface, which automatically deallocates a buffer when the buffer object goes out of scope. I was therefore using pointers to deallocated buffers. I'm sure that can cause all kinds of bad behavior, including what I was seeing here.
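
For anyone hitting the same thing, here is a minimal sketch of the lifetime problem described above. It deliberately does not use OpenCL: `MockBuffer` is a hypothetical stand-in for a scope-bound wrapper like `cl::Buffer`, and `live_at_finish_scoped` / `live_at_finish_owned` are illustrative names, not part of any API. It only shows that a buffer declared inside the loop body is gone before the point where clFinish would be called, while buffers kept in a container declared outside the loop are still alive there. How a real runtime treats in-flight commands whose buffer wrapper has been destroyed is not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a scope-bound buffer wrapper such as cl::Buffer.
// It only counts live instances; it does not call any OpenCL function.
class MockBuffer {
public:
    explicit MockBuffer(std::size_t bytes) : bytes_(bytes) { ++live_; }
    MockBuffer(const MockBuffer& other) : bytes_(other.bytes_) { ++live_; }
    MockBuffer& operator=(const MockBuffer&) = default;
    ~MockBuffer() { --live_; }  // models the release at end of scope
    std::size_t size() const { return bytes_; }
    static int live() { return live_; }
private:
    std::size_t bytes_;
    static int live_;
};
int MockBuffer::live_ = 0;

// Pitfall: each buffer is local to one iteration, so it is destroyed before
// the finish point. Returns how many buffers are still alive at that point.
int live_at_finish_scoped(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        MockBuffer buf(1024);  // destroyed at the closing brace of the loop body
        // ... non-blocking write / kernel / read referring to buf would go here ...
    }
    return MockBuffer::live();  // where clFinish would be called
}

// Fix: the container is declared outside the loop, so every buffer stays
// alive until after the finish point.
int live_at_finish_owned(int iterations) {
    std::vector<MockBuffer> buffers;
    buffers.reserve(static_cast<std::size_t>(iterations));
    for (int i = 0; i < iterations; ++i) {
        buffers.emplace_back(1024);
        // ... non-blocking work referring to buffers.back() would go here ...
    }
    return MockBuffer::live();  // evaluated while the vector is still in scope
}
```

With 4 iterations, the scoped version reports 0 live buffers at the finish point and the owned version reports 4. In real code the equivalent change would be to keep the buffer objects in a container (or as members) that outlives the clFinish call, rather than holding pointers to loop-local objects.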

GPU spends vast majority of time idle during OpenCL execution