omer
Journeyman III

GPU spends vast majority of time idle during OpenCL execution

Hi All,

I wrote an OpenCL application which enqueues a series of non-blocking buffer writes, kernel executions, and buffer reads in a loop. At the end of the loop, I call clFinish. When profiling my program in CodeXL, the program spends much of its time in the clFinish call (which is good), but for almost all of that time my GPU (a Fury X) is idle. I've attached a screenshot of a sample run. If I add up all the time spent on transfers and kernels, only around 2 seconds out of 10 are spent doing real work. I have also tried adding more queues. With computation and transfers interleaved, the theoretical runtime should then be about 1 second, but I get runtimes ranging from 3 to 10 seconds (examples also attached). Any ideas about what is going on here? I've seen a couple of threads from ~3 years ago pointing to profiling bugs, but hopefully those have been fixed.

Thanks!

Omer

1 Solution
omer
Journeyman III

I think I tracked down the issue. I was allocating buffers inside the loop, but using the C++ interface, which automatically deallocates a buffer when the buffer object goes out of scope. I was therefore using pointers to deallocated buffers. I'm sure that can cause all kinds of bad behavior, including what I was seeing here.
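
For anyone hitting the same thing, here is a minimal sketch of the lifetime problem described above. It makes no OpenCL calls: `TrackedBuffer` is a hypothetical stand-in for an RAII handle such as `cl::Buffer`, and the two function names are made up for illustration. All it does is count how many buffer objects are still alive at the point where the queue would be drained with clFinish.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an RAII buffer handle (assumption: not the real
// OpenCL type). It only counts live instances so lifetimes can be observed.
class TrackedBuffer {
public:
    explicit TrackedBuffer(std::size_t bytes) : bytes_(bytes) { ++live_count(); }
    TrackedBuffer(const TrackedBuffer& other) : bytes_(other.bytes_) { ++live_count(); }
    TrackedBuffer(TrackedBuffer&& other) noexcept : bytes_(other.bytes_) { ++live_count(); }
    TrackedBuffer& operator=(const TrackedBuffer&) = default;
    ~TrackedBuffer() { --live_count(); }

    std::size_t size() const { return bytes_; }

    // Number of TrackedBuffer objects currently alive.
    static int& live_count() {
        static int count = 0;
        return count;
    }

private:
    std::size_t bytes_;
};

// Pattern from the post: the buffer is created inside the loop body and is
// destroyed at the end of each iteration, before the queue is drained.
int live_buffers_after_scoped_loop(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        TrackedBuffer buf(1024);
        (void)buf;  // a non-blocking enqueue using buf would go here
    }  // buf is destroyed here, every iteration
    // This is where clFinish would be called: no buffer object is left.
    return TrackedBuffer::live_count();
}

// One possible fix: an owner declared outside the loop keeps every buffer
// alive until after the point where the queue would be drained.
int live_buffers_with_owner(int iterations) {
    std::vector<TrackedBuffer> owned;
    owned.reserve(static_cast<std::size_t>(iterations));
    for (int i = 0; i < iterations; ++i) {
        owned.emplace_back(1024);  // enqueue using owned.back() would go here
    }
    // This is where clFinish would be called: all buffers are still alive.
    return TrackedBuffer::live_count();
}  // owned releases every buffer here, after the work is done
```

With four iterations, the first function reports 0 live buffers at the finish point and the second reports 4. In real code the same idea applies: keep the buffer objects (not raw pointers to them) in a container that outlives the loop, and let it go out of scope only after clFinish returns.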

2 Replies
pinform
Staff

Hi Omer,

Welcome!

I've white-listed you, so you should be able to post in the appropriate forum.  Happy posting.

--Prasad
