laoshi
Journeyman III

gradual performance degradation of clEnqueueReadImage()

Hello,

I'm ray casting some fairly large volumes (examples from http://www.volvis.org) for medical applications.

I'm generating a "DRR", that is, a digital X-ray of the objects, by integrating along each ray.

I have a 1024 by 1024 destination image with 16-bit intensity values.

I am rotating about the volume from 0 to 90 or 180 degrees.

I notice, both in my own timings and in the AMD profiler, that the read of the 2D projected image is slowing down over time.

For instance:

elapsed time = 363.08

elapsed time = 359.41

elapsed time = 353.417

elapsed time = 359

elapsed time = 365.704

elapsed time = 364.322

elapsed time = 364.974

elapsed time = 364.162

elapsed time = 360.681

elapsed time = 358.457

elapsed time = 366.828

elapsed time = 366.772

elapsed time = 369.182

elapsed time = 365.62

elapsed time = 371.733

elapsed time = 373.001

elapsed time = 373.148

elapsed time = 373.764

elapsed time = 376.527

elapsed time = 378.569

elapsed time = 379.176

elapsed time = 380.202

elapsed time = 383.006

elapsed time = 384.96

elapsed time = 385.866

elapsed time = 387.969

elapsed time = 386.392

elapsed time = 386.947

elapsed time = 395.193

elapsed time = 398.699

elapsed time = 400.417

elapsed time = 403.66

elapsed time = 406.639

elapsed time = 410.729

elapsed time = 414.783

elapsed time = 417.085

elapsed time = 420.945

elapsed time = 424.348

elapsed time = 430.082

elapsed time = 434.506

elapsed time = 438.68

elapsed time = 445.329

elapsed time = 441.212

elapsed time = 446.529

And this gets much slower as time goes on.

I don't see any "hints" from the profiler as to what may be the problem. Has anyone seen this type of problem, and do you know what the cause might be?

Thanks,

Rick

2 Replies
drallan
Challenger

Hi Rick,

I often see a similar effect when projecting a 3D volume onto a 2D image, where I sum through the volume to get each 2D pixel.

The time variation (at least mine) occurs while rotating through different angles. As you change angles, the step between sequential 3D volume memory addresses slowly changes from small to large (or vice versa). This can affect the GPU's memory cache, the access pattern of the GPU's memory controllers, or both, and so varies the time it takes to access memory.

There is no way around this, but you may be able to optimize a bit. If you consider a 3D volume of dimensions DX, DY, and DZ, memory stepping can go from 1 to DX along one rotation axis, or maybe from DX to DX*DY along another rotation axis, etc. Depending on the problem, you might be able to choose a better set of angles.
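
To make the stepping concrete, here is a rough sketch of the address step between two consecutive samples along a ray. It assumes the volume is a plain linear buffer stored x-fastest (index = x + DX*(y + DY*z)); the function name and parameters are made up for illustration. If the volume lives in an OpenCL image object, the real layout is up to the implementation, so treat this only as a model of the trend.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of any OpenCL API).
 * Returns the approximate distance, in elements, between the memory
 * addresses of two consecutive unit steps along a ray with direction
 * (dir_x, dir_y, dir_z), assuming a linear x-fastest layout:
 *     index = x + dx * (y + dy * z)
 */
double ray_stride_elements(size_t dx, size_t dy,
                           double dir_x, double dir_y, double dir_z)
{
    assert(dx > 0 && dy > 0);

    /* A unit step in x moves 1 element, in y moves dx elements,
     * and in z moves dx*dy elements. */
    double step = dir_x
                + dir_y * (double)dx
                + dir_z * (double)dx * (double)dy;

    /* Only the magnitude matters for locality, not the sign. */
    return step < 0.0 ? -step : step;
}
```

For a 256 x 256 x 256 volume, a ray along x gives a step of 1, along y a step of 256, and along z a step of 65536. Rotating about the z axis sweeps the ray from the x direction to the y direction (1 to DX); rotating about the x axis sweeps it from y to z (DX to DX*DY).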

Allan


Thanks, that makes sense. I guess the read call has to wait for my kernel to return before it gets access to the bits? So if my memory access times are a function of the angle, I should be able to see it come back up to speed every 180 degrees around the volume?
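
One way to check the first guess would be to separate the time the read spent waiting from the time it spent actually copying. The sketch below assumes the queue was created with CL_QUEUE_PROFILING_ENABLE and that the three timestamps (in nanoseconds) were fetched for the read's event with clGetEventProfilingInfo using CL_PROFILING_COMMAND_QUEUED, CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END. The struct and function names are made up for illustration.

```c
#include <assert.h>

/* Hypothetical container for the profiling timestamps of one command,
 * in nanoseconds, as reported by clGetEventProfilingInfo. */
typedef struct {
    unsigned long long queued_ns; /* CL_PROFILING_COMMAND_QUEUED */
    unsigned long long start_ns;  /* CL_PROFILING_COMMAND_START  */
    unsigned long long end_ns;    /* CL_PROFILING_COMMAND_END    */
} cmd_times;

/* Time between enqueueing the command and the device starting it.
 * On an in-order queue this includes waiting for earlier commands,
 * such as the ray-cast kernel. */
unsigned long long wait_ns(cmd_times t)
{
    assert(t.start_ns >= t.queued_ns);
    return t.start_ns - t.queued_ns;
}

/* Time the device spent executing the command itself. */
unsigned long long exec_ns(cmd_times t)
{
    assert(t.end_ns >= t.start_ns);
    return t.end_ns - t.start_ns;
}
```

If wait_ns grows with the angle while exec_ns stays flat, the slowdown belongs to the kernel and only shows up in the read because the read blocks behind it.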
