Archives Discussions

cvazquezb · ‎11-28-2012

Tested with a Radeon HD 7970 on 64-bit Windows 7, with the driver updated to Catalyst 12.10.

The relevant code:

#define PIXELS 20
for(int dy=0;dy<PIXELS;dy++)
{
    (...)
    if(dy>0)
    {
        w=weights[(PIXELS-1-dy)*VOX_SLICE*workspace->xMaxVoxels];
        c=evaluate(lines[-dy+PIXELS-1].z,lines[-dy+PIXELS-1].w,slice)+yChunk;
        for(int y=dy;y<PIXELS;y++)
        {
            if((c>=0)&&(c<CHUNK_SIZE))
                chunk[c*VOX_ROW]+=w;
            barrier(CLK_LOCAL_MEM_FENCE);
            c+=Z_SLOPE;
        }

If I replace the seventh line with the equivalent:

w=weights[(-dy+PIXELS-1)*VOX_SLICE*workspace->xMaxVoxels];

The generated IL changes a lot, well beyond register renaming, with the first version having a few more instructions. Worse still, after the change the results are completely wrong. The code works fine either way on a variety of Nvidia platforms. I'm afraid I can't provide the full code without an NDA, but I'd be happy to help in any other way.
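For reference, the claim that the two index expressions are equivalent can be checked on the host with a few lines of plain C. This is only a minimal sketch: the function names are made up for illustration, and `stride` stands in for VOX_SLICE*workspace->xMaxVoxels, whose real values are not given in this thread.

```c
#include <assert.h>

#define PIXELS 20

/* Index as written in the original kernel line. */
long index_v1(int dy, long stride) {
    return (PIXELS - 1 - dy) * stride;
}

/* Index as written in the replacement line. */
long index_v2(int dy, long stride) {
    return (-dy + PIXELS - 1) * stride;
}

/* Counts the dy values in [0, PIXELS) for which the two forms disagree.
   `stride` is a hypothetical stand-in for VOX_SLICE*workspace->xMaxVoxels. */
int count_index_mismatches(long stride) {
    assert(stride >= 0);
    int mismatches = 0;
    for (int dy = 0; dy < PIXELS; dy++) {
        if (index_v1(dy, stride) != index_v2(dy, stride)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

For any small non-negative stride the mismatch count is 0, so a difference in kernel output between the two forms would have to come from code generation rather than from the arithmetic itself.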

binying · ‎11-28-2012

How about with Catalyst 12.11 beta?

yurtesen · ‎11-28-2012

Did you try printf to see whether the values are what you expect? I think it is unlikely that the compiler would make a mistake with (PIXELS-1-dy) vs (-dy+PIXELS-1). It probably replaces PIXELS-1 with 19, so it would end up with 19-dy or -dy+19. I am not sure whether AMD can replicate the problem.

Did you try different cards, and also the CPU device, to see whether you get the same results? Perhaps you can put your kernel through KernelAnalyzer[1,2] to see what it reports.

With so little information, there is not much that can be said.
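One way to act on the "same results on another device" suggestion is to read the output buffer back from each device and compare the two copies on the host. The sketch below is only an illustration in plain C: it assumes the results are float arrays (the thread does not say), and the function name and tolerance parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first element where the two result buffers
   differ by more than abs_tol + rel_tol * max(|a|, |b|), or -1 if every
   element matches. A NaN in either buffer is reported as a mismatch. */
long first_mismatch(const float *a, const float *b, size_t n,
                    float rel_tol, float abs_tol) {
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        float diff = a[i] - b[i];
        if (diff < 0.0f) diff = -diff;

        float abs_a = a[i] < 0.0f ? -a[i] : a[i];
        float abs_b = b[i] < 0.0f ? -b[i] : b[i];
        float scale = abs_a > abs_b ? abs_a : abs_b;

        /* Written as a negated <= so that a NaN comparison counts as a mismatch. */
        if (!(diff <= abs_tol + rel_tol * scale)) {
            return (long)i;
        }
    }
    return -1;
}
```

The index of the first mismatch can help narrow down which work-item or loop iteration diverges between devices, without having to publish any of the kernel source.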

cvazquezb · ‎11-29-2012

I'm sorry about the lack of details; I'm very limited (in a legal sense) in what information I can provide publicly. That aside, I did try to printf the results, and something very strange happened: the printed results were fine, but the kernel slowed to a crawl, even after I limited the printout to a couple of lines. What normally takes <10 seconds was still only halfway done after three hours!

Anyway, I did what binying proposed and went with 12.11 beta. It worked like a charm, with no modification to the code necessary. So it does look like a code generation bug. I'm glad they have already sorted it out, but it certainly doesn't make me feel very confident about the robustness of OpenCL on AMD platforms. If I hadn't had other platforms to test the code on, I would have spent a lot of time trying to fix a nonexistent problem on my side.

Thank you both for taking the time to answer.

Possible compiler bug with Catalyst 12.10