ri239
Journeyman III

clDXTCompression example not working, different errors with CPU and GPU

I'm trying out the AMD SDK on a Radeon 5870, and unfortunately I'm running into some really weird issues with an example from the NVIDIA SDK (clDXTCompression).

First of all, I added #pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable to get it compiled properly with the AMD Stream SDK (I'm using the ATI Stream SDK v2.0 on Vista/x64.) I also had to follow the instructions at http://developer.amd.com/support/KnowledgeBase/Lists/KnowledgeBase/DispForm.aspx?ID=71 to obtain a proper context for running OpenCL.

Now the app runs successfully (i.e. produces no errors), but the output is completely broken -- even worse, the output differs between the CPU and the GPU versions. Any ideas where to start debugging? The GPU version fills the complete output buffer, but the ordering is totally off ... it looks like random noise.

So, the questions are:

  • Is cl_khr_byte_addressable_store ever going to be supported on the HD5870? Right now the compiler requires me to enable the extension using #pragma, but nothing fails afterwards even though the extension is apparently not supported.
  • Where should I start for CPU debugging?
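Regarding the first question, one way to see what a device actually reports is to inspect its extension string at runtime. The following is a minimal sketch in plain C, assuming the space-separated extension list has already been fetched with clGetDeviceInfo and CL_DEVICE_EXTENSIONS. The helper name has_extension is made up for illustration and is not part of any SDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if `name` appears as a whole,
 * space-delimited token in `ext_list` (the string a device reports
 * for CL_DEVICE_EXTENSIONS). A plain strstr() is not enough, because
 * one extension name can be a prefix of another. */
static bool has_extension(const char *ext_list, const char *name)
{
    assert(ext_list != NULL && name != NULL);

    size_t len = strlen(name);
    if (len == 0)
        return false;

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        bool starts_token = (p == ext_list) || (p[-1] == ' ');
        bool ends_token   = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token)
            return true;
        p += len; /* keep searching after this partial match */
    }
    return false;
}
```

If has_extension(list, "cl_khr_byte_addressable_store") is false for the GPU device, the kernel should not rely on sub-32-bit stores there, regardless of whether the #pragma compiles.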

ri239
Journeyman III

I just worked around the byte addressing issues, but the result is still the same (i.e. broken.) At least it's the same as before, so the byte addressing does not seem to be the culprit.


Originally posted by: ri239 I just worked around the byte addressing issues, but the result is still the same (i.e. broken.) At least it's the same as before, so the byte addressing does not seem to be the culprit.

 

Ri239,

        cl_khr_byte_addressable_store is not currently supported on the GPU.

        Some NVIDIA kernels are written with a specific warp size in mind. Such kernels are difficult to port to the CPU. Please make sure you understand the kernel first, and then port it to the CPU.


Yeah, but even after working around the byte-addressable store issue (it was simply because they were writing to short*, which I changed to int*) it still doesn't work on the GPU. I merely tested on the CPU to see whether I can get the same result ...
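For reference, the short* to int* workaround described above amounts to something like the following. This is a sketch in plain C rather than the actual kernel code, and the function names are invented for illustration. It assumes a little-endian target, so the packed words have the same memory layout as the original array of 16-bit values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack two 16-bit values into one 32-bit word, low half first. */
static uint32_t pack_u16_pair(uint16_t lo, uint16_t hi)
{
    return (uint32_t)lo | ((uint32_t)hi << 16);
}

/* Write `count` 16-bit values using only 32-bit stores.
 * `count` must be even; `dst` must hold count / 2 words. */
static void store_u16_as_u32(uint32_t *dst, const uint16_t *src, size_t count)
{
    assert(count % 2 == 0);
    for (size_t i = 0; i < count; i += 2)
        dst[i / 2] = pack_u16_pair(src[i], src[i + 1]);
}
```

One thing worth checking after such a change: in a kernel, a single work-item has to own each whole 32-bit word. If two work-items each used to write one short and now both write into the same int, they race, and the output indexing changes as well.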

I just looked through the source again. There are a few points where the code needs an additional barrier(), but adding them doesn't seem to fix the GPU version -- the CPU version changes yet again, resulting in a different (but still broken) image. Hmm, I'm really puzzled, because the original OpenCL code doesn't look too advanced. Is there some debug mode available in the AMD Stream SDK? It would already be helpful to be able to run the code fully sequentially, to find out whether it's an ordering/synchronisation issue.
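To illustrate what running the code sequentially can reveal, here is a small plain C sketch that emulates one work-group on the host. It is not the clDXTCompression kernel. GROUP_SIZE, the two phases, and the function names are assumptions chosen for illustration. The kernel body is split at the point where barrier(CLK_LOCAL_MEM_FENCE) would sit, and the two runners differ only in whether that split is respected.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP_SIZE 8 /* hypothetical work-group size */

/* Phase 1: each work-item writes its own slot of "local memory". */
static void phase_load(size_t lid, const int *in, int *local_mem)
{
    assert(lid < GROUP_SIZE);
    local_mem[lid] = in[lid];
}

/* Phase 2: each work-item reads its neighbour's slot. This is only
 * safe once every work-item has finished phase 1. */
static void phase_sum_neighbour(size_t lid, const int *local_mem, int *out)
{
    assert(lid < GROUP_SIZE);
    out[lid] = local_mem[lid] + local_mem[(lid + 1) % GROUP_SIZE];
}

/* Emulates the kernel with a barrier between the phases:
 * all work-items finish phase 1 before any starts phase 2. */
static void run_with_barrier(const int *in, int *out)
{
    int local_mem[GROUP_SIZE] = {0};
    for (size_t lid = 0; lid < GROUP_SIZE; ++lid)
        phase_load(lid, in, local_mem);
    /* barrier(CLK_LOCAL_MEM_FENCE) would be here in the kernel */
    for (size_t lid = 0; lid < GROUP_SIZE; ++lid)
        phase_sum_neighbour(lid, local_mem, out);
}

/* Emulates the same kernel with the barrier missing: each work-item
 * runs both phases back to back, so neighbours may not be loaded yet. */
static void run_without_barrier(const int *in, int *out)
{
    int local_mem[GROUP_SIZE] = {0};
    for (size_t lid = 0; lid < GROUP_SIZE; ++lid) {
        phase_load(lid, in, local_mem);
        phase_sum_neighbour(lid, local_mem, out);
    }
}
```

Comparing the two outputs for the same input shows which elements depend on the barrier. Code that works without explicit barriers on hardware where a fixed number of work-items run in lockstep can still break on a device with a different execution width or on the CPU, which would be consistent with the CPU and GPU results differing.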
