Archives Discussions

nibal · ‎10-28-2015

When I write my program's float result to global memory, it comes out fine. But when I write its output to LDS, I get all NaNs and infinities. What gives?

I use printf from the kernel to get the values. The kernel is just 1 work-group of 64 work-items. Using SDK 3.0 on Ubuntu 14.04.

dipak · ‎10-29-2015

Could you please provide sample code that reproduces the above problem?

Regards,

nibal · ‎10-29-2015

Ty for your fast reply.

I am attaching 2 kernel versions: version 1 uses global memory and version 2 uses local memory. I hope you can use your 1.x/FFT sample for testing. The local kernel needs at least 8192 floats (real + imag) of input; the global one needs at least 1024.

I hope you will at least get different results for the local and global cases. If you cannot reproduce the problem, I can give you a more reproducible case with real input, but it will be much more difficult.

Using Ubuntu 14.04 x64 with an R9 270 GPU and the latest Catalyst (15.9).

dipak · ‎10-29-2015

Thanks for sharing the kernel files.

nibal wrote:
I am attaching 2 kernel versions: 1 with global memory, and 2 with local. I hope you can use your 1.x/FFT sample for testing.

Do you mean that I can use the FFT sample (host code) to run those kernels? Otherwise, it would be helpful if you could share the host-side code as well, if you already have it.

Regards,

nibal · ‎10-29-2015

Hmmm. You are probably right. The kernels are a modification of your FFT code, but their args have evolved a bit since then. I will have to cook up a sample for it. I will even provide reproducible input data. Give me a couple of hours, though. 😞

nibal · ‎10-29-2015

That turned out to be a whole new project. It took me the whole day to develop and test it 😞 Hope it's worth the effort.

Expand the attachment, and it will create an lds directory where you can run all tests. Use the 2 enclosed kernels instead of the 2 I sent you earlier. Not much changed, just the order of the kernel args.

A Makefile is included and first you need to make the executables:

make clean

make db

make

Be careful to make clean after each run with the same kernel. My code generates a binary image each time and preferentially uses that one if it exists. In SDK 3.0, if you use a binary image with a printf statement, it will crash (another ticket).
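The "use the cached binary if it exists" behaviour described above can be sketched in plain C. This is only an illustration under stated assumptions: the naming scheme (kernel file name, underscore, device name) and the helper names cached_binary_path and use_cached_binary are hypothetical and not taken from the attached project. The thread only confirms that the images end in _Pitcairn on this card.

```c
#include <stdio.h>
#include <string.h>

/* Build the cached-image path. ASSUMPTION: images are named
 * "<kernel file>_<device name>", e.g. "fft1.cl_Pitcairn".
 * Returns 0 on success, -1 if the output buffer is too small. */
static int cached_binary_path(const char *kernel_file, const char *device,
                              char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%s_%s", kernel_file, device);
    return (n > 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Returns 1 if a cached image exists and should be preferred,
 * 0 if the program has to be built from source. */
static int use_cached_binary(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    fclose(f);
    return 1;
}
```

When use_cached_binary returns 1, a host program would typically load the image with clCreateProgramWithBinary; otherwise it would fall back to clCreateProgramWithSource. Running make clean removes the image and so forces the second path.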

You run tests by typing:

dsp -h (online help)

dsp -k fft1.cl new.bin

dsp -k fft2.cl new.bin

You should notice the pwr values and their avg value written out by each kernel. Feel free to freeze, typescript, or interrupt at any time. You should see different values between fft1.cl and fft2.cl, with most of the fft2.cl values being NaNs. LDS is 32 bits wide according to the optimization guide, so it's ideal for floats.

On Ubuntu 14.04 x64 with a Radeon R9 270 and the latest Catalyst (15.9).

dipak · ‎10-30-2015

Thanks...I'll check and get back to you soon.

nibal wrote:
In SDK 3.0 if you use a binary image with a printf statement, you will crash (another ticket).

It is already a known issue. For more details, please refer to this thread: executeNDRangeKernel crashes with segfault under certain circumstances

Regards,

nibal · ‎10-30-2015

dipak wrote:
It is already a known issue. For more details, please refer to this thread: executeNDRangeKernel crashes with segfault under certain circumstances

I know. Just wanted to stress that make clean deletes all binary images *_Pitcairn. If you use another video card, you should edit that in the Makefile.

dipak · ‎10-30-2015

I'm able to reproduce the NaN values using a Hawaii card. Comparing the two .cl files, I can see many differences, but I haven't yet checked the code in detail. It would be helpful if you could point out the relevant sections of the code that may cause the different results.

BTW, while building, I removed the "-lasound" option as it was giving an error. I guess the library is not required and not relevant to this issue.

Regards,

nibal · ‎10-30-2015

Nice! At least there is no problem with my card.

Indeed, libasound is part of the original project, not needed for the sample.

95% of the code is your FFT sample's code. The remaining 5% deals with averaging these values over MAXPASS (8) runs; on the last pass, findSignal gets the local maxima from a rolling average.
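For readers without the attachment, the averaging and peak-picking step can be illustrated on the host in plain C. This is a sketch under assumptions, not the kernel code from the attachment: it assumes interleaved (real, imag) FFT output, takes power as re*re + im*im, and uses a centered rolling mean with a caller-chosen half-width. The function names are hypothetical; only MAXPASS and its value of 8 come from the thread.

```c
#include <stddef.h>

#define MAXPASS 8   /* number of passes averaged, as stated in the thread */

/* Add one pass of FFT output to a running power sum.
 * ASSUMPTION: fft holds interleaved (real, imag) pairs, one per bin. */
static void accumulate_power(const float *fft, float *sum, size_t bins)
{
    for (size_t i = 0; i < bins; ++i) {
        float re = fft[2 * i], im = fft[2 * i + 1];
        sum[i] += re * re + im * im;
    }
}

/* After MAXPASS passes, turn the sums into the average power per bin. */
static void finish_average(float *sum, size_t bins)
{
    for (size_t i = 0; i < bins; ++i)
        sum[i] /= (float)MAXPASS;
}

/* Centered rolling mean over up to 2*half+1 bins, clamped at the edges. */
static float rolling_mean(const float *avg, size_t bins, size_t i, size_t half)
{
    size_t lo = i >= half ? i - half : 0;
    size_t hi = i + half < bins ? i + half : bins - 1;
    float s = 0.0f;
    for (size_t k = lo; k <= hi; ++k)
        s += avg[k];
    return s / (float)(hi - lo + 1);
}

/* Store the bins whose rolling mean is strictly greater than that of both
 * neighbours. Returns the number of peaks written (at most max_peaks). */
static size_t find_local_maxima(const float *avg, size_t bins, size_t half,
                                size_t *peaks, size_t max_peaks)
{
    size_t n = 0;
    for (size_t i = 1; i + 1 < bins && n < max_peaks; ++i) {
        float c = rolling_mean(avg, bins, i, half);
        if (c > rolling_mean(avg, bins, i - 1, half) &&
            c > rolling_mean(avg, bins, i + 1, half))
            peaks[n++] = i;
    }
    return n;
}
```

A host-side reference like this can also be used to cross-check the values printed by the global and local kernel versions for the same input.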

Hold it. It looks like this is not an LDS writing problem, but a problem with the rest of the code. Substituting zr0... instead of avg in the printf statements, I get the same NaN values in fft2. I will check my input at various stages and update the ticket accordingly.
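One way to check values at various stages without relying on kernel printf is to read an intermediate buffer back to the host and scan it for non-finite entries. The helper below is a minimal sketch of that idea; the name count_non_finite is hypothetical, and it assumes the stage output is already available on the host as an array of floats.

```c
#include <math.h>
#include <stddef.h>

/* Count NaN/Inf entries in a buffer read back from the device.
 * *first_bad receives the index of the first offender, or n if the
 * buffer is clean. */
static size_t count_non_finite(const float *buf, size_t n, size_t *first_bad)
{
    size_t bad = 0;
    *first_bad = n;
    for (size_t i = 0; i < n; ++i) {
        if (!isfinite(buf[i])) {
            if (bad == 0)
                *first_bad = i;
            ++bad;
        }
    }
    return bad;
}
```

The index of the first bad entry is often the useful part: if it lines up with a work-item or block boundary, a wrong offset is a likely suspect.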

nibal · ‎10-30-2015

My code had problems with a couple of offsets. Now it runs fine.

Sorry I couldn't test it before, but I had the build problem with -g and printf, and CodeXL doesn't show local variables.

BTW, the LDS avg implementation runs a bit slower than the global one. Since LDS is a precious resource, I'm dropping the LDS version.

Ty for your time.

LDS write problem