10 Replies Latest reply on Oct 30, 2015 9:04 AM by nibal

    LDS write problem

    nibal

      When I write my program's float result to global memory, it comes out fine. But when I write its output to LDS I get all nans and infinities. What gives?

      I use printf from the kernel to get the values. Kernel is just 1 group of 64 workers. Using sdk 3.0 in Ubuntu 14.04.

        • Re: LDS write problem
          dipak

          Could you please provide some sample code that reproduces the above problem?

           

          Regards,

            • Re: LDS write problem
              nibal

              Ty for your fast reply.

              I am attaching two kernel versions: version 1 uses global memory and version 2 uses local memory. I hope you can use your 1.x/FFT sample for testing. The local kernel needs at least 8192 floats (real + imag) of input; the global one needs at least 1024.

              I hope you will at least get different results for the local and global cases. If you cannot reproduce the problem, I can give you a more reproducible case, with real input, but it will be much more difficult.

               

              Using Ubuntu 14.04 x64 with R9 270 GPU, and latest catalyst (15.9)

                • Re: LDS write problem
                  dipak

                  Thanks for sharing the kernel files.

                   

                  I am attaching two kernel versions: version 1 uses global memory and version 2 uses local memory. I hope you can use your 1.x/FFT sample for testing.

                   

                  Do you mean that I can use the FFT sample (host-code) to run those kernels? Otherwise, it would be helpful if you can share the host-side code as well, if you already have one.

                   

                  Regards,

                    • Re: LDS write problem
                      nibal

                      Hmmm. You are probably right. Kernels are a modification of your FFT code, but their args have evolved a bit since then. I will have to cook up a sample for it. Will even provide for reproducible input data. Give me a couple of hours, though. :-(

                        • Re: LDS write problem
                          nibal

                          That turned out to be a whole new project. It took me the whole day to develop and test it :-( Hope it's worth the effort.

                          Extract the attachment, and it will create an lds directory where you can run all tests. Use the 2 enclosed kernels instead of the 2 I sent you earlier. Not much changed, just the order of kernel args.

                          A Makefile is included and first you need to make the executables:

                          make clean

                          make db

                          make

                          Be careful to run make clean after each run with the same kernel. My code generates a binary image each time and preferentially uses that one if it exists. In SDK 3.0 if you use a binary image with a printf statement, you will crash (another ticket).

                          You run tests by typing:

                          dsp -h                              Online help.

                          dsp -k fft1.cl new.bin

                          dsp -k fft2.cl new.bin

                           

                          You should see the pwr values and their avg value written out by each kernel. Feel free to freeze the output, capture a typescript, or interrupt at any time. The values should differ between fft1.cl and fft2.cl, with most of the fft2.cl values being nans. LDS is 32 bits wide according to the optimization guide, so it's ideal for floats.

                          On Ubuntu 14.04 x64 with Radeon R9 270 and latest catalyst 15.9

                            • Re: LDS write problem
                              dipak

                              Thanks...I'll check and get back to you soon.

                               

                              In SDK 3.0 if you use a binary image with a printf statement, you will crash (another ticket).

                              It is already a known issue. For more details, please refer to this thread: executeNDRangeKernel crashes with segfault under certain circumstances

                               

                              Regards,

                                • Re: LDS write problem
                                  nibal




                                  It is already a known issue. For more details, please refer to this thread: executeNDRangeKernel crashes with segfault under certain circumstances

                                   

                                  I know. Just wanted to stress that make clean deletes all binary images *_Pitcairn. If you use another video card, you should edit that in the Makefile.

                                    • Re: LDS write problem
                                      dipak

                                      I'm able to reproduce the nan values using a Hawaii card. Comparing the two cl files, I can see many differences. I haven't yet checked the code in detail. It would be helpful if you could point out the relevant sections of the code that may cause the differences.

                                       

                                      BTW, during the build, I removed the "-lasound" option as it was giving an error. I guess the library is not required and not relevant to this issue.

                                       

                                      Regards,

                                        • Re: LDS write problem
                                          nibal

                                          Nice! At least there is no problem with my card

                                          Indeed, libasound is part of the original project, not needed for the sample.

                                          95% of the code is your FFT sample's code. The remaining 5% deals with averaging these values over MAXPASS (8) runs, and on the last pass, findSignal gets the local maxima from a rolling average.
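For readers following along, the averaging and peak-picking step described above might look roughly like the host-side C sketch below. It is an illustration only, not the code from the attachment: MAXPASS comes from the post, while accumulate_pass, finish_average, rolling_average, local_maxima and the window half-width are hypothetical names and choices.

```c
#include <assert.h>
#include <stddef.h>

#define MAXPASS 8   /* number of passes averaged, as stated in the post */

/* Add one pass of power values into a running sum (hypothetical helper). */
static void accumulate_pass(float *sum, const float *pwr, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        sum[i] += pwr[i];
}

/* Turn the accumulated sums into per-bin averages over MAXPASS passes. */
static void finish_average(float *sum, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        sum[i] /= (float)MAXPASS;
}

/* Centered rolling average of width 2*half+1, truncated at the edges.
   The input and output buffers must be distinct. */
static void rolling_average(const float *in, float *out, size_t n, size_t half)
{
    assert(in != out);
    for (size_t i = 0; i < n; ++i) {
        size_t lo = (i >= half) ? i - half : 0;
        size_t hi = (i + half < n) ? i + half : n - 1;
        float acc = 0.0f;
        for (size_t j = lo; j <= hi; ++j)
            acc += in[j];
        out[i] = acc / (float)(hi - lo + 1);
    }
}

/* Record indices that are strictly greater than both neighbours.
   Returns the number of indices written (at most max_idx). */
static size_t local_maxima(const float *v, size_t n, size_t *idx, size_t max_idx)
{
    size_t count = 0;
    for (size_t i = 1; i + 1 < n && count < max_idx; ++i)
        if (v[i] > v[i - 1] && v[i] > v[i + 1])
            idx[count++] = i;
    return count;
}
```

The sum buffer is assumed to be zeroed before the first pass. Calling accumulate_pass once per pass, then finish_average, rolling_average and local_maxima on the last pass, mirrors the order of steps described in the post.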

                                           

                                          Hold it. Looks like it is not an LDS writing problem, but a problem with the rest of the code. Substituting zr0... instead of avg in the printf statements, I get the same nan values in fft2. Will check my input at various stages and update the ticket accordingly.
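One way to check data at various stages is to copy each intermediate buffer back to the host and look for the first non-finite value. A minimal sketch in C, assuming the buffer is already in host memory; first_non_finite is a hypothetical helper, not part of the attached sample.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Return the index of the first NaN or infinity in buf,
   or n if every value is finite. */
static size_t first_non_finite(const float *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        if (!isfinite(buf[i]))
            return i;
    return n;
}
```

Running this after each stage narrows down where the first bad value appears, which is usually more informative than inspecting only the final output.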

                                            • Re: LDS write problem
                                              nibal

                                              My code had problems with a couple of offsets. Now it runs fine.
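The thread does not say which offsets were wrong. As a general illustration, an offset that runs past the end of a buffer touches memory the kernel never initialized, which is one way garbage values such as nans can show up. A hedged sketch of a bounds check in plain C; range_in_bounds and store_checked are hypothetical helpers, not from the attached kernels.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if the range [offset, offset + count) fits in `size` elements.
   Written so that offset + count is never computed and cannot overflow. */
static int range_in_bounds(size_t offset, size_t count, size_t size)
{
    return offset <= size && count <= size - offset;
}

/* Store with a debug-time bounds check (hypothetical helper). */
static void store_checked(float *buf, size_t size, size_t index, float value)
{
    assert(index < size);
    buf[index] = value;
}
```

For example, with a 64-element buffer, range_in_bounds(8, 64, 64) returns 0 because the last 8 elements would fall outside the buffer.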

                                              Sorry I couldn't test it before, but I had the build problem with -g and printf, and CodeXL doesn't show local variables.

                                              BTW the LDS avg implementation runs a bit slower than the global one. Since LDS is a precious resource, I'm dumping it.

                                              Ty for your time.