11 Replies Latest reply on Jun 17, 2014 12:21 PM by lontotuong

    Bug in CPU implementation

    bubu

      With ratGPU this is the render I get using the AMD CPU implementation:

       

       

      ao1.jpg

       

      And this is the correct one, rendered on the GPU and also with the Intel CPU implementation:

       

      ao2.jpg

       

      So it seems your compiler is messing up some ray signs, probably due to some kind of optimization applied in the drivers?

        • Re: Bug in CPU implementation
          himanshu.gautam

          I have no idea. But if you are using OpenCL, can you try "-cl-opt-disable" and see whether some optimization is messing it up?

            • Re: Bug in CPU implementation
              bubu

              Apparently, what messes up the result is the -cl-fast-relaxed-math flag.

              If I use -cl-opt-disable or an empty options string, it works OK.

               

              I think it's a bug in your OpenCL CPU implementation because:

               

              1. It works ok with the Radeon GPU implementation.

              2. It works ok with other implementations such as Intel's or NVIDIA's.

               

              Btw, I'm using Windows 7 x64 and Catalyst 13.9/13.11 beta, a Radeon 7790 and ratGPU 0.6.0.
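For anyone trying to narrow this down: per the OpenCL specification, -cl-fast-relaxed-math is shorthand for -cl-finite-math-only plus -cl-unsafe-math-optimizations, and the latter in turn includes -cl-mad-enable and -cl-no-signed-zeros. A minimal sketch of a list of option strings to try one at a time follows. The helper names `build_variant_count` and `build_variant` are made up for illustration and are not part of OpenCL or ratGPU.

```c
#include <stddef.h>

/* Build-option strings to try one at a time, from safest to most relaxed.
   The flag names come from the OpenCL specification; the table itself and
   the helper names below are hypothetical. */
static const char *const k_build_variants[] = {
    "",                               /* default optimizations */
    "-cl-opt-disable",                /* all optimizations off */
    "-cl-mad-enable",
    "-cl-no-signed-zeros",
    "-cl-finite-math-only",
    "-cl-unsafe-math-optimizations",  /* includes mad-enable and no-signed-zeros */
    "-cl-fast-relaxed-math"           /* finite-math-only + unsafe-math-optimizations */
};

/* Number of option strings in the table. */
size_t build_variant_count(void)
{
    return sizeof k_build_variants / sizeof k_build_variants[0];
}

/* Returns the i-th option string, or NULL if i is out of range. */
const char *build_variant(size_t i)
{
    return i < build_variant_count() ? k_build_variants[i] : NULL;
}
```

Each string would be passed as the `options` argument of `clBuildProgram`, followed by a render and a comparison against the GPU output. The first variant that breaks the image points at the sub-flag involved.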

                • Re: Bug in CPU implementation
                  himanshu.gautam

                  What each runtime does for "-cl-fast-relaxed-math" can be vendor dependent...

                  Anyway, I will go check...

                   

                  Thanks for testing and posting the result here,

                  Best,

                  Bruhaspati

                    • Re: Bug in CPU implementation
                      himanshu.gautam

                      I hear that "-cl-fast-relaxed-math" should be more accurate on the CPU than on the GPU.

                      Can you confirm that what you are seeing is just the opposite?

                      +

                      Any repro-case would be helpful.

                      I visited the ratGPU page.

                      My proxy blocked the download, but I guess you are supplying only binaries.

                      Any chance you can give us a small repro-case?

                       

                      +

                      What speed gain do you get from the -cl-fast-relaxed-math option on each platform?

                      That's probably an indication of the degree of accuracy loss.

                       

                      Best,

                      Bruhaspati

                        • Re: Bug in CPU implementation
                          bubu

                          Debugging it, I can clearly see that the signs of some operations are inverted. Values that should be X come out as -X or 0.0f, causing the bug.

                          Fast relaxed math improves performance by around 10%, because I'm doing lots of MAD operations as well as 1/sqrt and 1/x operations.

                          Repro case: download ratGPU 0.6.0 for Windows and test it yourself with Catalyst 13.9/13.11 beta, with only the AMD CPU OpenCL device enabled.
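To turn the "X became -X or 0.0f" observation into numbers, the same intermediate buffer can be read back from a known-good device and from the AMD CPU device and compared element by element. This is a minimal sketch of such a check. The `sign_diff` type and `compare_signs` function are hypothetical names, and the buffers are assumed to have already been copied to host memory.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Tally of how a test buffer deviates from a reference buffer. */
typedef struct {
    size_t sign_flipped;  /* sign bit differs between reference and test */
    size_t zeroed;        /* reference is nonzero, test equals zero */
} sign_diff;

/* Compares n floats from a reference device against a device under test.
   An element is counted once: as zeroed if it collapsed to zero,
   otherwise as sign_flipped if the sign bits differ (this also catches
   +0.0f versus -0.0f). */
sign_diff compare_signs(const float *ref, const float *test, size_t n)
{
    sign_diff d = {0, 0};
    assert(ref != NULL && test != NULL);
    for (size_t i = 0; i < n; ++i) {
        if (ref[i] != 0.0f && test[i] == 0.0f) {
            d.zeroed++;
        } else if ((signbit(ref[i]) != 0) != (signbit(test[i]) != 0)) {
            d.sign_flipped++;
        }
    }
    return d;
}
```

Running it over the output of progressively smaller kernel fragments is one way to locate the first operation where the two devices disagree.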

                            • Re: Bug in CPU implementation
                              himanshu.gautam

                              ratGPU is available as deb package or EXE installer.
                              Does it install the sources as well?

                               

                              +

                              I need a compact test-case. If you think the signs are inverted, can you create a small test-case that shows the problem?

                               

                              I am planning to develop a repro-case template with the host code needed to launch a kernel and get its output.

                              Maybe you can then just plug in your kernel fragment, replace some host stubs, and provide us a repro-case.

                              Do you think that would help?

                               

                              For now, I have attached a test repro-case I wrote while tracking down a problem in another thread.

                              Maybe this can give you a head start in writing your own simple repro-case.

                               

                              Thanks,

                              Best,

                              Bruhaspati

                                • Re: Bug in CPU implementation
                                  bubu

                                  Thanks for the small code template. I'll try to locate which part is causing the problem, but it's going to take me some time because the kernel is quite complex.

                                    • Re: Bug in CPU implementation
                                      khan24

                                      I had no idea, but now I have learned enough from this forum. Special thanks to devgurus.amd.com.

                                        • Re: Bug in CPU implementation
                                          pinform

                                          Hi bubu

                                           

                                          I'm reviving this thread.  Do you still see this issue?

                                           

                                          --Prasad

                                            • Re: Bug in CPU implementation
                                              bubu

                                              Yep, Catalyst 14.4 still exposes the problem ... amplified 

                                               

                                               

                                              The render differs from the GPU implementation and also from the C++/SIMD implementation. It only happens with AMD's OpenCL CPU implementation; your GPU implementation seems to be fine.

                                               

                                              The "floor" problem goes away when "-cl-fast-relaxed-math" is not used (I bet it's a precision or optimization problem), but the wall's z-fighting occurs even with relaxed math disabled, and it differs completely from the GPU implementation anyway. Also, both the Intel and NVIDIA implementations work OK, so the problem seems to be exclusive to your AMD CPU implementation (maybe because I'm using an Intel CPU?).

                                               

                                              By the way, I'm using W8.1 x64 ( also W7 SP1 x64 ), Cat.14.4, Radeon 7770-1Gb, Intel i7-2700K Sandy Bridge.
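One way a relaxed-math build could legitimately change a ray sign, offered only as a possibility and not as a diagnosis of ratGPU: -cl-fast-relaxed-math implies -cl-no-signed-zeros, so a compiler is allowed to treat -0.0f as +0.0f. Under strict IEEE 754 arithmetic the reciprocal of a zero direction component keeps the sign of the zero, which is what the host-side C sketch below shows. The function name `inv_dir_is_negative` is made up for illustration.

```c
#include <math.h>

/* Returns 1 if the reciprocal of a ray direction component is negative.
   Under strict IEEE 754 rules, 1/+0.0f is +inf and 1/-0.0f is -inf, so
   the sign of a zero component decides the result. A compiler that is
   allowed to ignore signed zeros may give a different answer. */
int inv_dir_is_negative(float dir_component)
{
    float inv = 1.0f / dir_component;
    return signbit(inv) != 0;
}
```

If a kernel picks near and far planes from the sign of 1/x, a difference like this would show up exactly as a flipped sign for axis-aligned rays, which could be worth checking with -cl-no-signed-zeros alone.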

                                      • Re: Bug in CPU implementation
                                        lontotuong

                                        I still have the same problem. I have tried to resolve it for some time but cannot.