8 Replies Latest reply on Jan 10, 2009 4:41 PM by gaurav.garg

    bug in haar_wavelet (cpp and legacy)

    Jetto
      Haar wavelet sample is broken

      I think I found an interesting exercise in the samples directory.

      Can somebody tell me if

      /usr/local/amdbrook/samples/bin/CPP/lnx_x86_64/haar_wavelet -i 2 -e -y 128  -x 128 -p

      gives

      -e Verify correct output.
      Computing Haar Wavelet Transform on CPU ... Done
      ./haar_wavelet: Failed!

      -p Compare performance with CPU.
         Width  Height      Iterations  CPU Total Time  GPU Total Time         Speedup
           128     128               2               0           0.057               0

      but succeeds with -x 128 -y 127

        • bug in haar_wavelet (cpp and legacy)
          rahulgarg
          I tested it on Vista-64 and got the same output as you (failed for -y 128 and passed for -y 127)
          • looks like I have a fix for the legacy sample
            Jetto

            Before fixing the bug, I tried an obvious improvement: moving the stream initialization and the result copy out of the iteration loop.

            Surprisingly, that also fixes the failure.

            I don't understand why at all, but it does.

            I also got some performance improvement.

            diff -u /usr/local/amdbrook/samples/legacy/apps/haar_wavelet/haar_wavelet.br haar_wavelet.br
            --- /usr/local/amdbrook/samples/legacy/apps/haar_wavelet/haar_wavelet.br    2008-12-03 01:12:53.000000000 +0100
            +++ haar_wavelet.br    2009-01-10 17:13:18.000000000 +0100
            @@ -171,10 +171,10 @@
             
                     // Record GPU Total time
                     Start(0);
            +        // Write to stream
            +        streamRead(stream0, io[0]);
                     for (i = 0; i < cmd.Iterations; ++i)
                     {
            -            // Write to stream
            -            streamRead(stream0, io[0]);
                
                         // Run the brook program
                         while (w > 1)
            @@ -199,16 +199,16 @@
                             inp = 1 - inp;
                
                         }
            +        }
             
            -            // Write data back from stream
            -            if(!inp)
            -            {
            -                streamWrite(stream0, io[1]);
            -            }
            -            else
            -            {
            -                streamWrite(stream1, io[1]);
            -            }
            +        // Write data back from stream
            +        if(!inp)
            +        {
            +            streamWrite(stream0, io[1]);
            +        }
            +        else
            +        {
            +            streamWrite(stream1, io[1]);
                     }
                     Stop(0);
                 }

            • bug in haar_wavelet (cpp and legacy)
              Ceq
              Maybe I'm missing something, but I think the fix is just adding two lines that reinitialize variables at the start of each iteration.
              Add them at line 272 in the new CPP code, or at line 215 in the old legacy code (there, use w = Length; instead).

              ...
              for (i = 0; i < info->Iterations; ++i)
              {
                  // Write to stream
                  inp = 0;              // <------ added
                  w = _width * _height; // <------ added
                  stream0.read(_input);
              ...
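A hedged sketch of why those two resets could matter. It models only the control state of the benchmark loop, not the streams. It assumes that every pass halves w with integer division and toggles inp (as in "inp = 1 - inp;" from the diff above); the names runBenchmarkLoop and LoopResult are hypothetical and do not come from the sample.

```cpp
#include <cassert>

// Hypothetical model of the sample's benchmark loop. Only the control
// state (w, inp) is tracked; no stream data is touched.
struct LoopResult {
    int passesInLastIteration; // kernel passes run by the final iteration
    int inp;                   // buffer selector at copy-back time
};

LoopResult runBenchmarkLoop(int length, int iterations, bool resetEachIteration)
{
    assert(length > 0 && iterations > 0);
    int w = length; // assumed initial value, as in "w = Length;"
    int inp = 0;
    LoopResult result = {0, 0};
    for (int i = 0; i < iterations; ++i) {
        if (resetEachIteration) {
            // The two added lines: start every iteration from scratch.
            inp = 0;
            w = length;
        }
        int passes = 0;
        while (w > 1) {
            w /= 2;        // assumption: each pass halves the working size
            inp = 1 - inp; // swap the two ping-pong streams
            ++passes;
        }
        result.passesInLastIteration = passes;
        result.inp = inp;
    }
    return result;
}
```

In this model, 128 x 128 with two iterations and no reset leaves the second iteration with zero passes and inp at 0, so the copy-back would read stream0, which the loop has just refilled with the raw input. With 128 x 127 the first iteration runs 13 passes and leaves inp at 1, so the copy-back would read stream1, which may still hold the first iteration's result. That would match the reported behaviour, but it is an inference from the diff, not something confirmed against the sample source.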
                • bug in haar_wavelet (cpp and legacy)
                  Jetto

                  Ceq, you are right. Thank you.

                  I had thought of the inp variable, but not of w.

                   

                    • bug in haar_wavelet (cpp and legacy)
                      Jetto

                      I'm afraid that using the GPU for the Haar wavelet is useless, because the performance isn't very good:

                      Width   Height  Iterations      CPU Total Time  GPU Total Time  Speedup        
                      4096    4096    100             44.084000       69.486000       0.634430

                      That's annoying, because I would like to do Dirac video encoding.

                        • bug in haar_wavelet (cpp and legacy)
                          gaurav.garg

                          The haar_wavelet sample applies the domain operator many times inside a loop. The domain operator performs poorly, and it is best to avoid it.

                          You can try emulating domain by passing constant parameters that describe the domain to the kernel, and by specifying the kernel's domain of execution.

                          For example, rather than calling a kernel like this:

                          copy(avgStream.domain(domainStart1, domainEnd1) , stream1.domain(domainStart1, domainEnd1));

                          it would be better to call it like this:

                          copy.domainOffset(uint4(*domainStart1, 0, 0, 0));
                          copy.domainSize(uint4(*domainEnd1 - *domainStart1, 1, 1, 1));
                          copy(avgStream, stream1);

                          Similarly, the call to the haar_wavelet kernel can be changed. Keep in mind that the calculation of idx1 and idx2 inside the kernel will change, because the instance() value will now vary from *domainStart1 to *domainEnd1 (not from 0 to the stream width).
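The difference between the two calling styles can be sketched on the CPU in plain C++. This is not Brook+ code: the function names copyOverSubStream, copyOverExecutionDomain and toLocalIndex are hypothetical, std::vector stands in for a stream, and the loop variable plays the role of instance().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Models a kernel run on sub-streams made with .domain(start, end):
// the kernel index is relative to the sub-stream and starts at 0.
void copyOverSubStream(const std::vector<float>& src, std::vector<float>& dst,
                       std::size_t start, std::size_t end)
{
    assert(start <= end && end <= src.size() && end <= dst.size());
    for (std::size_t local = 0; local < end - start; ++local) {
        dst[start + local] = src[start + local];
    }
}

// Models a kernel run on the full streams with an execution domain given
// by an offset and a size: the kernel index is a position in the full
// stream and runs from offset to offset + size.
void copyOverExecutionDomain(const std::vector<float>& src, std::vector<float>& dst,
                             std::size_t offset, std::size_t size)
{
    assert(offset + size <= src.size() && offset + size <= dst.size());
    for (std::size_t instance = offset; instance < offset + size; ++instance) {
        dst[instance] = src[instance];
    }
}

// Index math written for a 0-based index has to subtract the domain start
// once the kernel sees positions in the full stream.
std::size_t toLocalIndex(std::size_t instance, std::size_t domainStart)
{
    assert(instance >= domainStart);
    return instance - domainStart;
}
```

Both functions write the same elements; only the meaning of the loop index differs. For a plain copy nothing else changes, but a kernel that derives other indices from its position (as haar_wavelet does with idx1 and idx2) would need an adjustment along the lines of toLocalIndex, with the domain start passed in as a constant.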