Jetto
Journeyman III

bug in haar_wavelet (cpp and legacy)

Haar wavelet sample is broken

I think I found an interesting exercise in the sample directory.

Can somebody confirm whether

/usr/local/amdbrook/samples/bin/CPP/lnx_x86_64/haar_wavelet -i 2 -e -y 128  -x 128 -p

gives

-e Verify correct output.
Computing Haar Wavelet Transform on CPU ... Done
./haar_wavelet: Failed!

-p Compare performance with CPU.
   Width  Height      Iterations  CPU Total Time  GPU Total Time         Speedup
     128     128               2               0           0.057               0

but succeeds with -x 128 -y 127?

0 Likes
8 Replies
rahulgarg
Adept II

I tested it on Vista-64 and got the same output as you (failed for -y 128 and passed for -y 127)
0 Likes
Jetto
Journeyman III

Before fixing it, I tried an obvious improvement: moving the stream initialization and the result copy out of the iteration loop.

Surprisingly, that also fixes the failure.

I don't understand at all why it does.

I also got some performance improvement.

diff -u /usr/local/amdbrook/samples/legacy/apps/haar_wavelet/haar_wavelet.br haar_wavelet.br
--- /usr/local/amdbrook/samples/legacy/apps/haar_wavelet/haar_wavelet.br    2008-12-03 01:12:53.000000000 +0100
+++ haar_wavelet.br    2009-01-10 17:13:18.000000000 +0100
@@ -171,10 +171,10 @@
 
         // Record GPU Total time
         Start(0);
+        // Write to stream
+        streamRead(stream0, io[0]);
         for (i = 0; i < cmd.Iterations; ++i)
         {
-            // Write to stream
-            streamRead(stream0, io[0]);
    
             // Run the brook program
             while (w > 1)
@@ -199,16 +199,16 @@
                 inp = 1 - inp;
    
             }
+        }
 
-            // Write data back from stream
-            if(!inp)
-            {
-                streamWrite(stream0, io[1]);
-            }
-            else
-            {
-                streamWrite(stream1, io[1]);
-            }
+        // Write data back from stream
+        if(!inp)
+        {
+            streamWrite(stream0, io[1]);
+        }
+        else
+        {
+            streamWrite(stream1, io[1]);
         }
         Stop(0);
     }

0 Likes

My patch is thoroughly buggy. I really don't understand how it can give the right result. With this patch, instead of running the wavelet transform i times on the same data, each new iteration runs on the result of the previous one...

I'm very confused by this.

When the test fails, the GPU output equals the input.

 

0 Likes

Try setting the environment variable BRT_RUNTIME=cpu and see if it works.

0 Likes
Ceq
Journeyman III

Maybe I'm missing something, but I think the fix is just adding two lines to reinitialize the variables.
Add them at line 272 of the new CPP code, or at line 215 of the old legacy code (using w = Length; there instead).

...
for (i = 0; i < info->Iterations; ++i)
{
    // Write to stream
    inp = 0;               // <------
    w = _width * _height;  // <------
    stream0.read(_input);
...
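One possible explanation for why -y 128 fails while -y 127 passes, sketched as a small C++ model of the host loop. This is not the sample's code: `simulateHostLoop`, `LoopResult` and `Content` are made-up names, and the model assumes that w is halved with integer division once per level and that each level reads from stream[inp], writes to stream[1 - inp] and then flips inp. If those assumptions hold, the missing reset means the second iteration skips the while loop entirely and only the final value of inp decides which buffer is copied back.

```cpp
#include <array>

// Hypothetical model of the sample's host loop, not the actual Brook+ code.
// Assumptions: w is halved (integer division) once per level, and each level
// reads stream[inp], writes stream[1 - inp], then flips inp.
enum class Content { Empty, Input, Transformed };

struct LoopResult {
    int levelsInFirstIteration;  // passes of the while loop in iteration 1
    int finalInp;                // buffer index used for the final copy-back
    Content copiedBack;          // what the final copy-back would return
};

LoopResult simulateHostLoop(int width, int height, int iterations, bool resetState) {
    std::array<Content, 2> stream = {Content::Empty, Content::Empty};
    int inp = 0;
    int w = width * height;
    int firstLevels = 0;
    Content copiedBack = Content::Empty;

    for (int i = 0; i < iterations; ++i) {
        if (resetState) {  // the two reinitialization lines proposed above
            inp = 0;
            w = width * height;
        }
        stream[0] = Content::Input;  // input is uploaded into stream0

        while (w > 1) {
            stream[1 - inp] = Content::Transformed;  // kernel output buffer
            w /= 2;
            inp = 1 - inp;
            if (i == 0) ++firstLevels;
        }
        copiedBack = stream[inp];  // stream0 if inp == 0, otherwise stream1
    }
    return {firstLevels, inp, copiedBack};
}
```

Under these assumptions, `simulateHostLoop(128, 128, 2, false)` runs 14 levels, ends with inp at 0 and copies back the freshly uploaded input, while `simulateHostLoop(128, 127, 2, false)` runs 13 levels, ends with inp at 1 and copies back the transformed data left in stream1. That would match the failure at -y 128, the pass at -y 127, and the observation that the GPU output equals the input when the test fails. With `resetState` set to true, both sizes copy back transformed data.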
0 Likes
Jetto
Journeyman III

Ceq, you are right. Thank you.

I had thought of the inp variable but not of w.

 

0 Likes

I'm afraid that using the GPU for the Haar wavelet is useless because the performance isn't very good:

Width   Height  Iterations      CPU Total Time  GPU Total Time  Speedup        
4096    4096    100             44.084000       69.486000       0.634430

That's annoying because I would like to do Dirac video encoding

0 Likes

haar_wavelet uses the domain operator multiple times in a loop. The domain operator performs poorly, so it is best to avoid it.

You can try emulating it by passing constant parameters that describe the domain to the kernel and by specifying the kernel's domain of execution.

For example, rather than calling a kernel like this:

copy(avgStream.domain(domainStart1, domainEnd1) , stream1.domain(domainStart1, domainEnd1));

it is better to call it something like this:

copy.domainOffset(uint4(*domainStart1, 0, 0, 0));
copy.domainSize(uint4(*domainEnd1 - *domainStart1, 1, 1, 1));
copy(avgStream, stream1);

A call to the haar_wavelet kernel can be changed in a similar way. Keep in mind that the calculation of idx1 and idx2 inside the kernel will change, because instance() will now range from *domainStart1 to *domainEnd1 (not from 0 to the stream width).
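
To illustrate the index change, here is a rough sketch in plain C++ rather than Brook+. The pairwise-average kernel and both function names are hypothetical, and the sample's real idx1 and idx2 formulas may differ. The only point is that once the kernel runs over [domainStart, domainEnd) of the full stream, the index has to be rebased before idx1 and idx2 are computed.

```cpp
#include <cstddef>
#include <vector>

// Plain C++ emulation, not Brook+ code. The pairwise-average kernel and both
// function names are hypothetical; the sample's real idx1/idx2 may differ.

// Domain-operator style: the kernel sees a sub-stream, so its index starts at 0.
std::vector<float> averagePairsDomainView(const std::vector<float>& in,
                                          std::size_t domainStart,
                                          std::size_t domainEnd) {
    std::vector<float> out(domainEnd - domainStart);
    for (std::size_t local = 0; local < out.size(); ++local) {
        std::size_t idx1 = 2 * local;
        std::size_t idx2 = 2 * local + 1;
        out[local] = 0.5f * (in[idx1] + in[idx2]);
    }
    return out;
}

// Offset style: the kernel runs over [domainStart, domainEnd) of the full
// stream, so the index must be rebased before computing idx1 and idx2.
void averagePairsWithOffset(const std::vector<float>& in,
                            std::vector<float>& out,
                            std::size_t domainStart,
                            std::size_t domainEnd) {
    for (std::size_t instance = domainStart; instance < domainEnd; ++instance) {
        std::size_t local = instance - domainStart;  // rebase to start at 0
        std::size_t idx1 = 2 * local;
        std::size_t idx2 = 2 * local + 1;
        out[instance] = 0.5f * (in[idx1] + in[idx2]);
    }
}
```

With the input {1, 3, 5, 7} and the domain [2, 4), the first function returns {2, 6}, and the second writes the same two values into positions 2 and 3 of the full output while leaving positions 0 and 1 untouched.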

0 Likes