Same issue with the binary search example:
C:\Program Files\ATI\ATI Brook+ 1.4.0_beta\samples\bin\CPP\xp_x86_32>binary_search_d.exe -v -e -p -p -q -t -x 128 -y 128 -i 1
Verbose and Quiet cancel each other out.
Kernel Execution : Error with input streams
Width Height Iterations GPU Total Time
128 128 1 0.0251148
-e Verify correct output.
Performing Binary Searches on CPU ... Done
-p Compare performance with CPU.
Width Height Iterations CPU Total Time GPU Total Time Speedup
128 128 1 0.0981394 0.0251148 3.90763
C:\Program Files\ATI\ATI Brook+ 1.4.0_beta\samples\bin\CPP\xp_x86_32>
I am running this with:
Windows Vista 32-bit
Stream Computing SDK 1.4 (beta)
The GPU I am using is: FireStream 9250
The errorLog on the stream says "Kernel Execution : Error with input streams", which means something went wrong with one of the kernel's input streams. Check error and errorLog on each input stream of the kernel and see what they return.
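A rough sketch of that check, in case it helps. It assumes each stream object has an error() member that returns 0 when nothing went wrong and an errorLog() member that returns a C string; the exact Brook+ 1.4 signatures are not confirmed here. FakeStream and reportStreamError are hypothetical names made up for illustration, not part of the SDK.

```cpp
#include <iostream>
#include <string>

// Hypothetical stand-in for a Brook+ stream. Only the two members named in
// the reply above are modelled; the real signatures may differ.
struct FakeStream {
    int code;         // assumption: 0 means "no error"
    const char* log;  // text that errorLog() hands back
    int error() const { return code; }
    const char* errorLog() const { return log; }
};

// Prints the stream's log and returns true if the stream reports an error.
// Written as a template so it accepts any type with error() and errorLog().
template <typename StreamT>
bool reportStreamError(const std::string& name, StreamT& s) {
    if (s.error() == 0) {
        return false;  // nothing to report for this stream
    }
    std::cerr << name << ": " << s.errorLog() << '\n';
    return true;
}
```

Calling this on every input stream right before the kernel call should show which stream is already in an error state, which narrows down where the kernel error originates.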
As for binary search, the sample only allows a maximum of 8192 elements.
128 x 128 = 16384 > 8192
Try -y 1 -x 8192 and you should see a huge speedup.
Any size <= 8192 should run with no problem, for example -y 8192 -x 1, but that layout comes out slower than the CPU.
All the sample code emphasizes data reuse, so we see a huge performance improvement for the matrix multiplication and binary search examples. That's what I conclude.
For applications with a high rate of data reuse, ATI hardware is da' best
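The size arithmetic from this reply can be written down as a small check. This is only a sketch: the 8192 limit is taken from the reply above and not verified against the sample source, and the names kMaxSearchElements, elementCount and fitsBinarySearchLimit are made up for illustration.

```cpp
#include <cstddef>

// Limit quoted in the reply above; treated here as an assumption.
constexpr std::size_t kMaxSearchElements = 8192;

// Total number of elements for a given -x (width) and -y (height).
constexpr std::size_t elementCount(std::size_t width, std::size_t height) {
    return width * height;
}

// True when the requested size stays within the assumed limit.
constexpr bool fitsBinarySearchLimit(std::size_t width, std::size_t height) {
    return elementCount(width, height) <= kMaxSearchElements;
}

// The failing run used -x 128 -y 128, i.e. 16384 elements.
static_assert(!fitsBinarySearchLimit(128, 128), "128 x 128 exceeds the limit");
// Both suggested layouts hold exactly 8192 elements.
static_assert(fitsBinarySearchLimit(8192, 1), "8192 x 1 fits");
static_assert(fitsBinarySearchLimit(1, 8192), "1 x 8192 fits");
```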