Archives Discussions

Ceq · ‎01-07-2009

I've modified "BROOK\samples\legacy\tests\sum" to test how C = A + 1 performs.
Each execution runs 10000 iterations without performing streamRead/streamWrite.
My setup is an Athlon X2 4850E, Radeon 4850, WinXP 64, VS2005.

Streams
--------------------------------------------------------------------------------
s1<1024, 1024> = 1, 2, 3...
s2<1024, 1024> = 1, 1, 1...
s3<1024, 1024>
s4<1,1> = 1

Kernels
--------------------------------------------------------------------------------
kernel void inc1(float a< >, float b< >, out float c< > ) { c = a + b; }
kernel void inc2(float a< >, float b<1, 1>, out float c< > ) { c = a + b; }
kernel void inc3(float a< >, float b, out float c< > ) { c = a + b; }
kernel void inc4(float a< >, out float c< > ) { c = a + 1.0f; }

Time
--------------------------------------------------------------------------------
inc1(s1, s2, s3); > 2.54s
inc2(s1, s4, s3); > 8.85s
inc3(s1, 1.0f, s3); > 5.49s
inc4(s1, s3); > 2.46s

1. Why is inc2 so slow? I think implicit resize was much faster in the previous SDK (nearly as fast as inc4).
2. Why does inc3 take about twice as long? Is passing a constant parameter really that slow?
3. Shouldn't inc4 be a little faster? It reads half as much input data as inc1.

I would appreciate any hints on these, thanks

EDIT
--------------------------------------------------------------------------------
Using the stream.error() workaround on the output stream:
inc1(s1, s2, s3); > 2.93s
inc2(s1, s4, s3); > 8.81s
inc3(s1, 1.0f, s3); > 2.86s
inc4(s1, s3); > 2.78s

Now inc1 and inc4 are a bit slower, but inc3 performs as expected, so it
looks like the stream.error() bug also affects constant parameters. If
so, please don't forget to fix it. On the other hand, inc2 is still too slow.

Slow constant parameters and implicit stream resize?