0 Replies Latest reply on Jan 7, 2009 11:01 PM by Ceq

    Slow constant parameters and implicit stream resize?

    Ceq
      I've modified "BROOK\samples\legacy\tests\sum" to test how C = A + 1 performs.
      Each run does 10000 kernel iterations without performing any streamRead/streamWrite.
      My setup: Athlon X2 4850E, Radeon 4850, WinXP 64, VS2005.
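A minimal host-side sketch of the kind of timing loop described above. It assumes the kernel call blocks until the GPU work has finished, which the post does not confirm and which may not hold for Brook+. `timeIterations` is a hypothetical helper, not part of the Brook+ SDK; the callable passed in stands in for a call such as inc1(s1, s2, s3).

```cpp
#include <chrono>

// Hypothetical helper (not a Brook+ API): runs `body` `iterations` times
// and returns the elapsed wall-clock time in seconds.
// Assumption: each call to `body` returns only after its work is complete.
template <typename Body>
double timeIterations(int iterations, Body&& body) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        body();  // e.g. a kernel launch such as inc1(s1, s2, s3)
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

With this helper, one measurement would look like `timeIterations(10000, [&] { inc1(s1, s2, s3); })`, with the streams filled before the call and read back only afterwards.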


      Streams
      --------------------------------------------------------------------------------
      s1<1024, 1024> = 1, 2, 3...
      s2<1024, 1024> = 1, 1, 1...
      s3<1024, 1024>
      s4<1,1> = 1

      Kernels
      --------------------------------------------------------------------------------
      kernel void inc1(float a< >, float b< >, out float c< > ) { c = a + b; }
      kernel void inc2(float a< >, float b<1, 1>, out float c< > ) { c = a + b; }
      kernel void inc3(float a< >, float b, out float c< > ) { c = a + b; }
      kernel void inc4(float a< >, out float c< > ) { c = a + 1.0f; }

      Time
      --------------------------------------------------------------------------------
      inc1(s1, s2, s3); > 2.54s
      inc2(s1, s4, s3); > 8.85s
      inc3(s1, 1.0f, s3); > 5.49s
      inc4(s1, s3); > 2.46s


      1. Why is inc2 so slow? I think implicit resize was much faster in the previous SDK (nearly as fast as inc4).
      2. Why does inc3 take about twice as long as inc1? Is passing a constant parameter really that slow?
      3. Shouldn't inc4 be a little faster than inc1? It reads only half as much input data.
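For question 3, a back-of-the-envelope count of the data each kernel touches per call. It assumes 4-byte floats and that every stream element is read or written exactly once, ignoring caches; all names are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Stream dimensions taken from the post: s1, s2 and s3 are <1024, 1024>.
constexpr std::size_t kWidth = 1024;
constexpr std::size_t kHeight = 1024;

// Assumption: one stream element is a 32-bit float.
constexpr std::size_t kStreamBytes = kWidth * kHeight * 4;

// Bytes moved by one kernel call that reads `inputStreams` full-size
// streams and writes one full-size output stream.
inline std::size_t bytesPerCall(std::size_t inputStreams) {
    assert(inputStreams >= 1);  // every kernel in the post reads at least s1
    return (inputStreams + 1) * kStreamBytes;
}

// inc1 reads s1 and s2 and writes s3; inc4 reads s1 and writes s3.
inline std::size_t inc1Bytes() { return bytesPerCall(2); }
inline std::size_t inc4Bytes() { return bytesPerCall(1); }
```

Under these assumptions inc4 moves two thirds of the bytes inc1 does (half the input, the same output), while the measured times differ by only about 3% (2.46s vs 2.54s). That gap suggests, without proving it, that something other than raw memory traffic dominates the runtime.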

      I would appreciate any hints on these. Thanks.




      EDIT
      --------------------------------------------------------------------------------
      Using the stream.error() workaround on the output stream:
      inc1(s1, s2, s3); > 2.93s
      inc2(s1, s4, s3); > 8.81s
      inc3(s1, 1.0f, s3); > 2.86s
      inc4(s1, s3); > 2.78s

      Now inc1 and inc4 are a bit slower, but inc3 performs as expected, so it
      looks like the stream.error() bug also affects constant parameters. If
      so, please don't forget to fix it. On the other hand, inc2 is still too slow.