kernel calls are slow

Discussion created by josopait on Jun 27, 2008
Latest reply on Jun 30, 2008 by ryta1203

Consider the following test program:


kernel void copy(float a<>, out float b<>)
{
  b = a;
}

int main()
{
  float a<10>;
  float b<10>;
  int t;

  // Launch the copy kernel 100000 times
  for (t = 0; t < 100000; ++t)
      copy(a, b);

  return 0;
}

It simply copies stream a to stream b, 100000 times. This seems like an easy task to do, but it takes 6.5 seconds to run on my computer. That's 65 microseconds for each call to the copy kernel.
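
For reference, here is a rough sketch in plain C of how a per-call figure like this can be obtained. It is not part of the test program: per_call_us, time_calls and dummy_work are names made up for illustration, and dummy_work only stands in for whatever call is being measured (for example a wrapper around one copy(a, b) launch).

```c
#include <assert.h>
#include <time.h>

/* Average cost of one call in microseconds, given the total run time
   in seconds and the number of calls. */
double per_call_us(double total_seconds, long calls)
{
    assert(calls > 0);
    return total_seconds * 1e6 / (double)calls;
}

/* Hypothetical stand-in for the work being timed (e.g. one kernel call). */
static volatile long sink = 0;

void dummy_work(void)
{
    sink += 1;
}

/* Call fn() `calls` times and return the average time per call in
   microseconds. Note: clock() reports processor time used by the
   process, which may differ from wall-clock time if a call blocks
   while waiting on the device; a wall-clock timer may be preferable
   in that case. */
double time_calls(void (*fn)(void), long calls)
{
    clock_t start, end;
    long i;

    assert(calls > 0);
    start = clock();
    for (i = 0; i < calls; ++i)
        fn();
    end = clock();

    return per_call_us((double)(end - start) / CLOCKS_PER_SEC, calls);
}
```

With the numbers above, per_call_us(6.5, 100000) evaluates to 65, and time_calls(dummy_work, 100000) would time the stand-in function in the same way.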

Why is it so slow? What happens during a kernel call? Isn't it simply a matter of pushing a couple of instructions to the GPU? Can this be sped up?

Thanks for any help,