minimum time cost of a kernel call

Discussion created by dukeleto on Jul 9, 2009
Latest reply on Jul 18, 2009 by riza.guntur
is there one?

Hello all,

I have just tried porting a relatively simple 1D problem submitted to me by a colleague, which, I thought at first sight, would be well suited to GPU solution. The problem consists of solving a 1D equation giving pressure as a function of space and time, via finite differences. The particularity here is that it needs to be time-stepped, for a *very* large number of periods.

Hence, I need to call my solver kernel on a 1D vector of length 3000 around 5e7 times.

The kernel I have written is fairly straightforward, and has a large number of multiplications per memory access, so I think it should be well suited to the GPU. However, I am getting a *very* bad FLOP rate. The program performs a few hundred time steps (kernel calls) per second; this equates to substantially less than 1 GFLOP/s (on an HD4970).
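For concreteness, here is a rough back-of-envelope for the achieved rate, assuming ~20 floating-point ops per grid point per step (a guessed figure, not taken from the actual kernel) and "a few hundred" calls per second read as 300:

```python
# Back-of-envelope: FLOP rate implied by the numbers in this post.
n_points = 3000          # length of the 1D vector
flops_per_point = 20     # ASSUMED arithmetic per grid point per step
steps_per_second = 300   # "a few hundred" kernel calls per second

achieved = n_points * flops_per_point * steps_per_second
print(f"achieved rate: {achieved / 1e6:.0f} MFLOP/s")
# prints "achieved rate: 18 MFLOP/s" -- two orders of magnitude
# below 1 GFLOP/s, consistent with launch overhead dominating
```

Even if the per-point flop count is off by a factor of a few, the rate stays in the tens of MFLOP/s, which suggests the bottleneck is not arithmetic.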

Is this due to an irreducible overhead attached to each kernel call? If so, would switching to CAL help? Or is my problem just not such a good match for the GPU after all?
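To see how a fixed per-launch cost would play out over the full run, a quick estimate (the ~20 µs overhead per launch is an assumed, illustrative figure, not a measured one):

```python
# Estimate: total wall time contributed by kernel-launch overhead alone,
# independent of any arithmetic the kernel does.
n_steps = 5e7                # number of kernel calls (time steps)
launch_overhead_s = 20e-6    # ASSUMED fixed overhead per launch (~20 us)

floor_s = n_steps * launch_overhead_s
print(f"wall-time floor from launch overhead: {floor_s:.0f} s")
# prints "wall-time floor from launch overhead: 1000 s"
```

If per-launch overhead is anywhere near this order of magnitude, the 5e7 launches dominate the runtime regardless of kernel quality, which is why batching several time steps into a single kernel call is usually the first thing to try.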