My simple WebGL (OpenGL ES) N-body simulation for N = 1024, ..., 8192 is fairly fast on a Radeon HD 4870 (10^9 interactions per second), but it is about 10 times slower than other implementations (OpenGL, Brook, CUDA based). Is it possible to find the bottleneck in this 10 KB script? For example, is it the WebGL implementation itself, or is the algorithm too naive? I'm collecting my "WebGL and simulations on GPU" notes at
Thanks to Lars Nyland, I now get 400 Gflops on the Radeon HD 4870 (a 20x speedup) after replacing the 4096x1 textures with 64x64 ones. That is similar to the CAL-based simulations, and performance may be doubled by loop unrolling. N-body is an amazing, if somewhat special, application.
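For anyone trying the same texture change, here is a minimal sketch of how a linear particle index could map onto a 64x64 texture instead of a 4096x1 strip. It assumes a row-major layout and sampling at texel centers; the helper names (indexToTexel, texelToIndex, texelToUV) are hypothetical and not taken from the original script, which may lay out its data differently.

```javascript
// Hypothetical helpers, assuming a row-major layout of particles in a square texture.
const SIDE = 64; // 64 * 64 = 4096 particles, same count as a 4096x1 strip

// Linear particle index -> integer texel position (column x, row y).
function indexToTexel(i, side = SIDE) {
  return { x: i % side, y: Math.floor(i / side) };
}

// Integer texel position -> linear particle index (inverse of indexToTexel).
function texelToIndex(x, y, side = SIDE) {
  return y * side + x;
}

// Integer texel position -> normalized texture coordinates of the texel center,
// which is what a fragment shader would pass to texture2D().
function texelToUV(x, y, side = SIDE) {
  return { u: (x + 0.5) / side, v: (y + 0.5) / side };
}
```

In a fragment shader the inner loop over all bodies then becomes two nested loops over rows and columns of the 64x64 texture, with each sample taken at the coordinates given by texelToUV.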