0 Replies Latest reply on Jan 13, 2013 1:12 PM by gugi

    Kernels periodically take too much time to execute?

    gugi

      Hi everyone,

      I have a problem: from time to time, kernels take much longer to execute than they should.

      Basically, I'm integrating a PDE using the pseudospectral method: I have a loop, and in each iteration I enqueue a number of custom kernels together with some forward and backward FFTs.

       

      When I profile the application (application trace, using CodeXL 1.0.2409.0), I get, for example, the result shown in the attached picture. Every so often, one kernel takes much longer to execute than the kernels before it. Take the fft_fwd one, for example. It is followed by several more kernels and forward FFTs, including essentially the same fft_fwd again (one iteration later), all of which take significantly less time to execute, until another kernel suddenly requires much more time than before (in the example: calc_nonLin_n). After that, several iterations are fine again.
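To make "way longer" concrete, the slow launches can be picked out of the per-kernel timings numerically. The following is only a minimal sketch: it assumes the execution times have been exported from the trace into a plain list (one value per launch, in microseconds), and the function name `find_slow_launches` and the threshold factor are hypothetical, not part of CodeXL or OpenCL.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the indices of launches whose duration exceeds `factor` times the
// median duration. `durations_us` is a hypothetical input: one execution
// time per kernel launch, e.g. copied out of a profiler trace.
std::vector<std::size_t> find_slow_launches(const std::vector<double>& durations_us,
                                            double factor) {
    std::vector<std::size_t> slow;
    if (durations_us.empty()) return slow;

    // Work on a copy so the launch order of the input is preserved.
    // For an even number of samples this picks the upper median.
    std::vector<double> scratch(durations_us);
    const std::size_t mid = scratch.size() / 2;
    std::nth_element(scratch.begin(), scratch.begin() + mid, scratch.end());
    const double median = scratch[mid];

    // Flag every launch that is more than `factor` times slower than typical.
    for (std::size_t i = 0; i < durations_us.size(); ++i) {
        if (durations_us[i] > factor * median) slow.push_back(i);
    }
    return slow;
}
```

Comparing against the median rather than the mean keeps the few slow launches from shifting the baseline they are measured against. If the flagged indices are evenly spaced, that would point to something periodic rather than to one particular kernel.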

       

      Any ideas what might be causing this behavior? Any ideas how I could optimize the attached kernels in order to prevent it?


      I run everything on an HD5850 with the latest beta driver. Platform version: AMD-APP (1084.2) (according to clinfo). AMD FFT library: 1.8.239. Windows 7 64-bit, Visual Studio 2010 (C++).

       

      My global work size is 256x64 (for the custom kernels), and the local size is set to NULL. The FFT library does real-to-complex and complex-to-real transforms of 256x64 matrices. The custom kernels only do some element-wise matrix multiplications. I attached both the kernel sources and the host source code that enqueues all kernels in one time step (the function is called in a loop).
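One way to check whether a slow kernel is really executing longer, or only waiting longer before it starts, is to look at the four timestamps OpenCL records per event. These can be queried with `clGetEventProfilingInfo` (`CL_PROFILING_COMMAND_QUEUED`, `_SUBMIT`, `_START`, `_END`) on a queue created with `CL_QUEUE_PROFILING_ENABLE`. The sketch below only does the arithmetic on those four values and makes no OpenCL calls itself. The struct and function names are hypothetical.

```cpp
#include <cstdint>

// The four per-event timestamps in nanoseconds, in the order OpenCL defines
// them: QUEUED, SUBMIT, START, END. Filling this struct from
// clGetEventProfilingInfo is left to the caller (assumption of this sketch).
struct EventTimes {
    std::uint64_t queued;
    std::uint64_t submitted;
    std::uint64_t started;
    std::uint64_t ended;
};

// The same event split into three phases, in milliseconds.
struct Breakdown {
    double queue_wait_ms;   // QUEUED -> SUBMIT: command sits in the host queue
    double launch_wait_ms;  // SUBMIT -> START: submitted, not yet running
    double exec_ms;         // START  -> END:   actual execution on the device
};

// Converts raw timestamps into per-phase durations.
// Assumes the timestamps are monotonically non-decreasing.
Breakdown split_event(const EventTimes& t) {
    const double ns_per_ms = 1.0e6;
    Breakdown b;
    b.queue_wait_ms  = static_cast<double>(t.submitted - t.queued)  / ns_per_ms;
    b.launch_wait_ms = static_cast<double>(t.started - t.submitted) / ns_per_ms;
    b.exec_ms        = static_cast<double>(t.ended - t.started)     / ns_per_ms;
    return b;
}
```

If `exec_ms` itself jumps for the slow launches, the extra time is spent on the device. If mainly the two wait phases grow, the delay more likely comes from how commands are batched and submitted.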