
clAmdFft 1.8.269 on multiple GPUs

Question asked by reuter on Jun 26, 2012
Latest reply on Jun 27, 2012 by bragadeesh



I'm trying to use clAmdFft (v1.8.269) in a multi-GPU environment as part of a bigger project, and I have run into two problems:


1. The FFT seems to calculate wrong values in single precision on Nvidia GPUs (tested on GTX480, GTX580 and Tesla C2050). Double precision works fine in the newest release.
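One way to put a number on "wrong values" is to compare the data read back from the GPU against a slow reference transform computed on the CPU in double precision. The following is a minimal sketch, not part of clAmdFft: the names `reference_dft` and `relative_l2_error` are hypothetical helpers, and it assumes an unscaled forward transform with a negative exponent and an interleaved complex layout.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive O(N^2) forward DFT accumulated in double precision.
// Assumption: forward transform uses exp(-2*pi*i*k*j/N) and no scaling.
std::vector<std::complex<double> > reference_dft(
    const std::vector<std::complex<float> >& in) {
    const std::size_t n = in.size();
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<std::complex<double> > out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            // Reduce k*j modulo n so the angle stays small and accurate.
            const double angle = -two_pi * static_cast<double>((k * j) % n) /
                                 static_cast<double>(n);
            const std::complex<double> x(in[j].real(), in[j].imag());
            sum += x * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        out[k] = sum;
    }
    return out;
}

// Relative L2 error of the single-precision GPU result against the reference.
double relative_l2_error(const std::vector<std::complex<float> >& gpu,
                         const std::vector<std::complex<double> >& ref) {
    assert(gpu.size() == ref.size());
    double err = 0.0;
    double norm = 0.0;
    for (std::size_t i = 0; i < gpu.size(); ++i) {
        const std::complex<double> g(gpu[i].real(), gpu[i].imag());
        err += std::norm(g - ref[i]);   // squared magnitude of the difference
        norm += std::norm(ref[i]);
    }
    return norm > 0.0 ? std::sqrt(err / norm) : std::sqrt(err);
}
```

For a correct single-precision FFT the relative error should stay within a few orders of magnitude of float rounding (about 1e-7), while a genuinely wrong result typically gives an error near 1. Running the same check on the Radeon and Nvidia outputs for identical input would show whether the Nvidia values are slightly less accurate or simply different.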


2. Using multiple GPUs doesn't work on Nvidia:

I set up my OpenCL contexts so that each one contains only a single device. Next I call the clAmdFft setup functions and start a separate thread for each device using Boost.Thread. Each thread then loads some data from files, does some preprocessing in my own kernels, creates a plan, enqueues the transform (working on the buffers from the previous computations), destroys the plan and writes the data to output files. This is done multiple times per thread.
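The structure described above can be sketched as follows. This is only an illustration of the control flow, not the actual project code: it uses `std::thread` in place of Boost.Thread, `DeviceSlot`, `run_device` and `run_all` are hypothetical names, and the OpenCL and clAmdFft calls are replaced by log entries, with comments naming the call that would sit at each step.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for everything one device owns: in the real program
// this would hold the single-device context, its command queue and buffers.
struct DeviceSlot {
    int device_index;
    std::vector<std::string> log;  // steps performed, in order
};

// Work done by one thread for one device. Nothing here is shared with other
// threads; every plan lives and dies inside this function.
void run_device(DeviceSlot& slot, int batches) {
    for (int b = 0; b < batches; ++b) {
        slot.log.push_back("load+preprocess");    // file I/O and own kernels
        slot.log.push_back("create plan");        // clAmdFftCreateDefaultPlan
        slot.log.push_back("enqueue transform");  // clAmdFftEnqueueTransform
        slot.log.push_back("destroy plan");       // clAmdFftDestroyPlan
        slot.log.push_back("write output");       // results written to files
    }
}

// One slot and one worker thread per device; the library-wide setup would
// happen once before this call and the teardown once after it.
std::vector<DeviceSlot> run_all(int device_count, int batches) {
    assert(device_count > 0);
    std::vector<DeviceSlot> slots;
    for (int i = 0; i < device_count; ++i) {
        DeviceSlot slot;
        slot.device_index = i;
        slots.push_back(slot);
    }
    // All slots exist before any thread starts, so references stay valid.
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < slots.size(); ++i) {
        workers.emplace_back(run_device, std::ref(slots[i]), batches);
    }
    for (std::size_t i = 0; i < workers.size(); ++i) {
        workers[i].join();
    }
    return slots;
}
```

Calling `run_all(3, 2)` corresponds to three GPUs each processing two data sets. In this layout the only state the threads have in common is whatever the FFT library keeps internally between setup and teardown.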


It works on multiple Cayman / Radeon 6970 cards (tested with 1 to 3 GPUs). On multiple Nvidia GPUs, however, it fails with:

     OPENCL_V< CLFFT_INVALID_MEM_OBJECT > (1133): clSetKernelArg failed

     OPENCL_V< CLFFT_INVALID_MEM_OBJECT > (376): clAmdFftEnqueueTransform for row failed

Running it on a single GPU works just fine, even in combination with Boost.Thread. Running my own kernels (instead of the FFT) on multiple Nvidia GPUs also works fine. This was again tested on multiple Tesla C2050s, and on a mix with an additional GTX480.


I checked the contexts used for the buffers to make sure the same context is used everywhere, and it is. I also tried different devices (including in the single-GPU version, to make sure it is not always the first device that is used). Replacing the FFT with one of my own kernels using exactly the same buffers also works. So it seems the problem is caused by one of the additional arguments passed to the FFT kernels, not by the FFT data itself?


I'm aware that the current version is still a beta release. However, it is the only version where at least double-precision FFTs are possible on both AMD and Nvidia GPUs.


I know you guys are not keen on optimizing your software for a different vendor's hardware, but since OpenCL is a platform-independent standard, I think it would be great to have it working on other vendors' devices as well. Maybe you have a hint?


Thanks a lot,