1 Reply Latest reply on Jun 27, 2012 10:28 AM by bragadeesh

    clAmdFft 1.8.269 on multiple GPUs




      I'm trying to use clAmdFft (v1.8.269) in a multi-GPU environment as part of a bigger project. I've run into two problems:


      1. The FFT seems to compute wrong values in single precision on Nvidia GPUs (tested on GTX480, GTX580 and Tesla C2050). Double precision works fine in the newest release.


      2. Using multiple GPUs doesn't work on Nvidia:

      My setup is as follows: I create my OpenCL contexts so that each contains only one device. Next I call the clAmdFft setup functions and start a separate thread for each device using Boost.Thread. Each thread then loads some data from files, does some preprocessing in my own kernels, creates a plan, enqueues the transform (working on the buffers from the previous computations), destroys the plan and writes the data to output files. This is done multiple times per thread.


      This works on multiple Cayman (Radeon 6970) cards (tested with 1 to 3 GPUs). On multiple Nvidia GPUs, however, it fails with:

           OPENCL_V< CLFFT_INVALID_MEM_OBJECT > (1133): clSetKernelArg failed

           OPENCL_V< CLFFT_INVALID_MEM_OBJECT > (376): clAmdFftEnqueueTransform for row failed

      Running it on a single GPU works just fine, even in combination with Boost.Thread. Running my own kernels (instead of the FFT) on multiple Nvidia GPUs also works fine. Both were tested on multiple Tesla C2050 cards, and on Tesla C2050 cards mixed with an additional GTX480.


      I checked the contexts used for the buffers and confirmed that the same context is used everywhere. I also tried different devices (including in the single-GPU version, to make sure it isn't always the first device that gets used). Replacing the FFT with one of my own kernels that uses exactly the same buffers also works. So it seems the problem is caused by one of the additional arguments passed to the FFT kernels, not by the FFT data itself?!


      I'm aware that the current version is still a beta release. However, it is the only version in which at least double-precision FFTs are possible on both AMD and Nvidia GPUs.
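To put a number on the "wrong values" from point 1, one option is to compare the single-precision result copied back from the GPU against a naive double-precision DFT computed on the host. The following is only a minimal sketch: the helper names `reference_dft` and `relative_l2_error` are made up for illustration and are not part of clAmdFft, and it assumes an unscaled forward transform with the exp(-2*pi*i*j*k/N) sign convention and interleaved complex data.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of clAmdFft): naive O(N^2) forward DFT,
// accumulated in double precision, intended only as a host-side reference.
std::vector<std::complex<double>> reference_dft(
    const std::vector<std::complex<float>>& in) {
    const std::size_t n = in.size();
    const double two_pi = 6.283185307179586476925286766559;
    std::vector<std::complex<double>> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            // Reduce k*j modulo n so the angle stays in [-2*pi, 0].
            const double angle = -two_pi * static_cast<double>((k * j) % n) /
                                 static_cast<double>(n);
            acc += std::complex<double>(in[j].real(), in[j].imag()) *
                   std::polar(1.0, angle);
        }
        out[k] = acc;
    }
    return out;
}

// Hypothetical helper: relative L2 error of the GPU result against the
// reference. Falls back to the absolute error if the reference is all zeros.
double relative_l2_error(const std::vector<std::complex<float>>& gpu,
                         const std::vector<std::complex<double>>& ref) {
    assert(gpu.size() == ref.size());
    double err = 0.0;
    double norm = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const std::complex<double> g(gpu[i].real(), gpu[i].imag());
        err += std::norm(g - ref[i]);   // std::norm is the squared magnitude
        norm += std::norm(ref[i]);
    }
    return norm > 0.0 ? std::sqrt(err / norm) : std::sqrt(err);
}
```

Calling `relative_l2_error(gpu_output, reference_dft(host_input))` on a small transform gives a single figure per device. A correct single-precision FFT should land close to float rounding error, while a value near 1 or above points to a genuinely broken result rather than a precision issue. If the plan uses a different scale factor or direction, the reference has to be adjusted accordingly.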


      I know you guys are not keen on optimizing your software for a different vendor's hardware, but since OpenCL is a platform-independent standard, I think it would be great to have it working on other vendors' devices as well. Maybe you have a hint?


      Thanks a lot,



        • Re: clAmdFft 1.8.269 on multiple GPUs

          Hi Balthasar,


          First of all, we are glad to hear that our library works well for you on multi-GPU configurations with AMD hardware. Thanks for posting about this in detail. For Nvidia hardware, we don't do anything special inside the library. Please remember that for this library to work properly, the OpenCL support provided by the platform vendor has to be thorough. Unfortunately, when running on Nvidia hardware you are relying on their OpenCL support (their compiler, runtime, etc.), which AMD has no control over. We have seen forum posts before where users complained about things not working on Nvidia hardware, and the cause was ultimately found to be defects in Nvidia's OpenCL compiler. The problem you are running into could very well be caused by issues in their runtime or elsewhere in their stack. You would have to contact Nvidia to get this resolved.