28 Replies Latest reply on Oct 28, 2013 8:47 AM by kknox

    OpenCL: clAmdFft (OpenCL FFT lib from AMD) on NVIDIA GPUs


      Does anyone have experience running the OpenCL FFT library from AMD (http://developer.amd.com/libraries/appmathlibs/pages/default.aspx) on NVIDIA GPUs?

      I'm trying to port an existing algorithm from CUDA (with the most recent CUFFT) to OpenCL. The new code runs fine on an AMD GPU but not on my NVIDIA GPU. The NVIDIA GPU is recognized properly, but the resulting array is all zeros and no errors are reported. By the way, the code also runs fine on an Intel Core i3 CPU, so my code seems to be correct.

      Any ideas?

        • Re: OpenCL: clAmdFft (OpenCL FFT lib from AMD) on NVIDIA GPUs

          Hi terman,


          Our clAmdFft library should in theory run on any OpenCL-enabled device. But you should note that our testing has been primarily on AMD GPUs and, to some extent, CPUs. We have not specifically tested the library on Nvidia GPUs. Could we get more info on what kind of problem (precision, dimension, lengths, layouts, etc.) you are trying to run and in what environment (OS, bitness, Nvidia GPU card details)?



            • Re: OpenCL: clAmdFft (OpenCL FFT lib from AMD) on NVIDIA GPUs

              Hi bragadeesh,

              thanks for your reply.


              I basically want to Fourier transform a square array (i.e. an image) with a power-of-two side length, e.g. 1024 x 1024 px. I put this data in the real part of a std::complex<float> field and set the imaginary part to 0. The transform should happen in-place with interleaved layout and float precision.
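The packing described above can be sketched as follows. This is a minimal illustration, not the poster's actual code: `packRealImage` is a hypothetical helper name, and the image is assumed to be stored row-major as plain floats. Since C++11 a `std::complex<float>` array is guaranteed to be laid out as consecutive (real, imaginary) pairs, which is the interleaved layout the transform expects.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: pack a row-major real image into an interleaved
// complex array. The memory layout is re0, im0, re1, im1, ... and every
// imaginary part is set to 0.
std::vector<std::complex<float>> packRealImage(const std::vector<float>& pixels,
                                               std::size_t width,
                                               std::size_t height)
{
    assert(pixels.size() == width * height);
    std::vector<std::complex<float>> field(width * height);
    for (std::size_t i = 0; i < field.size(); ++i)
        field[i] = std::complex<float>(pixels[i], 0.0f);
    return field;
}
```

The resulting `field.data()` pointer could then be handed to a host-to-device copy such as the `clEnqueueWriteBuffer` call in the sample code.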


              My environment is Windows 7 Professional x64, and I'm using the Visual Studio C++ Professional IDE with its built-in x86 compiler.

              The NVIDIA GPU is a GeForce GTX 560 Ti (MSI N560GTX-Ti Twin Frozr II/OC 1GB). The CPU that works well is an Intel Core i3-2100 (2x3.1GHz), and then there is the Radeon HD 6850 (Sapphire Radeon HD 6850 1GB).

              I tried compiling the code against the newest OpenCL releases from AMD, NVIDIA and Intel, with the same results, and of course I have the newest developer drivers installed.


              As I said before, the AMD GPU and the Intel CPU produce the results I expected (the same as with FFTW3, KissFFT, CUFFT):



              But the NVIDIA GPU generated the following, obviously wrong, field:



              Here is my pretty basic sample code ...

              #include <stdio.h>
              #include <stdlib.h>
              #include <math.h>
              #include <complex>
              #include <clAmdFft.h>
              #if defined (__APPLE__) || defined(MACOSX)
                  #include <OpenCL/opencl.h>
              #else
                  #include <CL/opencl.h>
              #endif
              // Typedef for complex field objects
              using namespace std;
              typedef std::complex<float> cl_compl_flt;
              int main(int argc, char* argv[])
              {
                  cl_uint width = 1024, height = 1024;       // Field dimensions
                  cl_uint cl_platformsN = 0;                 // Platform count
                  cl_platform_id *cl_platformIDs = NULL;     // IDs of OpenCL platforms
                  cl_uint cl_deviceCount = 0;                // Device count
                  cl_device_id *cl_devices = NULL;           // Device IDs
                  cl_int cl_err = 0;                         // Buffer for error information
                  cl_context cl_dev_context;                 // Context
                  cl_command_queue cl_queue;                 // Queue
                  clAmdFftSetupData fftSetupData;            // FFT setup data
                  clAmdFftPlanHandle fftPlan;                // FFT plan
                  clAmdFftDim fftDim = CLFFT_2D;             // FFT dimension
                  size_t fftSize[2];                         // FFT size
                  fftSize[0] = width;
                  fftSize[1] = height;
                  cl_mem d_data;                             // Device level data
                  cl_compl_flt* h_src;                       // Host level input data
                  cl_compl_flt* h_res;                       // Host level output data
                  // Allocate host memory
                  h_src = (cl_compl_flt*)malloc(width*height*sizeof(cl_compl_flt));
                  h_res = (cl_compl_flt*)malloc(width*height*sizeof(cl_compl_flt));
                  // Get source field
                  createPinholeField( h_src, width, height, 5 );
                  // Get FFT version
                  checkCL( clAmdFftInitSetupData(&fftSetupData) );
                  printf("Using clAmdFft %u.%u.%u\n",fftSetupData.major,fftSetupData.minor,fftSetupData.patch);
                  // Get available platforms
                  checkCL( clGetPlatformIDs( 0, NULL, &cl_platformsN));
                  cl_platformIDs = (cl_platform_id*) malloc( cl_platformsN * sizeof(cl_platform_id));
                  checkCL( clGetPlatformIDs( cl_platformsN, cl_platformIDs, NULL) );
                  // Loop over platforms
                  for( cl_uint i = 0; i < cl_platformsN; i++)
                  {
                      // Get number of available devices for this platform
                      checkCL( clGetDeviceIDs( cl_platformIDs[i], CL_DEVICE_TYPE_ALL, 0, NULL, &cl_deviceCount));
                      // Skip platform if no device available
                      if(cl_deviceCount < 1)
                          continue;
                      // Get available device IDs for this platform
                      cl_devices = (cl_device_id*) malloc( cl_deviceCount * sizeof(cl_device_id));
                      checkCL( clGetDeviceIDs( cl_platformIDs[i], CL_DEVICE_TYPE_ALL, cl_deviceCount, cl_devices, NULL));
                      // Print platform name
                      char platform_name[1024];
                      checkCL( clGetPlatformInfo( cl_platformIDs[i], CL_PLATFORM_NAME, 1024, platform_name, NULL) );
                      printf("\nCompute using OpenCL platform #%u [ %s ]\n", i, platform_name);
                      // Loop over devices
                      for( cl_uint j = 0; j < cl_deviceCount; j++)
                      {
                          // Print device name and type
                          cl_device_type device_type;
                          char device_name[1024];
                          checkCL( clGetDeviceInfo( cl_devices[j], CL_DEVICE_NAME, 1024, device_name, NULL) );
                          checkCL( clGetDeviceInfo( cl_devices[j], CL_DEVICE_TYPE, sizeof(cl_device_type), &device_type, NULL) );
                          printf("\n\tUsing OpenCL device #%u [ %s -- %s ]\n", j, device_name, getDevTypeString(device_type));
                          // Create OpenCL context (standard platform property list, zero-terminated)
                          cl_context_properties cps[3] =
                              { CL_CONTEXT_PLATFORM, (cl_context_properties)cl_platformIDs[i], 0 };
                          cl_dev_context = clCreateContext( cps, cl_deviceCount, cl_devices, NULL, NULL, &cl_err);
                          checkCL( cl_err);
                          // Create command queue
                          cl_queue = clCreateCommandQueue( cl_dev_context, cl_devices[j], CL_QUEUE_PROFILING_ENABLE, &cl_err);
                          checkCL( cl_err);
                          // Create device buffer
                          d_data = clCreateBuffer( cl_dev_context, CL_MEM_READ_WRITE, width*height*sizeof(cl_compl_flt), NULL, &cl_err);
                          checkCL( cl_err);
                          // Setup FFT
                          checkCL( clAmdFftSetup(&fftSetupData) );
                          // Create FFT plan
                          checkCL( clAmdFftCreateDefaultPlan( &fftPlan, cl_dev_context, fftDim, fftSize) );
                          // Copy data from host to device
                          checkCL( clEnqueueWriteBuffer( cl_queue, d_data, CL_TRUE, 0, width*height*sizeof(cl_compl_flt), h_src, 0, NULL, NULL) );
                          // Execute FFT (in-place: no separate output buffer)
                          checkCL( clAmdFftEnqueueTransform( fftPlan, CLFFT_FORWARD, 1, &cl_queue, 0, NULL, NULL, &d_data, NULL, NULL) );
                          clFinish( cl_queue);
                          // Copy result from device to host
                          checkCL( clEnqueueReadBuffer(cl_queue, d_data, CL_TRUE, 0, width*height*sizeof(cl_compl_flt), h_res, 0, NULL, NULL) );
                          clFinish( cl_queue);
                          // Save result
                          char filename[512];
                          sprintf( filename, "raw/result_%u_%u_in.raw",i,j);
                          printf("\tSave result to \"%s\" ", filename);
                          saveRawData( h_res, filename, width, height, true);
                          // Free FFT plan
                          checkCL( clAmdFftDestroyPlan( &fftPlan) );
                          // Free FFT
                          checkCL( clAmdFftTeardown() );
                          // Free device memory
                          checkCL( clReleaseMemObject(d_data) );
                          // Release OpenCL context and queue
                          checkCL( clReleaseCommandQueue( cl_queue ) );
                          checkCL( clReleaseContext( cl_dev_context) );
                      }
                      // Free OpenCL devices
                      free( cl_devices);
                  }
                  free( cl_platformIDs);
                  free( h_src);
                  free( h_res);
                  printf("\n\nPress any key ...");
                  return 0;
              }


              and the additional helper functions it uses ...

              // Generate a pinhole
              void createPinholeField( cl_compl_flt* data, cl_uint width, cl_uint height, cl_uint radius)
              {
                  if( data == NULL)
                      data = (cl_compl_flt*)malloc(width*height*sizeof(cl_compl_flt));
                  if(radius < 1)
                      radius = (width>height)?height/2:width/2;
                  cl_float min_val = 0.0f;
                  cl_float max_val = 255.0f;
                  for(cl_uint y = 0; y < height; y++)
                      for(cl_uint x = 0; x < width; x++)
                      {
                          // Assumed fill rule: max_val inside the pinhole, min_val outside
                          if ( ceil( sqrt( pow(x-width/2., 2.) + pow(y-height/2., 2.) )) <= radius )
                              data[y*width+x] = cl_compl_flt( max_val, 0.0f);
                          else
                              data[y*width+x] = cl_compl_flt( min_val, 0.0f);
                      }
              }
              // Save a cl_compl_flt array as an unsigned char raw image file
              void saveRawData( cl_compl_flt* char_array, const char* filepath, cl_uint width, cl_uint height, bool print_minmax )
              {
                  cl_float* abs_v = (cl_float*) malloc(width*height*sizeof(cl_float));
                  for( cl_uint i = 0; i < width*height; i++)
                      abs_v[i] = std::abs(char_array[i]);
                  cl_float min = abs_v[0];
                  cl_float max = abs_v[0];
                  for( cl_uint i = 1; i < width*height; i++)
                  {
                      if( abs_v[i] < min)
                          min = abs_v[i];
                      if( abs_v[i] > max)
                          max = abs_v[i];
                  }
                  if( print_minmax)
                      printf(" [min=%f , max=%f] ",min,max);
                  // Clip the display range at 1% of the maximum
                  max *= .01f;
                  cl_uchar* temp = (cl_uchar*) malloc(width*height*sizeof(cl_uchar));
                  for( cl_uint i = 0; i < width*height; i++)
                  {
                      // Scale to [0,1] first, clamp clipped values, then convert to a byte
                      cl_float v = (max > min) ? (abs_v[i] - min) / (max - min) : 0.0f;
                      if( v > 1.0f)
                          v = 1.0f;
                      temp[i] = (cl_uchar)(255.0f * v);
                  }
                  // Write the bytes to disk (assumed: this part is missing from the post)
                  FILE *pFile = fopen( filepath, "wb");
                  if( pFile != NULL)
                  {
                      fwrite( temp, sizeof(cl_uchar), width*height, pFile);
                      fclose( pFile);
                  }
                  free( temp);
                  free( abs_v);
              }
              // Check functions that return OpenCL error IDs.
              bool checkCL( cl_int oclErrorCode)
              {
                  if( oclErrorCode == CL_SUCCESS)
                      return true;
                  printf("\n\nAn OpenCL related error occurred!\nError ID #%d\nPress ENTER to exit the program...\n\n", oclErrorCode);
                  exit( oclErrorCode);
                  return false;
              }
              // Get device type as string
              const char* getDevTypeString(cl_device_type type)
              {
                  switch( type)
                  {
                      case CL_DEVICE_TYPE_CPU:
                          return "CPU";
                      case CL_DEVICE_TYPE_GPU:
                          return "GPU";
                      case CL_DEVICE_TYPE_ACCELERATOR:
                          return "ACCELERATOR";
                      default:
                          return "DEFAULT";
                  }
              }


              The code produces the following console output on my system with the NVIDIA card installed:



              I hope this helps to narrow down the problem.

                • Re: OpenCL: clAmdFft (OpenCL FFT lib from AMD) on NVIDIA GPUs

                  Thanks terman, for the detailed reply and code. We would have to do some investigation on our side to see why this is not working as expected. Could you run 'clinfo.exe' (that came with AMD APP) on your system and provide its output here?

                    • Re: OpenCL: clAmdFft (OpenCL FFT lib from AMD) on NVIDIA GPUs

                      Hi bragadeesh,

                      here is the output of clinfo. I hope it helps.



                        • Re: OpenCL: clAmdFft (OpenCL FFT lib from AMD) on NVIDIA GPUs

                          Hi terman,


                          I have made sure that the FFT kernels we generate in the library are valid when running on Nvidia GPU. The parameters getting passed into OpenCL API functions (inside the library) are valid too. So from a library standpoint things are behaving as expected. It looks like the problem is in Nvidia's OpenCL runtime. I think they have trouble loading constant tables of type 'float2'. In our FFT OpenCL kernels, we use constant tables to store twiddle factors for use in the computation. Our kernels use tables of type 'float2'. See example below:


                          __constant float2 twiddles[2] = { {1.0f, 0.0f}, {1.0, 0.0f} };


                          This works correctly on AMD platforms but not on Nvidia's platform. I confirmed this in a small experimental kernel: when I used tables of type 'float' instead, things worked. So this is a case of Nvidia's OpenCL platform not handling a valid OpenCL kernel. You would have to contact Nvidia or post in their forums to get this resolved.

                            • Re: OpenCL: clAmdFft (OpenCL FFT lib from AMD) on NVIDIA GPUs

                              Hi bragadeesh,


                              first of all thanks a lot for your help.


                              I will try to get some support from NVIDIA and maybe come back to you later.


                              Again, thanks.

                              • Re: OpenCL: clAmdFft (OpenCL FFT lib from AMD) on NVIDIA GPUs

                                According to the OpenCL specification a vector literal is written in the form:


                                (float2)(1.0f, 0.0f)


                                See section "6.1.6 Vector Literals" in the specification. So I think the kernels would be valid if the mentioned constant tables look like:


                                __constant float2 twiddles[2] = { (float2)(1.0f, 0.0f), (float2)(1.0, 0.0f) };

                                  • Re: OpenCL: clAmdFft (OpenCL FFT lib from AMD) on NVIDIA GPUs

                                    There seems to be one more issue with accessing program scope __constant data in Nvidia's OpenCL implementation (see https://devtalk.nvidia.com/default/topic/524095). I noticed that after replacing the "__constant cb_t *cb" kernel arguments by "__global cb_t *cb", the code gives the same results on Nvidia and AMD GPUs, so this seems to be a viable workaround. I only tested for a small 2D FFT, and there is only a single access to the "cb" pointer, i.e., putting it into global instead of constant address space has no impact on performance. I'm not sure about other FFT variants, though.
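The address-space swap described above amounts to a plain text substitution on the dumped kernel source. The following is a minimal sketch of that substitution in C++, assuming the dumped source is already available as a string; `replaceAll` and `moveCbToGlobal` are hypothetical helper names, and nothing here is part of the clAmdFft API. It only helps if you compile and launch the patched source yourself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replace every occurrence of `from` with `to` in `source`.
// Returns the number of replacements made.
std::size_t replaceAll(std::string& source, const std::string& from, const std::string& to)
{
    assert(!from.empty());
    std::size_t count = 0;
    std::size_t pos = 0;
    while ((pos = source.find(from, pos)) != std::string::npos) {
        source.replace(pos, from.size(), to);
        pos += to.size();   // continue after the inserted text
        ++count;
    }
    return count;
}

// Hypothetical helper: move the "cb" kernel argument from the constant to the
// global address space, as in the workaround described in the post above.
std::size_t moveCbToGlobal(std::string& kernelSource)
{
    return replaceAll(kernelSource, "__constant cb_t *cb", "__global cb_t *cb");
}
```

A return value of 0 would indicate that the dumped source does not contain the expected argument declaration, for example because a different library version generates different code.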


                                    Is it possible with clAmdFft to modify the kernel code before it gets compiled? The documentation only tells about dumping the code, which I did for the above investigations.


                                    While it is certainly Nvidia's task to get this fixed, it would be great if AMD's FFT library could be used on different platforms. The only alternative is cuFFT, which currently rules out AMD GPUs. Maybe you can find a portable way to implement the FFT based on the above information.


                                    Thanks & kind regards,


                                      • Re: OpenCL: clAmdFft (OpenCL FFT lib from AMD) on NVIDIA GPUs

                                        Hi Markus,


                                        Unfortunately, this is something we cannot change in our library. It is valid OpenCL code, and the compiler can take advantage of hardware features by identifying the __constant keyword. We also need it for performance reasons across a range of AMD GPUs. I see that you have raised this issue with Nvidia; please see if you can follow up with them and get it fixed.



                                          • Re: OpenCL: clAmdFft (OpenCL FFT lib from AMD) on NVIDIA GPUs

                                            Thanks for your reply! Now I was curious to see actual numbers, so I did a performance comparison for a 2048x2048 real to interleaved Hermitian FFT (see below for details on the evaluation) and obtained the following results (mean +/- standard deviation in milliseconds):

                                            • AMD Radeon HD7950 with "__constant cb_t *cb": 8.07 +/- 0.101 ms
                                            • AMD Radeon HD7950 with "__global cb_t *cb": 8.20 +/- 0.126 ms
                                            • NVIDIA GeForce GTX480 with "__global cb_t *cb": 3.91 +/- 0.012 ms


                                            The difference between __constant and __global on AMD hardware is hardly significant, while it is evident that the GTX480 (which is ~2 years older than the HD7950) is more than twice as fast. So AMD did a really good job in writing highly optimized code for NVIDIA hardware :-) A comparison with cuFFT would also be interesting, but I didn't yet rewrite the testing code for CUDA.


                                            The measurements were performed under Windows 7 as follows: all data were transferred to GPU memory, and the FFT plans were "baked" before the first computation. The actual FFT timing measurement followed this procedure:

                                            • clFinish();
                                            • timerStart();
                                            • clAmdFftEnqueueTransform();
                                            • clFinish();
                                            • timerStop();

                                            timerStart()/timerStop() make use of Windows' high-resolution performance counter. The measurement was repeated 100 times, from which 25 outliers were discarded to cancel the effect of other system activity. The mean and standard deviation of the remaining values are listed above. The AMD and NVIDIA tests were run on different machines, but this shouldn't make a difference since I believe no data is transferred between host and device in the clAmdFftEnqueueTransform() call.
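The statistics step of this procedure can be sketched as follows. The post does not say how the 25 outliers were chosen, so this sketch assumes they are the samples farthest from the median; `TimingStats` and `trimmedStats` are hypothetical names, and the standard deviation computed here is the population form (dividing by N).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct TimingStats {
    double mean;
    double stddev;
};

// Drop the `discard` samples farthest from the median (assumed outlier rule),
// then return mean and population standard deviation of the rest.
TimingStats trimmedStats(std::vector<double> samples, std::size_t discard)
{
    assert(discard < samples.size());
    std::sort(samples.begin(), samples.end());
    const double median = samples[samples.size() / 2];

    // Order by distance from the median so the outliers end up at the back.
    std::sort(samples.begin(), samples.end(), [median](double a, double b) {
        return std::fabs(a - median) < std::fabs(b - median);
    });
    samples.resize(samples.size() - discard);

    double sum = 0.0;
    for (double s : samples)
        sum += s;
    const double mean = sum / static_cast<double>(samples.size());

    double squares = 0.0;
    for (double s : samples)
        squares += (s - mean) * (s - mean);
    const double stddev = std::sqrt(squares / static_cast<double>(samples.size()));

    return {mean, stddev};
}
```

With 100 timings in milliseconds collected by the loop above, `trimmedStats(timings, 25)` would yield figures comparable in kind to those listed, under the stated outlier assumption.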


                                            Is there a more accurate way to measure FFT performance? The events which are optionally returned by clAmdFftEnqueueTransform() can be used to query the performance of the last kernel, but not of all kernels launched by the function.


                                            Is there anything else I can do to improve FFT performance on AMD hardware? Or did I just select a problem size which doesn't map well on the HD7950?


                                            Thanks & kind regards,


                                          • Re: OpenCL: clAmdFft (OpenCL FFT lib from AMD) on NVIDIA GPUs

                                            Hi Grabner,

                                            Can I ask where you were able to change the kernel input vector argument? I did a "grep -r "__constant cb_t *cb" *" in the install directory for good measure and found nothing. I was under the impression that the clAmdFft came in a precompiled .so and that was that...


                                            I would really appreciate a prompt answer. I'm having a hard time getting some image processing code to work and this would really help out. Thanks.

                              • Re: OpenCL: clAmdFft (OpenCL FFT lib from AMD) on NVIDIA GPUs

                                You could use Apple's oclFFT implementation, which has been available as source code since the very beginning of OpenCL: http://developer.apple.com/library/mac/#samplecode/OpenCL_FFT/Introduction/Intro.html

                                It works on NV and on AMD OK.


                                Also, did anyone compare oclFFT performance vs AMD's own library implementation ?


                                The initial AMD implementations were considerably slower than oclFFT, so it is at least worth checking whether something has changed since then...

                                  • Re: OpenCL: clAmdFft (OpenCL FFT lib from AMD) on NVIDIA GPUs

                                    I'm using Apple's oclFFT and I'm getting serious artifacts on my AMD v7900. It works fine on my NVIDIA 9800GT... and it has no support for CPUs.


                                    For now, the only solution I've found to have my application actually working as "multiplatform and heterogeneous", is using clAmdFft for AMD GPUs and CPU (windows+linux), and Apple's clFFT for NVIDIA (windows+linux+osx). So I can't do OpenCL CPU FFTs on an OSX machine...

                                      • Re: OpenCL: clAmdFft (OpenCL FFT lib from AMD) on NVIDIA GPUs

                                        Hi branyman,


                                        I also want to use Apple's oclFFT with my NVidia GPU. Do you still have problems with the C++ bindings ?

                                        Also for making it run on Linux, did you just modify the Makefile?


                                        Best regards,


                                          • Re: OpenCL: clAmdFft (OpenCL FFT lib from AMD) on NVIDIA GPUs

                                            Hi ash!


                                            As far as I know, Apple's oclFFT is still at the same version, so I don't expect improvements in that respect soon.


                                            Yes, I only modified the Makefile.

                                              • Re: OpenCL: clAmdFft (OpenCL FFT lib from AMD) on NVIDIA GPUs


                                                So is it a bad idea to use this with C++ bindings?

                                                From the source code, it's based on the C version so I'm quite afraid it'll lead to some problems.

                                                I also tried to modify the makefile, but I failed. I couldn't run the sample on Linux.

                                                Could you help me use this library, please?

                                                  • Re: OpenCL: clAmdFft (OpenCL FFT lib from AMD) on NVIDIA GPUs

                                                    But what are your problems? Did you mention them?


                                                    I don't use C++ bindings, sorry.

                                                      • Re: OpenCL: clAmdFft (OpenCL FFT lib from AMD) on NVIDIA GPUs

                                                        Oh, such a pity you don't use the C++ bindings. It would have been a great help.

                                                        Oops, actually a lot of things seem to be missing in the main.cpp file. After changing the makefile, the includes in clFFT.h and a few other things, it compiles, but when I launch the program it fails at clGetDeviceIDs. Were you able to run the test, or did you just take all the files except main.cpp for your project?

                                                          • Re: Re: OpenCL: clAmdFft (OpenCL FFT lib from AMD) on NVIDIA GPUs

                                                            I set the main.cpp problem aside and just added the modified Apple files to my project. It compiles just fine, but at execution I get the following error:

                                                            undefined symbol: _Z5FFT1DP11cl_fft_plan12kernel_dir_t


                                                            Here is the very simple code taken from the forum itself :

                                                            cl::Context context(devices, NULL, NULL, NULL);
                                                            cl::CommandQueue command_queue(context, dev, 0);

                                                            std::vector <cl::Platform> platforms;
                                                            cl_context myContext = context();

                                                            clFFT_Dim3 n;
                                                            clFFT_Plan plan;
                                                            cl_uint plan_length;
                                                            cl_int err;
                                                            n.x = 1024;
                                                            n.y = 1;
                                                            n.z = 1;
                                                            cout << "Creating plan" << endl;
                                                            plan = clFFT_CreatePlan((cl_context) myContext, n, clFFT_1D,
                                                                                    clFFT_InterleavedComplexFormat, &err);


                                                            Hope somebody could help. Thanks in advance.



                                                              • Re: OpenCL: clAmdFft (OpenCL FFT lib from AMD) on NVIDIA GPUs

                                                                Right, to use the code as a library you have to leave out main.cpp.


                                                                That's my makefile:


                                                                ifdef BUILD_WITH_ATF
                                                                ATF = -framework ATF
                                                                USE_ATF = -DUSE_ATF
                                                                endif

                                                                SRCS = fft_execute.cpp fft_setup.cpp fft_kernelstring.cpp
                                                                HEADERS = procs.h fft_internal.h fft_base_kernels.h clFFT.h
                                                                TARGET = libclFFT.a
                                                                COMPILERFLAGS = -D_LINUX -c -g -Wall -O3 -I../opencl_amd/include
                                                                CFLAGS = $(COMPILERFLAGS) ${RC_CFLAGS} ${USE_ATF}
                                                                CC = g++
                                                                LIBRARIES = -L../opencl_amd/lib/x86_64 -lOpenCL ${RC_CFLAGS} ${ATF}

                                                                OBJECTS = fft_execute.o fft_setup.o fft_kernelstring.o
                                                                TARGETOBJECT =
                                                                all: $(TARGET)

                                                                $(OBJECTS): $(SRCS) $(HEADERS)
                                                                  $(CC) $(CFLAGS) $(SRCS)

                                                                $(TARGET): $(OBJECTS)
                                                                  ar rc ./lib/$@  $(OBJECTS)

                                                                clean:
                                                                  rm -f $(TARGET) $(OBJECTS)

                                                                .DEFAULT:
                                                                  @echo The target \"$@\" does not exist in Makefile.




                                                                Did you link the libclFFT.a library that the makefile generates into your test program?

                                                                  • Re: Re: OpenCL: clAmdFft (OpenCL FFT lib from AMD) on NVIDIA GPUs



                                                                    Thanks for sharing. I made almost the same changes, but you're right, I have some link problems that I have to solve.

                                                                    Before going further, though, I removed the code from the main.cpp file and put mine in it. Then I wanted to test the FFT on a really simple 1D signal, but I got completely wrong results. If you have time, could you look at my code? I'm pretty sure it's a silly mistake, but I have been going in circles for hours without finding it.

                                                                    I have attached the main.cpp file.


                                                                    Here is the output:

                                                                    name device: GeForce GTX 650

                                                                    Creating plan

                                                                    From FFTW I should get a peak on the first value and 0 everywhere else.
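A result like the expected one (a peak in the first bin and zeros elsewhere) is what the DFT of a constant signal looks like, and for small sizes it can be checked without any FFT library. The following is a minimal sketch of such a reference check; `naiveDft` and `maxAbsError` are hypothetical helper names, the O(n²) loop is only meant for validating small inputs, and the assumption that the test signal is constant is mine, not stated in the post.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive O(n^2) forward DFT, intended only as a reference for small inputs.
std::vector<std::complex<double>> naiveDft(const std::vector<std::complex<double>>& in)
{
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = -2.0 * pi * static_cast<double>(k * t) / static_cast<double>(n);
            sum += in[t] * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        out[k] = sum;
    }
    return out;
}

// Largest element-wise deviation between two spectra of equal length.
double maxAbsError(const std::vector<std::complex<double>>& a,
                   const std::vector<std::complex<double>>& b)
{
    assert(a.size() == b.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        worst = std::max(worst, std::abs(a[i] - b[i]));
    return worst;
}
```

Converting the GPU output to `std::complex<double>` and passing it to `maxAbsError` together with the `naiveDft` result gives a single number to judge whether the device result is plausible.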

                                                                    Hope you could help.


                                                                    Best regards,


                                                                      • Re: Re: OpenCL: clAmdFft (OpenCL FFT lib from AMD) on NVIDIA GPUs

                                                                        Those results look like uninitialized memory. I think I found the bug:


                                                                        cmd_queue.enqueueWriteBuffer(input_buffer, CL_TRUE, 0, 16 * sizeof(float),NULL);


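                                                                        You are creating the input buffer on the GPU, but you pass NULL as the host pointer, so no data is ever copied into it (per the OpenCL specification, a write with a NULL pointer fails with CL_INVALID_VALUE). The buffer is only allocated. It should work this way: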


                                                                        cmd_queue.enqueueWriteBuffer(input_buffer, CL_TRUE, 0, 16 * sizeof(float),input);


                                                                        I'm not sure that


                                                                        err = clFFT_ExecuteInterleaved(queue, plan, 4, clFFT_Forward, data_in,data_out, 0, NULL, NULL);


                                                                        will work. If it still misbehaves, try using 1 instead of 4 (the batch size).

                                                                          • Re: Re: OpenCL: clAmdFft (OpenCL FFT lib from AMD) on NVIDIA GPUs

                                                                            Ouch, that's a really silly mistake! I corrected it, thanks! But it still gives some weird results. Now the results are around power 16.

                                                                            It's really weird! I hope you can help me again.

                                                                            Moreover, I also tried clAmdFft and got weird results too. I don't understand it. Is the signal incorrect? It worked with FFTW, so I don't get it.

                                                                            EDIT:

                                                                            With clAmdFft it works fine on the CPU. It seems that running clAmdFft on NVIDIA doesn't work for me either.

                                                                            With Apple's implementation, the results change each time I execute the program. Can somebody help? Besides Apple's FFT, there doesn't seem to be another solution for NVIDIA yet.


                                                                            Thanks anyway.

                                                                            Best regards,


                                                      • Re: OpenCL: clAmdFft (OpenCL FFT lib from AMD) on NVIDIA GPUs

                                                        I've seen this thread referenced from many different sources, even given its age. I wanted to make a note here that a code patch has been merged into the /develop branch of the open source version of clFFT to work around what we believe to be a bug in the runtime stack.



                                                        clFFT should work on Nvidia devices once more.