gdebrecz
Journeyman III

Apple's FFT on AMD cards with C++ OpenCL API

Hi,

I'm trying to run Apple's clFFT on AMD cards, using the OpenCL C++ API to create the platform, context, devices and queues. It seems to work only with the C API, not with the C++ API.

On Nvidia cards it works with both the C and the C++ API.

Here is the C way of creating the context:

-----------------------------------------------

    cl_uint numPlatforms;

    cl_platform_id platform = NULL;

    err = clGetPlatformIDs(0, NULL, &numPlatforms);

    cout << "clGetPlatformIDs (count) err: " << err << endl;

    if (0 < numPlatforms) {

        cl_platform_id* platforms = new cl_platform_id[numPlatforms];

        // Store the status in err so the message below reports this call.
        err = clGetPlatformIDs(numPlatforms, platforms, NULL);

        cout << "clGetPlatformIDs (list) err: " << err << endl;

        platform = platforms[0];

        delete[] platforms;  // the copied platform ID stays valid after the array is freed

    }

    cl_device_id device_ids[16];

    unsigned int num_devices;

    err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 16, device_ids, &num_devices);

    cout << "clGetDeviceIDs err: " << err << endl;

    cl_context_properties ctxProps[3] = {CL_CONTEXT_PLATFORM, (cl_context_properties) platform$

    cl_context myContext = clCreateContext(ctxProps, 1, device_ids, NULL, NULL, &err);

    cout << "clCreateContext err: " << err << endl;

-----------------------------------------------------------------

and this is the C++ way:

------------------------------------------------------------

std::vector <cl::Platform> platforms;

    err = cl::Platform::get(&platforms);

    cl_context_properties context_properties[3] = {

          CL_CONTEXT_PLATFORM, (cl_context_properties) platforms[0](), 0 };

    cl::Context myCtx = cl::Context(CL_DEVICE_TYPE_ALL,

          context_properties, NULL, NULL, &err);

   cl_context myContext = myCtx();

------------------------------------------------------------------

After that I call clFFT like this:

-------------------------

clFFT_Dim3 n;

    clFFT_Plan plan;

    cl_uint plan_length;

    n.x = 1024;

    n.y = 1;

    n.z = 1;

cout << "Creating plan" << endl;

plan = clFFT_CreatePlan((cl_context) myContext, n,
                            clFFT_1D, clFFT_InterleavedComplexFormat, &err);

-------------------------

A couple of questions:

   - When I use the C++ API, the plan creation hangs and the CPU sits at 100% load. Has anybody had a similar experience?

   - Is anybody using clFFT with the C++ API?

  

thanks,

Gergely

binying
Challenger

Re: Apple's FFT on AMD cards with C++ OpenCL API

Apple's clFFT on an AMD card? That might be the source of the trouble.

😛

gdebrecz
Journeyman III

Re: Apple's FFT on AMD cards with C++ OpenCL API

Well, since it is much faster than clAMDFFT... I'd like to use it.

bragadeesh
Staff

Re: Apple's FFT on AMD cards with C++ OpenCL API

Much faster? Can you elaborate on that? What problem are you running, and what card did you measure this on? What versions of Apple's FFT and clAmdFft are you using? I am particularly interested in your performance comparison. AMD's FFT library is very competitive in terms of performance at many problem sizes.

gdebrecz
Journeyman III

Re: Apple's FFT on AMD cards with C++ OpenCL API

Hi, thanks for your answer! Concerning the speed, these are my measurements; maybe I was mistaken, but I'm sure that you also have these numbers...

However, I would be interested in an answer to my original question. Could you comment on that? Could there be any difference between these two contexts?

a.) A context created with the C++ API as cl::Context, where the devices are not listed explicitly but just specified as CL_DEVICE_TYPE_ALL.

b.) A context created with the C API as cl_context, where the devices are explicitly enumerated.

thanks a lot,

Gergely

LeeHowes
Staff

Re: Apple's FFT on AMD cards with C++ OpenCL API

The obvious difference is that in the C version you asked for GPUs only, but in the C++ version you asked for ALL, which would give you the CPU device too (and presumably wouldn't on NVIDIA's platform).
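
To make that point concrete, here is a minimal, self-contained sketch of how a device-type request acts as a bitmask over the devices a platform exposes. It does not call OpenCL at all: the constants mirror the `cl_device_type` values in the Khronos `cl.h` header, while `FakeDevice`, `selectDevices` and the two example platforms are hypothetical stand-ins invented for illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Values mirror the cl_device_type bitfield constants from the Khronos cl.h header.
using DeviceType = std::uint64_t;
constexpr DeviceType kTypeCpu = 1u << 1;      // CL_DEVICE_TYPE_CPU
constexpr DeviceType kTypeGpu = 1u << 2;      // CL_DEVICE_TYPE_GPU
constexpr DeviceType kTypeAll = 0xFFFFFFFFu;  // CL_DEVICE_TYPE_ALL

// Hypothetical stand-in for a device reported by a platform.
struct FakeDevice {
    std::string name;
    DeviceType type;
};

// Return the devices whose type bit is set in the requested mask,
// which is how a device-type request narrows the device list.
std::vector<FakeDevice> selectDevices(const std::vector<FakeDevice>& platform,
                                      DeviceType requested) {
    std::vector<FakeDevice> selected;
    for (const FakeDevice& dev : platform) {
        if ((dev.type & requested) != 0) {
            selected.push_back(dev);
        }
    }
    return selected;
}

// ASSUMED example platforms: one exposing CPU + GPU, one exposing only a GPU.
const std::vector<FakeDevice> kCpuAndGpuPlatform = {
    {"cpu0", kTypeCpu}, {"gpu0", kTypeGpu}};
const std::vector<FakeDevice> kGpuOnlyPlatform = {{"gpu0", kTypeGpu}};
```

With these stand-ins, requesting `kTypeAll` on the CPU-plus-GPU platform selects two devices, while the same request on the GPU-only platform selects one. Requesting `kTypeGpu` selects a single GPU on both, so only the ALL request behaves differently between the two platforms.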

gdebrecz
Journeyman III

Re: Apple's FFT on AMD cards with C++ OpenCL API

Dear Lee,

Thanks for your answer. Indeed, the specific example I copied has the difference you correctly spotted. However, I have tried all the possible combinations, including GPUs only in both the C and the C++ API, with the same result. So this is probably not the reason.

Gergo

yurtesen
Miniboss

Re: Apple's FFT on AMD cards with C++ OpenCL API

gdebrecz wrote:

Hi, thanks for your answer! Concerning the speed, these are my measurements; maybe I was mistaken, but I'm sure that you also have these numbers...

Would you care to share your measurements and how you measured them? It would be quite interesting to see...

gdebrecz
Journeyman III

Re: Apple's FFT on AMD cards with C++ OpenCL API

OK, I'll try to re-run my benchmarks and post the results. However, could you please comment on my original question: could there be any difference between a context created with the C API and the cl_context returned by a C++ API cl::Context, assuming they have identical context properties set in advance?

thanks a lot,

Gergely

yurtesen
Miniboss

Re: Apple's FFT on AMD cards with C++ OpenCL API

Coincidentally, I was just looking at Apple's FFT recently and it does not seem to perform well or correctly on CPU devices.  Your problem is probably not related to a difference between C/C++ APIs.
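
On the C versus C++ question itself: in the Khronos C++ bindings, cl::Context is a thin wrapper around a cl_context handle, and calling `myCtx()` returns that same handle. The sketch below is only a simplified model of that relationship and does not use OpenCL; `FakeContextHandle` and `ContextWrapper` are hypothetical names, and the real class is copyable and retains the handle on copy, which this model leaves out.

```cpp
// Hypothetical stand-in for the opaque object behind a cl_context handle.
struct FakeContextHandle {
    int refCount = 1;  // a freshly created context starts with one reference
};

// Simplified model of the wrapper: it owns one reference to the handle,
// exposes the raw handle via operator(), and releases its reference
// when it is destroyed.
class ContextWrapper {
public:
    explicit ContextWrapper(FakeContextHandle* handle) : handle_(handle) {}
    ~ContextWrapper() {
        if (handle_ != nullptr) {
            --handle_->refCount;
        }
    }

    // Copying is disabled to keep the model small (the real class retains on copy).
    ContextWrapper(const ContextWrapper&) = delete;
    ContextWrapper& operator=(const ContextWrapper&) = delete;

    // Returns the same raw handle the C API would use; adds no reference.
    FakeContextHandle* operator()() const { return handle_; }

private:
    FakeContextHandle* handle_;
};

// Returns true if the raw handle obtained from the wrapper is the original one.
bool wrapperExposesSameHandle() {
    FakeContextHandle raw;
    ContextWrapper wrapped(&raw);
    return wrapped() == &raw;
}

// Returns the reference count left once the wrapper has gone out of scope.
int refCountAfterWrapperDestroyed() {
    FakeContextHandle raw;
    {
        ContextWrapper wrapped(&raw);
    }
    return raw.refCount;
}
```

If the real wrapper behaves like this model, the handle passed to the plan creation is an ordinary cl_context. The one thing to watch is lifetime: the raw handle taken from `myCtx()` is only safe to use while `myCtx` is still alive, unless it is retained separately.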

Plan creation in Apple's FFT library won't work if your only device is a CPU: it returns an invalid-context error, and this is hard-coded in the implementation. I guess it might also get confused if you have CPU + GPU in your context. I recently tried changing the offending code to accept the CPU device type and got strange results. It is by design: as far as I can see, it is not supposed to work on CPU devices. You should use GPU devices only if you want to use Apple's FFT, or else use the clAmdFft library.

It is probably your code that does not catch the error raised by the plan creation. You should check for CL_SUCCESS != err after calling it; you will find that 'err' holds -34 (CL_INVALID_CONTEXT).
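
As a rough illustration of that check, the sketch below tests the error code after a plan-creation call and turns it into a readable message. It is self-contained and links against neither OpenCL nor Apple's FFT: `FakePlan`, `createPlanStandIn` and `createPlanChecked` are hypothetical names, the stand-in merely mimics the behaviour described above (an invalid-context error when the context has no GPU), and the numeric codes mirror the Khronos `cl.h` header.

```cpp
#include <stdexcept>
#include <string>

// Numeric values mirror the error codes defined in the Khronos cl.h header.
constexpr int kSuccess = 0;           // CL_SUCCESS
constexpr int kInvalidValue = -30;    // CL_INVALID_VALUE
constexpr int kInvalidDevice = -33;   // CL_INVALID_DEVICE
constexpr int kInvalidContext = -34;  // CL_INVALID_CONTEXT

// Map a few error codes to their symbolic names for readable diagnostics.
std::string errorName(int err) {
    switch (err) {
        case kSuccess:        return "CL_SUCCESS";
        case kInvalidValue:   return "CL_INVALID_VALUE";
        case kInvalidDevice:  return "CL_INVALID_DEVICE";
        case kInvalidContext: return "CL_INVALID_CONTEXT";
        default:              return "unknown error " + std::to_string(err);
    }
}

// Hypothetical stand-in for a plan object; not the real clFFT_Plan type.
struct FakePlan {};

// Hypothetical stand-in for plan creation. ASSUMPTION: it reports
// CL_INVALID_CONTEXT when the context contains no GPU device.
FakePlan* createPlanStandIn(bool contextHasGpu, int* err) {
    static FakePlan plan;
    *err = contextHasGpu ? kSuccess : kInvalidContext;
    return contextHasGpu ? &plan : nullptr;
}

// Create a plan and refuse to continue if the call reported an error.
FakePlan* createPlanChecked(bool contextHasGpu) {
    int err = kSuccess;
    FakePlan* plan = createPlanStandIn(contextHasGpu, &err);
    if (err != kSuccess || plan == nullptr) {
        throw std::runtime_error("plan creation failed: " + errorName(err));
    }
    return plan;
}
```

In real code the same pattern would wrap the actual plan-creation call, so that a failed plan stops the program with a named error instead of being used later.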

This probably works on Nvidia devices because the Nvidia OpenCL platform only gives you GPU devices, so no error is raised.

I recommend sticking to AMD's FFT library if you want to run on both CPU and GPU. I also recommend using AMD hardware in this case, since the library is probably optimized for AMD cards. I have not used the FFT library directly, but I ran some tests on Tesla and Tahiti cards with clAmdBlas. clAmdBlas was superior to cuBLAS (sure, AMD's GCN Tahiti is better hardware than Nvidia's Tesla solutions, but still). Based on my experience with clAmdBlas, I would be surprised if clAmdFft did not perform well.
