
ash
Journeyman III

How to implement cl_khr_icd?

Hi,

I want to run my application on both an Intel CPU and an NVIDIA GPU. From reading other posts, I understood that I need two SDKs for this configuration, so I chose the AMD SDK and the NVIDIA SDK. Apart from a "FATAL: Module fglrx not found" error, my application does find both devices when I run it.

But how can I dynamically load the right library? I heard of the cl_khr_icd extension, but I can't manage to understand how to use it. Can anybody help, please?

Best regards,

Jacq

0 Likes
71 Replies
himanshu_gautam
Grandmaster

End users need not worry about the ICD. You can link against one OpenCL library, and it will load the other installed runtimes and daisy-chain them transparently.

When you query the platforms -- you should get both platforms listed. Then, everything is fine.

Just select your platform, create the context and get going...

Hi,

Thanks for your quick reply. Sorry, I didn't explain my case well. The application is for clients (I'm doing an internship at a company), so I don't know in advance whether a client has both SDKs installed. How can I find out which one is installed? Say a client has only an NVIDIA GPU: then I should load NVIDIA's libOpenCL.so, but how will the program know? Also, when I query platforms with NVIDIA's library, only the GPU is found, not my CPU, whereas the AMD SDK can see both.

My question might be dumb, so please excuse my ignorance, but it's kind of confusing to me.

Hi,

Just link your app against libOpenCL.so on your local machine and ship the app.

When your client runs the code, it will try to load the "OpenCL" library. Whatever library (NVIDIA, AMD, Intel, etc.) is in LD_LIBRARY_PATH or in the standard system search path will be loaded. This library (by virtue of the ICD) will then load all other installed platforms transparently. Your app can query the platforms and get going.
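For reference (an illustration added by the editor, not from this thread): on Linux, the ICD loader discovers the installed runtimes through small text files in /etc/OpenCL/vendors/, each naming a vendor library. The exact file names below vary by driver and SDK version, so treat them as an example:

```
$ ls /etc/OpenCL/vendors/
amdocl64.icd  nvidia.icd
$ cat /etc/OpenCL/vendors/nvidia.icd
libnvidia-opencl.so.1
```

This is why linking against any one libOpenCL.so that implements cl_khr_icd is enough: the loader reads these files at startup and exposes every registered platform.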

AMD as a company ships both AMD CPUs and AMD Radeon GPUs, so AMD's OpenCL SDK supports both CPU and GPU devices. However, companies like NVIDIA, who sell only GPUs, expose only the GPU device. There is nothing wrong with that.

Your app should find out which platforms are installed, whether the devices are GPUs or CPUs, how many CUs they have, etc., and decide which devices it should work on.

Note that an OpenCL context can only be formed out of devices from one platform.

If you intend to work with multiple platforms, you need to create separate contexts and partition your problem manually among the different platforms.

Hope this is clear.


Just a quick addition:

Test your app before you ship to the client.

And please inform the client to make sure that the relevant OpenCL libraries are in LD_LIBRARY_PATH or the system search path. Otherwise, they will get a "failed to load shared libraries" error.

Just be aware.

This is a classic "redistribution" problem. Just take your app, run it on a different machine, and make sure it works when all the libraries are found.

Thanks a lot for your reply, it's much clearer now. I still have one doubt: if linking against one OpenCL library lets the app find all the other platforms, then I don't understand why putting the NVIDIA library in LD_LIBRARY_PATH makes the AMD samples stop working. Is it because of the OpenCL version (1.2 vs 1.1)? What should I do to make them work again with this setup?

Mixing NVIDIA and AMD platforms should technically work. Can you tell me which sample did not work?

Mixing 1.1 and 1.2 is a problem if you are using 1.2 APIs. For example: "clinfo" might seg-fault if you do that.

Some samples may not work.

You can do 2 things here:

1. Run a simple sample that does not use 1.2 API

2. Write your own application to query and list the platforms that you have.


Yes, it's true: clinfo gave me an error when I first linked with the AMD SDK, and now with NVIDIA it prints:

clinfo: relocation error: clinfo: symbol clRetainDevice, version OPENCL_1.2 not defined in file libOpenCL.so.1 with link time reference

For the samples, actually you're right:

- Some don't run because they need OpenCL 1.2, such as:

GaussianNoise, DeviceFission, HDRToneMapping, ImageOverlap, MatrixMulDouble, SimpleImage, SobelFilterImage, TransferOverlapCPP.

- And some don't run because they can't find libGLEW, such as:

FluidSimulation, MandelBrot, NBody, SimpleGL, NoiseGL.

- And finally, the samples that require a GPU don't run, such as:

ImageBandWidth, BufferBandWidth, SimpleMultiDevice.

The others run just fine, except for this error: FATAL: Module fglrx not found.

I know that someone already explained it on the forum, and it will probably be taken care of in the next release.

But where does it come from? I'd like to hide it, since it's kind of frightening for a client, I suppose.

Thanks again for your precious help, I think I'm making progress.


Hi,

That's a good amount of detail.

I don't understand why "BufferBandwidth" doesn't work. If it sees an AMD GPU, it should work.

So you have an x86 CPU + NVIDIA GPU combo?

Regarding the fatal error, you are right. It will be fixed in a subsequent driver release.

You can just wrap it up in a shell script as:

./yourApp 2>/dev/null

I hope the library prints to stderr... but doing this will also suppress other messages to stderr.

Or maybe ./yourApp 2>&1 | grep -v -i "FATAL: Module fglrx" might help.


Hi again,

For BufferBandwidth, the error message is:

Platform found : Advanced Micro Devices, Inc.

This sample requires a GPU to be present in order to execute

And it just stops like that. I haven't looked deep into the code yet, but maybe I'll find some clues as to why it doesn't find my GPU.

To be precise, my combo is an Intel Xeon E5430 (64-bit) and an NVIDIA GTX 650. I also added another NVIDIA GPU, and from the command line I can switch my app onto the device I want, which is really nice. I couldn't manage to use your command line to hide the error message: I get "ambiguous redirection", maybe because I pass an argument to my program.

I'll look into that; at least it gives me some ideas.

Hi,

It is quite possible that BufferBandwidth looks for the AMD platform and tries to create a context with AMD devices. You need to look at the place where the context is created and see on which platform.

I will check the code sometime later; meanwhile, if you can go through the code, you may find it out yourself.

Hi,

I looked into the code and I now understand why it didn't work.

  • First, it looks for an AMD platform with:

 if (!strcmp(platformName, "Advanced Micro Devices, Inc.")) 

  • And then it tries to get the second device in the devices list. But since I only have one device for AMD, this call fails.

ret = clGetDeviceIDs( platform, devs[1], 128, devices, &num_devices );
if ((ret == CL_DEVICE_NOT_FOUND) || (num_devices == 0))
{
    fprintf( stderr, "This sample requires a GPU to be present in order to execute");
    exit(0);
}


Jacq Jay wrote:

Hi,

I looked into the code and I now understand why it didn't work.

  • First, it looks for an AMD platform with:

 if (!strcmp(platformName, "Advanced Micro Devices, Inc.")) 

APP SDK samples choose the AMD platform (if present) or the default platform (platforms[0]). So the sample should run on NVIDIA hardware.

And then it tries to get the second device in the devices list. But since I only have one device for AMD, this call fails.

ret = clGetDeviceIDs( platform, devs[1], 128, devices, &num_devices );
if ((ret == CL_DEVICE_NOT_FOUND) || (num_devices == 0))
{
    fprintf( stderr, "This sample requires a GPU to be present in order to execute");
    exit(0);
}

devs is not a device number, but just an array of cl_device_type; devs[1] means CL_DEVICE_TYPE_GPU.

So at least these are not the real reasons for the failure. I will try to reproduce it if I get an NV GPU at my disposal.


Yes, you're right; sorry, my mistake.

But since it first looks for an AMD platform and then looks for a GPU device on that platform, it's normal that it fails, isn't it?

Well, you have the AMD APP SDK installed, so the AMD platform is selected (which only contains the Intel CPU as a device).

Most of the samples can still be run using the "-p 1" command-line option, but IIRC that option is not available for the BufferBandwidth sample.

I guess for now you can just edit the code and search for NVIDIA's platform vendor string instead.


Hi again,

I ran into another problem, still compatibility-related in my opinion. To make development easier, I wanted to use the C++ wrapper and include cl.hpp instead of cl.h. But I get many errors of this type:

test.cpp:(.text+0xd50): undefined reference to `clReleaseDevice'

I think the problem comes from the conflict between OpenCL 1.2 (AMD) and OpenCL 1.1 (NVIDIA), but I don't know how to solve it. Should I keep using cl.h as before, then?

Best regards,

Jacq


This is a known problem. It will be fixed by the Khronos group, I believe...

Also, it is better if you use an AMD GPU for development.

OK, thanks a lot for your reply. Unfortunately, working with AMD won't be possible, since everybody here already works on NVIDIA GPUs. So for the moment, if I want to keep the same setup, I can't use the C++ wrapper, right?

The latest C++ wrapper from the Khronos site should fix this issue. It is an annoyance with the ICD design that it cannot cope with missing functions in the underlying platform. We work around this by versioning individual devices directly in the reference counting wrapper in cl.hpp.


Hi,

I copied Khronos' cl.hpp from their site (in the OpenCL 1.2 specification section) into both CL folders (AMD and NVIDIA).

Well, it seems that the .hpp include works fine for the moment, at least with my little program.

Thanks for your advice.

Best regards,

Jacq


Hi hi,

Sorry to bother you, but I'm really stuck with my program. I wanted to use a sum-reduction kernel with OpenCL. The strange thing is that it gives the proper result when I compute on the GPU, whereas it's completely wrong on the CPU and even gives memory-corruption errors. I have probably missed something important, but I can't figure out what, since my previous programs (doing vector addition with no shared data) worked fine on both CPU and GPU. Could you give me some clues? I can post the code if needed.

Best regards,

Jacq


The most important thing is to check your synchronization primitives. You may have places where you forgot a barrier, and the code luckily worked on the GPU because 64 work-items are packed into a single vector thread and run synchronously. When you run it on the CPU, that set of work-items is serialized instead, so the side effects are seen in a different order. Check every point where you write to memory that is shared by the work-items in a work-group, and make sure all other work-items wait for that data to have been written.


Lee


Hi,

I checked, and I put barrier(CLK_LOCAL_MEM_FENCE) where it was needed, following the sample code.

Here is the kernel code :

__kernel void OCLIntegrityTest_kernel(__global float *a_g_idata, __global float *a_g_odata)
{
    __local float ocl_test_sdata[64];

    // perform first level of reduction:
    // reading from global memory, writing to shared memory
    const unsigned int tid = get_local_id(0);
    const unsigned int i = get_group_id(0)*(get_local_size(0)*2) + get_local_id(0);

    ocl_test_sdata[tid] = log(exp(sqrt(a_g_idata[i]))) + log(exp(sqrt(a_g_idata[i+get_local_size(0)])));
    barrier(CLK_LOCAL_MEM_FENCE);

    // do reduction in shared mem
    for (unsigned int s = get_local_size(0)/2; s > 0; s >>= 1)
    {
        if (tid < s)
        {
            ocl_test_sdata[tid] += ocl_test_sdata[tid + s];
        }
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    // write result for this block to global mem
    if (tid == 0)
        a_g_odata[get_group_id(0)] = ocl_test_sdata[0];
}

On the host, I create the OpenCL context and all the necessary stuff; finally, I sum the elements of the output array to get the final result. I think I have a problem with the dimensions, most probably with the enqueueNDRangeKernel call, or maybe with reading the output array (the size might be wrong too). Here is what I used:

cl::NDRange global(OCLINTEGRITY_NUMS);
cl::NDRange local(OCLINTEGRITY_WORK_ITEMS);
t_err = m_command_queue.enqueueNDRangeKernel(m_kernel, 0, global, local, NULL, NULL);
m_command_queue.enqueueReadBuffer(m_output_buffer, CL_TRUE, 0, OCLINTEGRITY_WORK_GROUPS * sizeof(float), m_h_output, NULL, NULL);

where OCLINTEGRITY_NUMS = 1024 (the size of the input array),
and OCLINTEGRITY_WORK_GROUPS = OCLINTEGRITY_NUMS / (OCLINTEGRITY_WORK_ITEMS * 2).

I'm still searching for the answer, but if anybody finds it obvious, please give me a little help.

Best regards,

Jacq


Your code looks fine. I hope the number of work-items per group is 64.

Can you confirm that?


Yes, it's 64. Then I really don't know where it comes from.

Hi,

Can you please upload a zip file so that I can reproduce the problem here?

If we find this to be a problem with the CPU compiler or runtime, we will work to fix it.

Also, please give the configuration of your CPU: is it from AMD or Intel? The model number, the number of cores, etc. will help.

Hi,

I'm trying to use gDEBugger to find some clues.

Is it possible to send you the code in private?

You need the sources and the system configuration, right?

Best regards,

Jacq

ash
Journeyman III

Hi again,

I may have a clue as to why it's not working, but I'm still confused. I had a class storing some members like this:

cl::Device m_device;
cl::Platform m_platform;
cl::Kernel m_kernel;
cl::CommandQueue m_command_queue;
cl::Buffer m_output_buffer;

so that I can use them in different functions. In a first function, "initOCL", I initialise these values this way:

m_command_queue = cl::CommandQueue ( context, m_device, 0, &err );

Then I reuse these variables in another function, "runTest" (where I call enqueueNDRangeKernel and enqueueReadBuffer).

When doing this, I get memory corruption with AMD while it runs fine on NVIDIA, as I said before.

But the strange thing is that when I call enqueueNDRangeKernel and enqueueReadBuffer in the first function (everything in the same place), it works fine on both platforms.

Can that be the problem? I find it very weird. What am I doing wrong? It seems like a silly mistake, but I can't see what I did wrong. I hope somebody will be able to help.

Best regards,

Jacq


There is (or was) a pitfall in the C++ bindings: you could create a cl::Program without a proper OpenCL context, and it would grab some default context. It caused really weird errors for me. So check all your C++ binding calls against their C counterparts and make sure you pass all the needed parameters; it is OK if the C++ binding has default values for some of them.

Please post the code as a zipped attachment.


Hi everybody,

I fixed the problem. It was indeed a really silly mistake: the input memory buffer, and even the program, were destroyed before the call to enqueueNDRangeKernel that launches the kernel. Weird that it didn't disturb NVIDIA, though. Sorry for the bother, just a beginner's mistake, and thanks for your help.

Best regards,

Jacq


Good to know you fixed the problem, and thanks for letting us know. Good luck!

Other implementations might keep some hidden reference count and probably did not destroy those objects... but that's just a guess.

Yeah, thank you, Himanshu ^^

I looked over the forum for information on GDB and found some tutorials, but I'm facing a problem. I can launch gdb easily, put a breakpoint at clEnqueueNDRangeKernel, run, and then put a breakpoint at the call to my kernel function (which seems to work).

But then, when I continue, the program doesn't break on the call to the function and gives me this warning instead:

warning: temporarily disabling breakpoints for unloaded shared library

I spent some time looking for an explanation but I'm still stuck. Do you have any idea, by chance?

Regards,

Jacq


GDB?? Use CodeXL. That's the recommended tool from AMD.

It has a GUI and rocks; it works on Linux as well.

Yeah, it looks really nice, but I don't have an AMD GPU.

Well, I could try it on the CPU, maybe.
ash
Journeyman III

Could you tell me whether CodeXL can debug kernels even if I don't have AMD hardware? From the product page, it seems to me that it only supports AMD hardware; can you confirm?

If not, do you know where the GDB problem I described above comes from?

Best regards,

Jacq

ash
Journeyman III

Hi all,

I installed CodeXL and I'm really disappointed: I can't do any debugging. It asks for an AMD GPU even though I want to run on the CPU. I can't even watch the variables passed into the buffers. The CodeXL documentation said any x64 CPU, so why is it blocking? It doesn't really rock for me.

Best regards,

Jacq


It would help if you could give more information about your problem. Can you try this link for running GDB: http://samritmaity.wordpress.com/2009/11/20/debugging-opencl-program-with-gdb/

Also, I would recommend starting a discussion in the CodeXL forum category to report this. Please mention the version you are using and the exact steps you performed.

Hi,

OK, thanks for the link, I was looking at that one too. I'll try the CodeXL forum.

By the way, I found an old GPU card, an ATI Radeon X850 XT. From my research, it seems not to support OpenCL; can you confirm?

Best regards,

Jacq


By the way, I found an old GPU card, an ATI Radeon X850 XT. From my research, it seems not to support OpenCL; can you confirm?

The card does not support OpenCL. Please get at least an HD 5xxx card, although the new GCN architecture (certainly recommended) is only available in the 77xx, 78xx & 79xx series.