
richi
Journeyman III

Using Multiple GPUs in a OpenCL program

I have a PC with two R4870 video cards running Linux. Using pyOpenCL, I can run a program on either GPU, but when I try to run two kernels simultaneously (one on each card), it seems that the second queued kernel cannot start until the first one has finished. I expected to be able to queue two instances of the same kernel, one on each GPU, with a total running time roughly the same as running a single instance, but this is not happening. I have tried calling clFlush after queuing each kernel, but the running time stays the same.

Is it possible to use both (multiple) GPUS simultaneously in OpenCL? How can this be done?

0 Likes
24 Replies
n0thing
Journeyman III

Right now, multiple-GPU and CPU+GPU don't work on AMD's implementation.

It just returns a CL_COMPLETE execution status after the clFlush command.

0 Likes

I don't think so. SmallLuxGPU works well with multi-GPU and CPU+GPU, but it uses one context per device.

0 Likes
davibu
Journeyman III

Originally posted by: nou I don't think so. SmallLuxGPU works well with multi-GPU and CPU+GPU, but it uses one context per device.

Yup, and I use one thread per GPU too. So one thread, one context, one queue for each GPU. I tried other configurations, but they weren't working (i.e., not running in parallel).

BTW, it looks like it works well only under Linux, because I'm experiencing horrible performance under Windows 7 64-bit with multiple GPUs (but this could be related to some thread/mutex issue and not to the OpenCL driver; I'm still investigating the problem).

 

 

0 Likes

I can only say that at the CAL level (and OpenCL is obviously built upon CAL) there are numerous problems with multiple GPUs.

You definitely need one thread and one context per GPU to make it work. But even that isn't enough, because almost every CAL function isn't thread-safe, so calling calResMap() (which is the only way to get access to local GPU memory) in one thread blocks all other threads/contexts.

And (as I've already written on these forums), OpenCL uses the calCtxWaitForEvent() function to wait for GPU kernel completion, instead of a CPU-burning loop like

while (calCtxIsEventDone(calCtx, e) == CAL_RESULT_PENDING);

But calCtxWaitForEvent() also blocks every context currently running. This is especially noticeable when there are different devices in the system (like a 5770 + 4770). So basically it's simply impossible to work asynchronously with multiple GPUs within a single process.

All of the above applies to the Windows version of CAL; I've never tried the Linux one.

0 Likes

For my education, is it currently impossible to use multiple GPUs or multiple graphics cards?

In other words, what about a single Radeon 5970 (a dual-GPU card)?

0 Likes

Well, using multiple GPUs is possible, but there is an issue with CrossFire: if you have CrossFire enabled, the second GPU returns incorrect results. And since you cannot disable CrossFire on a 5970, you can only use its first GPU. This should be fixed in the next driver or SDK.

0 Likes
alexg
Journeyman III

Originally posted by: nou Well, using multiple GPUs is possible, but there is an issue with CrossFire: if you have CrossFire enabled, the second GPU returns incorrect results. And since you cannot disable CrossFire on a 5970, you can only use its first GPU. This should be fixed in the next driver or SDK.

 

If I understood the previous discussion correctly, one would need to run multiple CPU threads to use multiple GPUs, but the SDK is not thread-safe, so this is not really an option.

0 Likes

While I'm not too familiar with ATI systems, with NVIDIA hardware SLI must be disabled to utilize multiple GPUs. I would assume that for ATI, CrossFire would need to be disabled too.

As for multi-GPU, OpenCL is quite capable of it, and there is a wonderful example in the NVIDIA SDK. I have successfully implemented my own version and tested it across various devices. The basic concept is to:

1. Find all compatible devices

2. For each device, create a command queue on the same context

3. Allocate work to each queue individually

I have this successfully running with NVIDIA cards; however, what brought me back to the ATI forum is that my program crashes when using the ATI SDK and driver. ATI may not currently support multi-GPU.
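The three steps above can be sketched in Python. The real pyOpenCL calls appear only as comments; the stand-in objects (`FakeDevice`, `make_queue`) are hypothetical, so the structure runs without a GPU:

```python
# Sketch of the shared-context, queue-per-device pattern described above.
# Real pyOpenCL calls are shown in comments; the stand-in objects keep
# this runnable without a GPU, and every helper name is illustrative only.

class FakeDevice:
    def __init__(self, name):
        self.name = name

def find_devices():
    # Step 1 -- real code would be roughly:
    #   platform = cl.get_platforms()[0]
    #   devices = platform.get_devices(device_type=cl.device_type.GPU)
    return [FakeDevice("gpu0"), FakeDevice("gpu1")]

def make_queue(context, device):
    # Step 2 -- real code: cl.CommandQueue(context, device)
    return {"context": context, "device": device, "work": []}

devices = find_devices()

# One context shared by all devices (real code: cl.Context(devices)) ...
context = object()
# ... but one command queue per device.
queues = [make_queue(context, d) for d in devices]

# Step 3 -- allocate work to each queue individually.
for i, q in enumerate(queues):
    q["work"].append(("kernel", i))

print([q["device"].name for q in queues])  # → ['gpu0', 'gpu1']
```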

0 Likes

achinda99: That does work in principle on the AMD implementation, but it seems that AMD's OpenCL is "lazy", so unless you call a blocking command on the queue (flush/finish) it won't do much/anything. The only way to be sure seems to be to launch two host threads and call flush or finish from each of them to the respective command queue.

This seems unnecessarily complicated (and apparently unsafe, according to other posts), so I hope AMD will make their implementation less lazy in the future, so that the command queue gets to work without having to call a blocking command on it.

Please correct me if I'm wrong, anyone, because I hope I am.
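A runnable sketch of that two-host-thread workaround, with a `time.sleep` standing in for the real kernel and plain functions standing in for the pyOpenCL calls (none of these names come from the SDK):

```python
import threading
import time

def run_queue(results, idx):
    # Stand-in for: enqueue the kernel on queue idx, then call a blocking
    # finish (clFinish) on that queue from this host thread. The sleep
    # models the kernel's execution time.
    time.sleep(0.2)
    results[idx] = "done"

results = {}
threads = [threading.Thread(target=run_queue, args=(results, i))
           for i in range(2)]

start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

# Because each blocking wait happens on its own host thread, the two
# "kernels" overlap: total time is close to one kernel, not the sum.
print(results)
```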

0 Likes

I created a thread per device, i.e. each thread creates its own context and command queue. I had to serialize this initialization phase using a critical section, but after that I had no problems executing all threads in parallel and I was able to execute on multiple GPUs in parallel.
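That setup can be sketched in plain Python, with a `threading.Lock` playing the role of the critical section and small dictionaries standing in for the real contexts and queues (all names here are illustrative):

```python
import threading

init_lock = threading.Lock()   # plays the role of the critical section
state = {}

def worker(device_id):
    # Serialized phase: each thread creates its *own* context and queue,
    # but the creation itself is guarded by the lock.
    with init_lock:
        state[device_id] = {"context": f"ctx{device_id}",
                            "queue": f"queue{device_id}"}
    # Parallel phase: from here on, each thread touches only its own
    # context/queue, so no further locking is needed.
    state[device_id]["result"] = device_id * 10

threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(state)
```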

0 Likes

Originally posted by: achinda99 While I'm not too familiar with ATI systems, with NVIDIA hardware SLI must be disabled to utilize multiple GPUs. I would assume that for ATI, CrossFire would need to be disabled too.

As for multi-GPU, OpenCL is quite capable of it, and there is a wonderful example in the NVIDIA SDK. I have successfully implemented my own version and tested it across various devices. The basic concept is to:

1. Find all compatible devices

2. For each device, create a command queue on the same context

3. Allocate work to each queue individually

I have this successfully running with NVIDIA cards; however, what brought me back to the ATI forum is that my program crashes when using the ATI SDK and driver. ATI may not currently support multi-GPU.

 

Achinda99,

                Please provide a test case or your code to reproduce this issue. Please also provide your system details: OS, GPU, driver version, and SDK version.

0 Likes

I narrowed the problem down to writing a 2D image from the host to the device, for which I created another forum thread. In this thread, I was merely commenting that what brought me back to the forum was trouble running my code on ATI devices, and that maybe the current OpenCL implementation in the Stream SDK doesn't support multi-GPU, which appears to be incorrect, as someone has pulled it off.

0 Likes

Is there any issue fix or SDK fix to really support multiple GPUs?

(without creating separate contexts, queues, threads... and running them in parallel???)

Any examples?

This will be running on an si-28 embedded platform with an AMD CPU + 2 onboard ATI E490 + HD3200 chipset 780E, so there is no way I can disable/enable CrossFire...

0 Likes
ebfe
Journeyman III

Pyrit also uses separate contexts and queues for all GPUs. It's a bug in AMD's implementation.

Also see http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=128846&enterthread=y

0 Likes

OK, the developer of SmallLuxGPU reported that the second GPU returns wrong results when CrossFire is enabled. Try disabling CrossFire; maybe that will help.

0 Likes
richi
Journeyman III

I have tried using two contexts and two queues (one per GPU), and also one context and two queues (again one per GPU), but the result is always the same: the kernels are not executed simultaneously. Is there any example of how this could be done? I'm using pyOpenCL; could this be a pyOpenCL bug? Or maybe I have a configuration problem on my PC? Is there any example of how multi-GPU servers are being used?

0 Likes

nou, I don't have CrossFire enabled.

0 Likes

Well, the problem is maybe that OpenCL is quite lazy: if you queue some work, it does not begin executing. So IMHO it works like this:

enqueue on first GPU
enqueue on second GPU
clFinish(queue1) // begins execution on the first GPU; the second is lazy and does not execute
clFinish(queue2) // only now does it begin executing on the second GPU

So try calling clFlush() after each enqueue (after that it should begin executing without blocking the calling thread), or better, call clFinish() from different threads.
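The lazy behaviour described above can be modelled with a toy queue class; this is purely an illustration of the enqueue/flush/finish ordering, not a real OpenCL API:

```python
# Toy model of a "lazy" command queue: enqueued work does not start until
# flush()/finish() is called on it. LazyQueue is purely illustrative.

class LazyQueue:
    def __init__(self):
        self.pending = []
        self.executed = []

    def enqueue(self, kernel):
        self.pending.append(kernel)   # queued, but nothing runs yet

    def flush(self):
        # clFlush-style: submit the work without blocking the caller.
        self.executed += self.pending
        self.pending = []

    def finish(self):
        # clFinish-style: submit the work and block until it completes.
        self.flush()

q1, q2 = LazyQueue(), LazyQueue()
q1.enqueue("kernelA")
q2.enqueue("kernelB")

# Without a flush, neither queue has started anything yet:
assert q1.executed == [] and q2.executed == []

# Flushing both queues before blocking on either gives both devices a
# chance to start; only then do we wait.
q1.flush()
q2.flush()
q1.finish()
q2.finish()
print(q1.executed, q2.executed)   # → ['kernelA'] ['kernelB']
```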

 

0 Likes

The OP has already tried clFlush and gets the same results as before. The command currently blocks the calling thread, as I mentioned above.

0 Likes

Oh. So then you must use multiple threads.

0 Likes

dmarchet,

You can have a single context, but you will need separate command queues to use multiple devices (OpenCL spec).

 

0 Likes

OK, thanks for responding so quickly, Omkaranathan.

Does anyone have code or a sample working on CPU+GPU or 2 GPUs?

0 Likes

Do I take it correctly that I can use multiple GPUs in a single OpenCL program if I create a single context (with multiple devices) and then create a command queue for each device separately? This will work, but it has absolutely no point, because the command queues block each other unless called from different threads.

Q1: What easy ways are there to create separate threads under Linux, other than the Boost library? (something standard, please)

Q2: Does AMD plan on making the Stream SDK (CAL, to be precise) thread-safe, so that the OpenCL specification can be used to its fullest with multi-GPU support in the near future?

I don't have all the time in the world to create a neat framework to use multiple GPUs. I'd hate to sink time into it and then have it implemented properly in the upcoming SDK. Best would be if it worked from a single thread (as it should).

0 Likes

Originally posted by: dmarchet OK, thanks for responding so quickly, Omkaranathan.

Does anyone have code or a sample working on CPU+GPU or 2 GPUs?

 

You can expect a sample on this in an upcoming release.

0 Likes