ThomasUCF
Journeyman III

about the "simpleMultiDevice" sample

Device level parallel problem

  Hi all:

      In the SDK samples, "simpleMultiDevice" uses multiple GPUs. However, in my tests the first GPU still completes its calculation before the second begins to work, so it is no faster than a single GPU. My question is: how do I make GPU[0] and GPU[1] work in parallel?

 Thanks a lot.

 ThomasUCF

0 Likes
8 Replies
nou
Exemplar

Try setting the environment variable GPU_USE_SYNC_OBJECT=1

0 Likes

  Is that the "environment variables" setting in the properties dialog when I right-click "My Computer"? I tried it, but it doesn't help much.

 Also, the "simpleMultiDevice" sample has "single thread" and "multiple thread" cases. Does "single thread" mean the GPUs execute one after another, and "multiple thread" mean they can run in parallel?

  I really appreciate any insightful comments on this.

ThomasUCF

0 Likes

Oh sorry, I forgot the S at the end. So it is GPU_USE_SYNC_OBJECTS.

Single/multi thread in that sample refers to how many host threads are used to manage the queues: a single thread for all queues, or one thread per queue.
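
A variable set through the Windows system properties dialog is only seen by processes started afterwards, so an IDE or console window that was already open will generally keep its old environment and needs restarting. One quick way to rule this out is to print what the program itself sees before any OpenCL setup. This is a minimal sketch in plain C; the helper names `flag_value_enabled`, `env_flag_enabled`, and `report_sync_objects_setting` are made up for illustration, and it only shows whether the variable reaches the process, not whether the runtime honors it.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the value is exactly "1", else 0. */
int flag_value_enabled(const char *value) {
    return value != NULL && strcmp(value, "1") == 0;
}

/* Hypothetical helper: looks the name up in this process's environment. */
int env_flag_enabled(const char *name) {
    return flag_value_enabled(getenv(name));
}

/* Prints the value the process actually received, or "(not set)". */
void report_sync_objects_setting(void) {
    const char *value = getenv("GPU_USE_SYNC_OBJECTS");
    printf("GPU_USE_SYNC_OBJECTS=%s\n", value ? value : "(not set)");
}
```

Calling `report_sync_objects_setting()` at the top of `main` shows whether the setting arrived. If it prints "(not set)", the variable never reached the program, whatever the dialog says.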

0 Likes

The serialization (or maybe partial serialization) in simpleMultiDevice is a known issue. I am not sure how reliable the env var is.

ThomasUCF,

Please post your system info. (IIRC you had a 5970.) Have you changed your card?

0 Likes

Hi Himanshu:

    My system is an i7 with two 5970s, running Win7. Are there any other examples that use multiple GPUs? I followed "simpleMultiDevice" and wrote my own application, but one GPU waits for the other to finish before it starts to work, so the speed doesn't seem to increase. Changing the environment variable didn't seem to help. What can I try now?

    Thanks a lot.

  ThomasUCF

0 Likes

 

   Right now I can only develop my application based on the samples, so I really need a good sample that uses multiple GPUs and actually runs them in parallel.

   Thanks again.

   ThomasUCF

0 Likes

Thomas, did you try with the correct variable, GPU_USE_SYNC_OBJECTS? Though it may work only on Linux. Also call clFlush() before clFinish(), as OpenCL is "lazy" and does not start the job until you call clFlush(), at least on the AMD implementation.
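
The ordering this implies is: enqueue work on every queue, call clFlush() on every queue, and only then block in clFinish(). Below is a minimal sketch of that ordering, assuming one command queue per device with kernels already enqueued. The helper name `flush_all_then_finish_all` and the `USE_OPENCL` switch are made up for illustration; without `USE_OPENCL` the block uses stand-in stubs that only record the call order, so it builds without an SDK. Whether the devices then actually overlap still depends on the driver.

```c
#include <stddef.h>

#ifdef USE_OPENCL
#include <CL/cl.h>   /* real headers; link against the OpenCL library */
#else
/* Stand-ins (illustration only) so the sketch builds without an SDK. */
typedef int cl_int;
typedef struct stub_queue { int id; } *cl_command_queue;
#define CL_SUCCESS 0
static char call_log[64];
static size_t call_log_len = 0;
/* Appends a two-character record such as "F0" to the log (ids 0-9). */
static void log_call(char tag, int id) {
    if (call_log_len + 2 < sizeof call_log) {
        call_log[call_log_len++] = tag;
        call_log[call_log_len++] = (char)('0' + id);
        call_log[call_log_len] = '\0';
    }
}
static cl_int clFlush(cl_command_queue q)  { log_call('F', q->id); return CL_SUCCESS; }
static cl_int clFinish(cl_command_queue q) { log_call('W', q->id); return CL_SUCCESS; }
#endif

/* Submits the pending work of every queue first, then waits on each queue.
   Returns the first error code encountered, or CL_SUCCESS. */
cl_int flush_all_then_finish_all(cl_command_queue *queues, size_t count) {
    size_t i;
    cl_int err;
    for (i = 0; i < count; ++i) {      /* non-blocking: issue commands */
        err = clFlush(queues[i]);
        if (err != CL_SUCCESS) return err;
    }
    for (i = 0; i < count; ++i) {      /* blocking: wait for completion */
        err = clFinish(queues[i]);
        if (err != CL_SUCCESS) return err;
    }
    return CL_SUCCESS;
}
```

The point of the two separate loops is that no queue is waited on until all of them have been told to start. Calling clFlush() and clFinish() back to back on queue 0 before touching queue 1 would block the host while the second device still has nothing submitted.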

0 Likes

 

  Hi nou:

        I tried the correct "GPU_USE_SYNC_OBJECTS" on Windows; it didn't work.

 I called clFlush() before calling clFinish(). clFlush() returned quickly, but clFinish() took a long time. I'll do more tests, but as I see it now, the device does not start working after clFlush() returns; it only starts once clFinish() is called, because the running time of two clFinish() calls is twice the running time of one, meaning that the execution is serialized.

    Thanks a lot.

  Thomas

0 Likes