
    Serial execution on multiple devices

    eklund.n
      Doing half the work on 2 GPUs takes as long as all work on 1 GPU

      Hi.

      I know this has come up before (http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=138807), but I haven't seen whether a solution was ever posted.

      I have split the input array over both devices, so each card does only half the work compared to when I used just 1 GPU (roughly as in the sketch below). The event profiling info also confirms that each card works for only half the time.
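
      Roughly, the split looks like this. This is only a sketch: hostInput, input, output, N, context and the float element type are stand-ins for my actual names, and it assumes the usual declarations from the full program (#include <CL/cl.h> etc.):

          size_t half = N / 2;              /* assuming N is even */
          globalSize[0] = half;
          globalSize[1] = N - half;

          for (int i = 0; i < 2; ++i) {
              /* Each device gets its own buffers covering one half of the input. */
              input[i]  = clCreateBuffer(context, CL_MEM_READ_ONLY,
                                         globalSize[i] * sizeof(float), NULL, &status);
              output[i] = clCreateBuffer(context, CL_MEM_WRITE_ONLY,
                                         globalSize[i] * sizeof(float), NULL, &status);
              status = clEnqueueWriteBuffer(command_queue[i], input[i], CL_FALSE, 0,
                                            globalSize[i] * sizeof(float),
                                            hostInput + i * half, 0, NULL, NULL);
              clSetKernelArg(kernel[i], 0, sizeof(cl_mem), &input[i]);
              clSetKernelArg(kernel[i], 1, sizeof(cl_mem), &output[i]);
          }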

      But the total host-side time for running the kernels on the 2 devices isn't lowered; in fact, it's marginally longer. See the attached code. For a certain input size the time is reported as 330 ms; if I remove all lines regarding command_queue[1]/device[1]/kernel[1], it drops to around 160 ms.

      I have tried both "1 context - 2 command_queues" and "2 contexts - 2 command_queues" (sketched below) with practically identical results. Am I doing something wrong, or is there a fix on the way?
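
      The two setups look roughly like this (again a sketch, not my exact code; devices, contexts and status are stand-in names, and error checking is omitted):

          /* Variant A: one context spanning both devices, one queue per device. */
          context = clCreateContext(NULL, 2, devices, NULL, NULL, &status);
          command_queue[0] = clCreateCommandQueue(context, devices[0],
                                                  CL_QUEUE_PROFILING_ENABLE, &status);
          command_queue[1] = clCreateCommandQueue(context, devices[1],
                                                  CL_QUEUE_PROFILING_ENABLE, &status);

          /* Variant B: one context per device, one queue per context. */
          for (int i = 0; i < 2; ++i) {
              contexts[i] = clCreateContext(NULL, 1, &devices[i], NULL, NULL, &status);
              command_queue[i] = clCreateCommandQueue(contexts[i], devices[i],
                                                      CL_QUEUE_PROFILING_ENABLE, &status);
          }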

      Kind regards, Eklund

      SYSTEM:
      Ubuntu 10.04.1 64 bit, SDK 2.2, Catalyst 10.9
      i7 950, 2x HD5870 


      ...
      /* Make sure both queues are idle before timing. */
      clFlush(command_queue[0]); clFlush(command_queue[1]);
      clFinish(command_queue[0]); clFinish(command_queue[1]);

      clock_gettime(CLOCK_REALTIME, &start);
      status = clEnqueueNDRangeKernel(command_queue[0], kernel[0], 1, NULL,
                                      &globalSize[0], &localSize, 0, NULL, &event[0]);
      status = clEnqueueNDRangeKernel(command_queue[1], kernel[1], 1, NULL,
                                      &globalSize[1], &localSize, 0, NULL, &event[1]);
      clFlush(command_queue[0]); clFlush(command_queue[1]);
      clFinish(command_queue[0]); clFinish(command_queue[1]);
      clock_gettime(CLOCK_REALTIME, &end);

      double time = (end.tv_sec - start.tv_sec) * 1000.0
                  + (end.tv_nsec - start.tv_nsec) / 1000000.0;
      printf("%f\n", time);
      ...
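
      In case it matters, this is roughly how the per-device times mentioned above can be read from the events; it's the standard clGetEventProfilingInfo route, and it assumes the queues were created with CL_QUEUE_PROFILING_ENABLE (t_start/t_end are just local names):

          cl_ulong t_start, t_end;
          for (int i = 0; i < 2; ++i) {
              clGetEventProfilingInfo(event[i], CL_PROFILING_COMMAND_START,
                                      sizeof(cl_ulong), &t_start, NULL);
              clGetEventProfilingInfo(event[i], CL_PROFILING_COMMAND_END,
                                      sizeof(cl_ulong), &t_end, NULL);
              /* Profiling timestamps are in nanoseconds. */
              printf("device %d kernel time: %f ms\n", i, (t_end - t_start) / 1000000.0);
          }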