Hi.
I know this has been raised before (http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=138807), but I may have missed whether a solution was posted.
I have split the input array across both devices so that each card does only half the work it did when I used a single GPU. The event profiling info also shows that each card works for only half the time.
But the total time measured on the host for running the kernel on the two devices is not lower. In fact, it is marginally longer. Look at the attached code. For a certain input size the time is reported as 330 ms. If I remove all lines involving command_queue[1]/device[1]/kernel[1], the time is around 160 ms.
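One way to check whether the two kernels actually ran side by side is to compare the per-event profiling timestamps. The sketch below is only an illustration: overlap_ns is a hypothetical helper, not part of OpenCL. It assumes the start and end values were read with clGetEventProfilingInfo (CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END) from queues created with CL_QUEUE_PROFILING_ENABLE, and that both devices report timestamps on a comparable time base, which is not guaranteed.

```c
#include <stdint.h>

/* Hypothetical helper: nanoseconds during which the intervals
 * [s0, e0) and [s1, e1) overlap; returns 0 if they are disjoint.
 * The arguments are assumed to be profiling timestamps in ns. */
static uint64_t overlap_ns(uint64_t s0, uint64_t e0,
                           uint64_t s1, uint64_t e1)
{
    uint64_t lo = s0 > s1 ? s0 : s1;   /* later of the two starts */
    uint64_t hi = e0 < e1 ? e0 : e1;   /* earlier of the two ends */
    return hi > lo ? hi - lo : 0;
}
```

An overlap near zero would suggest the kernels were executed one after the other, which would be consistent with the host time roughly doubling.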
I have tried "1 context-2 command_queues" and "2 contexts-2 command_queues" with pretty much the exact same results. Am I doing something wrong or is there a fix on the way?
Kind regards, Eklund
SYSTEM:
Ubuntu 10.04.1 64 bit, SDK 2.2, Catalyst 10.9
i7 950, 2x HD5870
...
/* Drain both queues so earlier work is not included in the timing. */
clFlush(command_queue[0]);
clFlush(command_queue[1]);
clFinish(command_queue[0]);
clFinish(command_queue[1]);

clock_gettime(CLOCK_REALTIME, &start);

/* Enqueue one half of the work on each device. */
status = clEnqueueNDRangeKernel(command_queue[0], kernel[0], 1, NULL, &globalSize[0], &localSize, 0, NULL, &event[0]);
status = clEnqueueNDRangeKernel(command_queue[1], kernel[1], 1, NULL, &globalSize[1], &localSize, 0, NULL, &event[1]);

/* Submit both queues before blocking on either one. */
clFlush(command_queue[0]);
clFlush(command_queue[1]);
clFinish(command_queue[0]);
clFinish(command_queue[1]);

clock_gettime(CLOCK_REALTIME, &end);

/* Elapsed host time in milliseconds. */
double time = (end.tv_sec-start.tv_sec)*1000.0+(end.tv_nsec-start.tv_nsec)/1000000.0;
printf("%f\n", time);
...
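The elapsed-time arithmetic above can also be pulled into a small helper, which makes the field names harder to mistype. This is only a sketch: elapsed_ms and now are hypothetical names, and it uses CLOCK_MONOTONIC instead of CLOCK_REALTIME on the assumption that a clock unaffected by system time adjustments is preferable for measuring intervals.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical helper: milliseconds elapsed between two samples. */
static double elapsed_ms(struct timespec start, struct timespec end)
{
    return (end.tv_sec - start.tv_sec) * 1000.0
         + (end.tv_nsec - start.tv_nsec) / 1000000.0;
}

/* Hypothetical helper: sample the monotonic clock.
 * Returns 0 on success, like clock_gettime itself. */
static int now(struct timespec *ts)
{
    return clock_gettime(CLOCK_MONOTONIC, ts);
}
```

Calling now(&start) before the enqueue calls and now(&end) after the last clFinish, then printing elapsed_ms(start, end), would give the same kind of measurement as the code above.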
The only solution is to use Windows...
Hopefully this will be fixed in SDK 2.3, but before that, did you try updating the driver to the latest Catalyst, 10.11?
I will update to 10.11 next week when I'm back at work. Thanks for the input.
I'm also keeping my fingers crossed for this working in 2.3; it's sort of a fundamental thing in OpenCL...
eklund.n,
Any updates on this?