Forgive my ignorance, as I know very little about graphics programming and OpenCL. I have an OpenCL raytracer that distributes work to multiple GPUs. It's just a proof of concept showing that you CAN do this with the high-level ParallelFor loop I added to my clUtil library, so it doesn't do anything fancy like KD-trees or a BVH. Rendering a simple scene of about 8 spheres and a plane on 3 Radeon 7970s, I get up to 200 million rays per second; if I skip transferring the partially rendered pieces back to the CPU to aggregate them into a single bitmap, I get around 350 million rays per second. It sounds like this is exactly what CrossFire/SLI is designed for: distributing work to multiple GPUs and aggregating the results for each frame.
Given that OpenCL and OpenGL have interoperability, would it be possible to render part of each frame on each device and aggregate the results using CrossFire?
Usually, if I have three GPUs, I'll use two of them to render with CrossFire and the third to do OpenCL computation. So it should be possible to render part of each frame on each device and aggregate the results using CrossFire.
I beg to differ,
In most cases CrossFire works by distributing whole frames (alternate frame rendering). E.g. in a two-GPU scenario, all even-numbered frames are rendered on GPU 0 and all odd-numbered frames on GPU 1.
Currently we do not support OGL interop with multi-device contexts.
However, you do not need OGL to maximize parallelism.
You need to remove CPU-GPU synchronization points. Be sure to enqueue and flush the NDRange commands for frame N+1 before waiting for the results of frame N.
You can profile GPU utilization using the AMD APP Profiler or Microsoft GPUView.