Forgive my ignorance, as I know very little about graphics programming and OpenCL. I have an OpenCL raytracer that distributes work across multiple GPUs. It is just a proof of concept to show that you CAN do this with a high-level ParallelFor loop I added to my clUtil library, so it doesn't do anything fancy like KD-trees or a BVH. In the simple scene I'm rendering (about 8 spheres and a plane, on 3 Radeon 7970s), I get up to 200 million rays per second. If I don't transfer the partially rendered bitmaps back to the CPU to aggregate them into a single bitmap, I get around 350 million rays per second. This makes me curious, because it sounds like exactly what Crossfire/SLI is designed for: distributing work to multiple GPUs and aggregating the results for each frame.
Given that OpenCL and OpenGL are interoperable, would it be possible to render part of each frame on each device and aggregate the results using Crossfire?
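To make the question concrete, here is roughly the split-and-aggregate scheme I mean, as a simplified host-side sketch in Python. This is not my actual clUtil code: the function names are made up for illustration, and `render_band` is only a stand-in for the per-GPU kernel (it fills each row with the device index instead of tracing rays). The `aggregate` step corresponds to the CPU-side readback that I would like Crossfire to handle instead.

```python
def split_rows(height, num_devices):
    """Divide `height` scanlines into contiguous (start, end) bands, one per device."""
    base, extra = divmod(height, num_devices)
    bands = []
    start = 0
    for d in range(num_devices):
        # The first `extra` devices take one additional row each.
        rows = base + (1 if d < extra else 0)
        bands.append((start, start + rows))
        start += rows
    return bands


def render_band(device_id, start, end, width):
    """Hypothetical stand-in for the per-GPU kernel launch.

    Returns one row per scanline in [start, end), filled with the device index
    so it is easy to see which device produced which part of the frame.
    """
    return [[device_id] * width for _ in range(start, end)]


def aggregate(partials):
    """Concatenate the per-device partial images, in band order, into one bitmap."""
    frame = []
    for rows in partials:
        frame.extend(rows)
    return frame


def render_frame(width, height, num_devices):
    """Render one frame by giving each device a band of rows, then merging."""
    bands = split_rows(height, num_devices)
    partials = [render_band(d, s, e, width) for d, (s, e) in enumerate(bands)]
    return aggregate(partials)
```

For example, `render_frame(4, 10, 3)` gives device 0 rows 0 to 3, device 1 rows 4 to 6, and device 2 rows 7 to 9, and returns a single 10-row bitmap.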