Aside from the fact that this is more of an idea than a question, I have thought of the same thing (and I believe I have posted about it before, but Jive is not that friendly when it comes to finding very old posts, even my own).
I was looking for a good OpenCL dev box with enough performance for gaming as well, while keeping the price under 1000€. Also, having owned an ASUS G73JH notebook, I grew fond of dual-cooling notebooks, and I would simply kill for a fully AMD-based variant in a smaller form factor, such as the ASUS G46VW or the Lenovo Y40 (both 14" dual-cooled laptops without an optical drive, which is finally gone, being a total waste of space nowadays).
The dual cooling solution lends itself to a totally symmetric design, and as th3r0ck03 has mentioned, dual-socket APU boards could help AMD make up for the lack of CPU power (although such a design could only come to fruition by the time Zen arrives, about which we know nothing yet). 2 FX-7500 or 2 FX-7600P APUs, with their total of 8 cores and 12-16 GCN compute units, would make a killer gaming rig and dev box. The APUs would need to be interconnected with some capable (presumably new) version of HyperTransport that would be as fast as the new PCI-E 3.0 based CrossFire technology (forgot the name). Having 2 sockets would help ease the memory bandwidth bottleneck, since there would be 4 memory controllers instead of only 2, although some of the gain would certainly be lost on the interconnect.
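To put rough numbers on the memory controller argument, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption on my part: I assume DDR3-2133 on standard 64-bit channels, the 20 GB/s link figure is a made-up placeholder (not a spec of any real HyperTransport version), and the function names and the "fraction of remote accesses" model are purely illustrative.

```python
# Back-of-the-envelope sketch. All inputs are assumptions, not measurements.


def peak_bandwidth_gbs(mega_transfers_per_s, channels, bytes_per_transfer=8):
    """Theoretical peak DRAM bandwidth in GB/s.

    A 64-bit channel moves 8 bytes per transfer, so the peak is
    MT/s * 8 bytes * number of channels.
    """
    return mega_transfers_per_s * bytes_per_transfer * channels / 1000.0


def one_apu_throughput_gbs(local_gbs, link_gbs, remote_fraction):
    """Toy model of what a single APU could stream in a dual-socket board.

    remote_fraction of the traffic goes to the other socket's memory and is
    capped by the slower of the link and the remote controllers; the rest
    goes to local memory. Both streams are assumed to run concurrently.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    remote_path_gbs = min(link_gbs, local_gbs)
    limits = []
    if remote_fraction < 1.0:
        # Local controllers only carry the local share of the traffic.
        limits.append(local_gbs / (1.0 - remote_fraction))
    if remote_fraction > 0.0:
        # The link (or remote DRAM) only carries the remote share.
        limits.append(remote_path_gbs / remote_fraction)
    return min(limits)


if __name__ == "__main__":
    ASSUMED_MTS = 2133           # assumed DDR3-2133 memory
    HYPOTHETICAL_LINK_GBS = 20.0  # placeholder interconnect bandwidth

    single = peak_bandwidth_gbs(ASSUMED_MTS, channels=2)
    dual = peak_bandwidth_gbs(ASSUMED_MTS, channels=4)
    print(f"one socket, 2 channels:  {single:.1f} GB/s peak")
    print(f"two sockets, 4 channels: {dual:.1f} GB/s aggregate peak")
    for frac in (0.0, 0.25, 0.5, 1.0):
        t = one_apu_throughput_gbs(single, HYPOTHETICAL_LINK_GBS, frac)
        print(f"remote fraction {frac:.2f}: {t:.1f} GB/s for one APU")
```

The aggregate peak doubles on paper, but in this toy model a single APU only benefits to the extent the interconnect can feed it the remote half, which is where part of the gain would be lost.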
Such a design could be reused for multi-socket server boards, which will eventually have to arrive. We would not expect the 2 GPUs to be displayed as one, although it might seem reasonable to see 2 CPU devices in OpenCL in a multi-socket environment, even if the OS treats them as one (in some sense).
The only problem I see is that even if this idea made it to the right people, we would never get any feedback on why AMD will not make such a design (because it sounds too good to be true). I am no engineer, so I cannot tell what the difficulties are. Furthermore, this idea might be somewhat late, with stacked RAM just around the corner, which will provide bandwidth far greater than any HT interconnect so far. 2 APUs would have a hard time communicating with each other at a speed roughly two orders of magnitude slower than raw stacked RAM speeds.
I'm sure we will see a dual-APU board sooner rather than later. Since I am in speculation mode (so we went from a question to an idea to a speculation in three posts), I'll speculate on what AMD is cooking up. I think AMD is working on an APU-based Opteron that uses a HyperTransport interconnect. Not only would this allow building a dual-socket board, AMD would also be able to build a 64-APU shared-memory machine in a 10U chassis (SeaMicro, anyone?). AMD's HyperTransport technology can already connect hundreds of CPUs to build a shared-memory machine with terabytes of memory (NumaConnect Adapter N313 for HyperTransport HTX, http://www.numaq.com/). They have had issues getting the CPU and GPU to play nice together (Kaveri was supposed to be the first HSA chip; now they are saying that Carrizo will be the first chip with full HSA support), but they seem to have mostly solved those issues. I have seen a paper discussing cache coherency issues between CPU and GPU (http://research.cs.wisc.edu/multifacet/papers/micro13_hsc.pdf), and Jim claimed that AMD will have an SoC that has the right plumbing and fabric in 2015 (AMD Core Innovation Update Press Conference - YouTube). If AMD uses HBM as cache and a NumaConnect-type HyperTransport fabric to interconnect a large number of APUs in a shared-memory machine, it will be a killer data center system for all types of workloads. AMD has all the necessary technology to build a truly amazing system. I hope they can execute and bring out a system on time this time (AMD shows off Carrizo, declares it on time and coming in first half of 2015 | ExtremeTech) instead of just declaring that it is on time.