A rather open-ended question.
16 Opteron cores at 2.5 GHz would give roughly 320 GFLOPS single-precision and 160 GFLOPS double-precision peak throughput (off the top of my head, I may have missed something). The 6970 would be about 2700 and 700 respectively, so in theory the GPU has roughly a 4x peak advantage in double precision. For memory bandwidth, two CPU sockets give something like 30-40 GB/s compared with about 170 GB/s on the 6970, so maybe a factor of 5 there. Overall, expect somewhere around 5x the performance from the GPU at peak compared with the CPU, assuming you use OpenCL vector types throughout and make efficient use of memory.
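If you want to see where those figures come from, here is a minimal sketch of the arithmetic. The per-core rates (8 single / 4 double FLOPs per cycle from 4-wide SSE) and the 6970 numbers (1536 ALUs at 880 MHz, double precision at a quarter rate) are my assumptions; check against the actual datasheets before relying on them.

```c
/* Back-of-envelope peak estimate, mirroring the figures above.
 * Per-core FLOP rates and 6970 specs are assumptions, not exact. */
#include <stdio.h>

int main(void) {
    /* CPU: 16 Opteron cores at 2.5 GHz, assuming 8 single-precision
     * and 4 double-precision FLOPs per cycle per core (4-wide SSE). */
    double cpu_sp = 16 * 2.5 * 8;   /* ~320 GFLOPS */
    double cpu_dp = 16 * 2.5 * 4;   /* ~160 GFLOPS */

    /* GPU: HD 6970, assumed 1536 ALUs at 880 MHz, 2 FLOPs/cycle,
     * double precision at 1/4 of the single-precision rate.        */
    double gpu_sp = 1536 * 0.88 * 2;   /* ~2700 GFLOPS */
    double gpu_dp = gpu_sp / 4.0;      /* ~675 GFLOPS  */

    /* Memory bandwidth: two CPU sockets vs the 6970's GDDR5. */
    double cpu_bw = 40.0;    /* GB/s, rough two-socket figure */
    double gpu_bw = 170.0;   /* GB/s */

    printf("peak DP ratio  : %.1fx\n", gpu_dp / cpu_dp);  /* ~4x   */
    printf("bandwidth ratio: %.1fx\n", gpu_bw / cpu_bw);  /* ~4-5x */
    return 0;
}
```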
If your access pattern is random but constantly hits in the CPU cache, the CPU could easily win. Include the time to transfer data to the GPU in the calculation and the difference becomes less significant still.
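To put a rough number on the transfer cost, here is a sketch of the worst case for the GPU: a bandwidth-bound pass over data that lives in host memory and is only touched once on the device. The ~6 GB/s effective PCIe rate and the data size are assumptions for illustration.

```c
/* Rough estimate of how transfer time eats into the GPU advantage
 * for a single streaming pass over host-resident data.            */
#include <stdio.h>

int main(void) {
    double bytes      = 512e6;   /* 512 MB of data to process            */
    double pcie_bw    = 6e9;     /* assumed host->device bandwidth, B/s  */
    double gpu_mem_bw = 170e9;   /* 6970 device memory bandwidth, B/s    */
    double cpu_mem_bw = 40e9;    /* two-socket CPU memory bandwidth, B/s */

    /* Bandwidth-bound kernel that touches each byte once:           */
    double t_gpu = bytes / pcie_bw + bytes / gpu_mem_bw;  /* copy + kernel    */
    double t_cpu = bytes / cpu_mem_bw;                    /* data already in RAM */

    printf("GPU (copy + kernel): %.1f ms\n", t_gpu * 1e3);  /* ~88 ms */
    printf("CPU (in place)     : %.1f ms\n", t_cpu * 1e3);  /* ~13 ms */
    return 0;
}
```

In that scenario the copy alone costs more than the CPU run; the GPU only pulls ahead once the data is reused on the device enough times to amortise the transfer.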
On the other hand, adding a GPU to a 16-core Opteron server is extra compute power in the same box, and you could add multiple GPUs. Note that if those are 16 Opteron cores in a *single* socket, the clock speed and memory bandwidth would be lower than assumed above.
The reality is that it depends on the code you're trying to run and how good you are at vectorising your algorithm.