There's been a lot of focus on server design, but there doesn't seem to be much attention paid to GPUs (outside of the cloud). I think that with GPUs, and particularly APUs, it might be possible to handle the parallelism of requests more efficiently.
This would be for a high-traffic server that handles a relatively small amount of data (small enough to fit entirely in RAM).
The biggest challenge, I think, would be handling the I/O side of requests, since the GPU can't do this directly, but here's how I think it could work.
Run an event-driven web server like Python's Twisted, ideally one instance of Twisted per core. All of the read-only data the server is responsible for serving is stored in memory. When a read request comes in, the event server would execute a kernel to handle it. This kernel could be maintained and executed with PyOpenCL (a nice synergy with Twisted).
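As a sketch of what such a read kernel might look like, here's a minimal OpenCL C source string of the kind PyOpenCL compiles with `cl.Program(ctx, src).build()`, plus a pure-Python equivalent for illustration. The kernel name and the idea of an integer key/record store are my assumptions, not part of the original design.

```python
# Hypothetical read kernel: each work-item serves one queued request by
# copying the requested record out of the device-resident in-memory store.
# PyOpenCL would compile this with cl.Program(ctx, KERNEL_SRC).build().
KERNEL_SRC = """
__kernel void serve_reads(__global const int *store,
                          __global const int *request_keys,
                          __global int *responses)
{
    int i = get_global_id(0);   /* one work-item per pending request */
    responses[i] = store[request_keys[i]];
}
"""

def serve_reads_reference(store, request_keys):
    """Pure-Python equivalent of the kernel, for testing without a GPU."""
    return [store[k] for k in request_keys]
```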
After delegating to the kernel, Twisted could sleep or handle other requests; when the kernel finishes processing the request, it wakes Twisted with an event and Twisted sends the data back to the user.
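The delegate-then-wake flow above can be sketched with the standard library. Twisted's `deferToThread` plus a callback plays the same role as the executor and `await` here; the kernel call is faked with a plain function, since actually dispatching to a GPU is beyond this sketch.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def run_kernel(key, store):
    # Stand-in for enqueueing a PyOpenCL kernel and waiting on its event.
    return store[key]

async def handle_read(key, store, pool):
    loop = asyncio.get_running_loop()
    # Delegate to the "kernel"; the event loop is free to serve other
    # requests until the result is ready (the wake-up event).
    result = await loop.run_in_executor(pool, run_kernel, key, store)
    return result  # here the server would write the data back to the user

async def main():
    store = {"a": 1, "b": 2}
    with ThreadPoolExecutor() as pool:
        # Several in-flight reads interleave on one event loop.
        reads = (handle_read(k, store, pool) for k in ("a", "b", "a"))
        return await asyncio.gather(*reads)
```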
When a write request comes in, Twisted would just handle it itself: interacting with the database (or whatever backing store), then updating the data in memory.
Basically Twisted handles all the writes and I/O and delegates the reads/calculations to the GPU.
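A minimal sketch of that split, with an in-memory dict standing in for the served data; `gpu_read` and `db_write` are hypothetical placeholders for the PyOpenCL path and the database, respectively.

```python
# In-memory store the server is responsible for serving.
STORE = {}

def gpu_read(key):
    # Placeholder for the PyOpenCL-delegated read path.
    return STORE.get(key)

def db_write(key, value):
    # Placeholder for persisting to the database/whatever.
    pass

def handle_request(method, key, value=None):
    if method == "GET":
        # Reads/calculations are delegated to the GPU.
        return gpu_read(key)
    # Twisted handles the write and its I/O itself...
    db_write(key, value)
    STORE[key] = value   # ...then updates the data in memory.
    return value
```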
The benefit would be the ability to handle many small requests cheaply by spreading them across the many stream processors in a GPU/APU.
I think the real incentive would come from using an APU, where sharing data is easier, which leads to one question:
Can a massive amount of memory (basically all of it) be mapped zero-copy when using an APU?
Down the road, as AMD's Fusion (APU) technology progresses, this setup seems like it could become worth even more consideration.
This is all basically just an idea about the potential of APUs, so please leave any comments/concerns about pursuing this kind of solution.