Kaveri has arrived, and as we all know, Kaveri and its predecessors share some resources in their architecture. For one, two cores essentially share one floating-point pipeline. I am not sure whether this was done to save space, or whether part of the thinking was that the GPU could do some of those floating-point operations. If the latter was part of the thinking, I am wondering how one could instruct the GPU to do these floating-point operations, even if they are just a bunch of one-offs. I should say that I am aware of OpenCL 1.2. OpenCL 1.2 is not an option because:
- It would create much more overhead: spawning a task, copying the data to the GPU, collecting the results, and then closing the task.
Is there currently a low-overhead way to say "GPU, you take this task"? Kaveri's GPU, after all, now has the same access to memory as the CPU.
My other concern: how can I do this and still keep the software running on non-HSA-enabled hardware?
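
To make that second concern concrete, here is a minimal sketch of the kind of fallback I have in mind. It assumes nothing about any real HSA API: `gpu_offload_available()`, `scale_add_gpu` and `select_scale_add()` are hypothetical names made up for illustration, and the "GPU" path is only a placeholder that calls the CPU code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical capability probe. A real one would ask the platform runtime;
// this stub always reports "no offload path" so the sketch runs anywhere.
bool gpu_offload_available() { return false; }

// Plain CPU implementation: out[i] = a * x[i] + y[i]. Works on any hardware.
void scale_add_cpu(const float* x, const float* y, float a,
                   float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a * x[i] + y[i];
    }
}

// Placeholder for an offloaded version (hypothetical). It simply forwards to
// the CPU code; the low-overhead GPU dispatch is exactly the missing piece.
void scale_add_gpu(const float* x, const float* y, float a,
                   float* out, std::size_t n) {
    scale_add_cpu(x, y, a, out, n);
}

// Both paths share one signature, so callers never need to know which ran.
using ScaleAddFn = void (*)(const float*, const float*, float,
                            float*, std::size_t);

// Pick the implementation once, e.g. at startup, and reuse the pointer.
ScaleAddFn select_scale_add() {
    return gpu_offload_available() ? scale_add_gpu : scale_add_cpu;
}
```

The caller would do `ScaleAddFn f = select_scale_add();` once and then call `f(x, y, a, out, n)` everywhere, so the same binary still runs on hardware without HSA. What I do not know is what would go inside the GPU path and the probe.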