And also certain OpenCL kernel optimizations like:
In order of importance:
- Reduce kernel launch times
- Remove X server requirement
- Make global atomics not force the complete path for every global memory access in the kernel
Another feature I would really like to have:
When invoked with CL_DEVICE_NAME, clGetDeviceInfo should return the GPU name instead of the chipset name.
The chipset name makes it impossible for end users to determine whether the application is using the device it's supposed to be using. In addition, the current implementation is buggy anyway, e.g. returning Cypress on the 5970, where I think it should be Hemlock. This makes it difficult for the developer to estimate the device performance (especially as CL_DEVICE_MAX_CLOCK_FREQUENCY also doesn't report correct values).
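Until that changes, an application could translate the codename itself before showing it to the user. This is only a rough sketch: it assumes the codename string has already been read via CL_DEVICE_NAME, and the table and the function name describe_device are made up for illustration and cover just a few chips.

```python
# Hypothetical lookup table: chipset codename (as reported via CL_DEVICE_NAME)
# mapped to the product family built on that chip. Deliberately incomplete.
CODENAME_TO_PRODUCTS = {
    "Cypress": "Radeon HD 5800 series",
    "Hemlock": "Radeon HD 5970",
    "Juniper": "Radeon HD 5700 series",
}


def describe_device(codename: str) -> str:
    """Return a user-facing label, falling back to the raw codename."""
    key = codename.strip()
    product = CODENAME_TO_PRODUCTS.get(key)
    if product is None:
        # Unknown chip: show whatever the runtime reported.
        return key
    return f"{product} ({key})"


if __name__ == "__main__":
    for reported in ("Cypress", "Juniper", "SomeFutureChip"):
        print(describe_device(reported))
```

This does not help with the 5970 case: as long as the runtime reports Cypress there, the table would label it as a 5800-series card, so it is a stopgap and not a substitute for fixing the runtime.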
This is not directly software related, but might be useful to a number of people if it were made possible:
Some kind of web-based service allowing the testing of OpenCL code on AMD hardware.
As an example, I have easy access to reasonable NVIDIA cards, as they come with all of my laboratory's workstations. Having developed a simulation code in OpenCL, I would like to be able to test it on Radeon/FireStream cards, but I am not prepared to go out and buy a card without having an idea of the performance my code could reach.
Sorry if this is slightly off topic!
"Increase buffer size limits (so that we won't have to rely on experimental environment variables like GPU_INITIAL_HEAP_SIZE and GPU_MAX_HEAP_SIZE)"
I think I already had highly varying performance values between application runs because of this.
Zero-copy on Linux, especially as it's already there for Windows! That should really speed up my halo exchange, finally making my code work with more than one computer.