I read that on Kaveri the GPU and CPU will be able to access the same unified address space. But some things are still unclear:
1- Which options should be used when creating buffers on Kaveri to exploit this? Or is it enough to use map/unmap when accessing the data?
2- Will AMD's OpenCL implementation allow allocating very large buffers on Kaveri? If the machine has 32GB of RAM, will you be able to allocate buffer objects around 20GB in size?
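Until AMD documents the Kaveri-specific behaviour, the only portable way to check the large-buffer question is to query the device limits at runtime. A minimal sketch using only standard OpenCL 1.2 calls (no Kaveri-specific extensions assumed):

```c
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id platform;
    cl_device_id device;
    cl_ulong max_alloc = 0, global_mem = 0;

    if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS ||
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL)
            != CL_SUCCESS) {
        fprintf(stderr, "no OpenCL GPU device found\n");
        return 1;
    }

    /* Largest single buffer the implementation guarantees it can allocate. */
    clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                    sizeof(max_alloc), &max_alloc, NULL);
    /* Total global memory visible to the device. */
    clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE,
                    sizeof(global_mem), &global_mem, NULL);

    printf("max single allocation: %llu MB\n",
           (unsigned long long)(max_alloc >> 20));
    printf("global memory size:    %llu MB\n",
           (unsigned long long)(global_mem >> 20));
    return 0;
}
```

Whether the HSA drivers raise CL_DEVICE_MAX_MEM_ALLOC_SIZE toward the full system RAM is exactly what this thread is asking; the query at least tells you what the installed driver claims.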
Does nobody know the answer to this question yet? I am willing to accept best guesses.
1. Documentation on how to create the buffers should be available when the drivers support it. Please stay tuned.
Kaveri drivers are available now; can you provide an update on the documentation?
Kaveri HSA-enabled drivers are likely not going to be available until the second quarter (likely June). I believe they will only be beta drivers at that time. It is in Q1 2015, concurrent with Carrizo, that we will get release drivers.
OpenCL: At the time of launch, Kaveri will be shipping with an OpenCL 1.2 implementation. My understanding is that the launch drivers do not provide the HSA execution stack; the OpenCL functionality sits on top of the legacy graphics stack, which is in turn built on AMDIL. In Q2 2014, a preview driver providing OpenCL 1.2 with some unified-memory extensions from OpenCL 2.0, built on the HSA infrastructure, should be released. A driver with support for OpenCL 2.0 on the HSA infrastructure is expected in Q1 2015.
As far as I understand, you have to specify the appropriate flags while creating the buffer, and the runtime takes care of the rest.
It's preferred to use CL_MEM_ALLOC_HOST_PTR, since the runtime then has complete control over the memory and the flags used to create it. Check the "Interaction policies with Main Memory" section for the different kinds of memory that can be allocated on the CPU side. If you use CL_MEM_USE_HOST_PTR, the memory has already been created with default flags and there is little scope for optimization.
If you use CL_MEM_ALLOC_HOST_PTR with proper hints like CL_MEM_HOST_WRITE_ONLY or CL_MEM_HOST_READ_ONLY, the runtime has enough options to optimize the memory for your purpose. Map/unmap must be used for the host to access or populate this buffer. The idea is to create OpenCL buffers and use the same memory on both device and host, instead of creating heap memory on the host and copying to and from it.
The HSA extensions provided in Kaveri might be straightforward and easy to use. We have to wait for the documentation.
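A sketch of the ALLOC_HOST_PTR + map/unmap pattern described above, using standard OpenCL 1.2 calls (the function name and error handling are illustrative, not from any AMD sample):

```c
#include <string.h>
#include <CL/cl.h>

/* Assumes ctx and queue have already been created. Builds a buffer the
   kernel will read and the host will only write. */
cl_mem make_input_buffer(cl_context ctx, cl_command_queue queue,
                         const float *src, size_t n) {
    cl_int err;

    /* Let the runtime choose optimal host-visible memory; the hints tell
       it the host only writes and the device only reads. */
    cl_mem buf = clCreateBuffer(ctx,
            CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR | CL_MEM_HOST_WRITE_ONLY,
            n * sizeof(float), NULL, &err);

    /* Populate through map/unmap instead of clEnqueueWriteBuffer: on a
       zero-copy path this writes the same pages the GPU will read. */
    float *p = (float *)clEnqueueMapBuffer(queue, buf, CL_TRUE,
            CL_MAP_WRITE_INVALIDATE_REGION, 0, n * sizeof(float),
            0, NULL, NULL, &err);
    memcpy(p, src, n * sizeof(float));
    clEnqueueUnmapMemObject(queue, buf, p, 0, NULL, NULL);

    return buf;
}
```

CL_MAP_WRITE_INVALIDATE_REGION (OpenCL 1.2) is used rather than CL_MAP_WRITE because the host overwrites the whole region, so the runtime need not make the old contents visible first.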
I am not too sure AHP (CL_MEM_ALLOC_HOST_PTR) would be needed. If Kaveri can put the CPU's virtual addresses at the disposal of the GPU (that's what pure HSA has to do, but good things happen slowly...), then your best bet is probably UHP (CL_MEM_USE_HOST_PTR). This will do well on HSA platforms, and can give you very bad performance on non-HSA platforms.
Any other buffer allocation scheme (AHP, no flags, or map/memcpy/unmap) will involve an extra copy that defeats the purpose of HSA.
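For contrast, the UHP approach wraps an existing host allocation. A sketch (the 4096-byte alignment is an assumption; query CL_DEVICE_MEM_BASE_ADDR_ALIGN for the real requirement, since misaligned host pointers typically fall off the zero-copy path):

```c
#include <stdlib.h>
#include <CL/cl.h>

/* Assumes ctx has already been created. The returned buffer aliases the
   host allocation, so on HSA-capable hardware the GPU can access it
   directly without a staging copy. The host memory must outlive buf. */
cl_mem wrap_host_array(cl_context ctx, size_t n, float **host_out) {
    cl_int err;

    /* Round the size up to the alignment, as C11 aligned_alloc requires
       the size to be a multiple of the alignment. */
    size_t bytes = ((n * sizeof(float) + 4095) / 4096) * 4096;
    float *host = aligned_alloc(4096, bytes);

    cl_mem buf = clCreateBuffer(ctx,
            CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
            n * sizeof(float), host, &err);

    *host_out = host;  /* caller frees after clReleaseMemObject(buf) */
    return buf;
}
```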
This suggests something to me... okay, I won't tell. I am going to work on it.
- Bruhaspati & Koshpati
On FM2+ BIOSes, the UMA Frame Buffer Size can be set between 32MB and 2GB.
Does the BIOS UMA Frame Buffer Size affect the memory available in OpenCL?
After HSA drivers ship, what function will the BIOS UMA Frame Buffer Size serve?
Why does the UMA Frame Buffer Size stop at 2GB? Is this a soft limit set by the BIOS manufacturer?
In theory, could it be increased to, say, 16GB on a 32GB machine?