I'd like to surface this question once again.
Is a compiler option to cap the maximum number of registers a kernel may use, similar to -maxrregcount in CUDA, planned for a future release?
We are currently evaluating OpenCL versus CUDA for datacenter compute workloads, and this has turned out to be the biggest blocker. For more advanced kernels, it pays off to spill or recompute values held in registers in order to increase occupancy.
Without -maxrregcount, the OpenCL port and the CUDA implementation perform about the same, reaching roughly 35% of the available TFLOP/s. When -maxrregcount is set low enough to allow 100% occupancy on the NVIDIA card, the kernel reaches 85% of the available compute. We could try to hand-optimize the code instead, but that is hard on AMD because there is little feedback on how many registers each part of the kernel uses, and it is not something we would spend time on unless we had to.
Could someone with inside knowledge give a definite answer on whether this feature is planned, and whether a beta build could be made available in the near future, before we have to finalize our technology decision?