I've written an OpenCL application that runs much faster in Linux when its kernels are launched concurrently on R9 290(X) (Hawaii) devices, because only one of the three kernels has high register usage. On GCN-based devices other than Hawaii, performance scales very well with CU count and GPU core frequency in both Windows and Linux. Here are my two problems and questions:
- Hawaii devices show a significant performance drop in Windows compared to Linux (about one third of the performance is lost), apparently because it is impossible to execute kernels in parallel on the device there. Is this caused by different feature sets in the Windows and Linux Catalyst drivers?
- If a monitor is attached to the GPU and kernels are launched concurrently in Windows, high CPU usage is observed (on any GCN-based device). The CPU does nothing but launch OpenCL kernels and read or write a few bytes of device memory. Is this a driver bug?
Best regards,