Is there any way to get in direct contact with AMD engineers? The OpenCL driver bugs and inconsistencies are quite annoying.
Thanks for your question. I have reported this to the team, and will post an update as soon as I get one.
Would it be possible for you to share the code you have written? I would like to verify it at my end.
Could you share the host and device code?
I do not want to publish the device code, but I can provide a link to the LLVM IR binary or to native binaries.
Here is an update to my initial post:
- If multiple threads are run on one GPU, then they execute the same task on completely different data sets (e.g. buffers).
- The high CPU usage bug seen on the R9 290(X) does not show up with the Catalyst 14.x beta drivers, but it does show up with, e.g., Catalyst 13.12.
- Some information about my kernels and execution times on R9 280X:
a) 1st kernel (mainly limited by compute resources, but also has high LDS and register usage): ~0.045s
b) 2nd kernel (high LDS usage): ~0.010s
c) 3rd kernel (high LDS usage): ~0.014s
The average execution time for one pass through all three kernels is ~0.046s with two threads per GPU, which is well below the ~0.069s sum of the individual kernel times. Therefore, the execution of the 1st kernel has to overlap with the 2nd and 3rd kernels. On Windows with an R9 290, the average execution time equals the sum of the individual kernel times, even with two threads per GPU, so there is no overlap at all. On Linux, an R9 290 is more than 25% faster than on Windows, but if the kernels overlapped as they do on the R9 280X, I would expect even better performance. Perhaps only the execution of the 1st and 2nd kernels overlaps there.
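To make the overlap argument explicit, here is a small back-of-the-envelope sketch of the arithmetic. It only uses the R9 280X numbers quoted above. The function names are made up for illustration, and it assumes that "average execution time" means the wall-clock time per completed pass through all three kernels, and that in the best case the 1st kernel of one thread completely hides the 2nd and 3rd kernels of the other thread.

```python
# Kernel times in seconds on the R9 280X, taken from the list above.
KERNEL_TIMES = {"kernel1": 0.045, "kernel2": 0.010, "kernel3": 0.014}

# Measured average time per pass with two threads per GPU (from the post).
MEASURED_AVERAGE = 0.046


def serial_time(times):
    """Time per pass if the three kernels run strictly back to back."""
    return sum(times.values())


def hidden_time(times, measured):
    """Kernel time that must have run concurrently with other work."""
    return serial_time(times) - measured


def ideal_overlap_time(times):
    """Assumed best case: kernel 1 of one thread fully hides
    kernels 2 and 3 of the other thread (illustrative model only)."""
    return max(times["kernel1"], times["kernel2"] + times["kernel3"])


if __name__ == "__main__":
    print(f"serial sum:        {serial_time(KERNEL_TIMES):.3f} s")
    print(f"measured average:  {MEASURED_AVERAGE:.3f} s")
    print(f"hidden by overlap: {hidden_time(KERNEL_TIMES, MEASURED_AVERAGE):.3f} s")
    print(f"ideal overlap:     {ideal_overlap_time(KERNEL_TIMES):.3f} s")
```

Under these assumptions the measured average sits almost exactly at the ideal-overlap value, and the hidden time is close to the combined time of the 2nd and 3rd kernels. That is why I conclude that on the R9 280X the 1st kernel overlaps with both of the others.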