My problem is the same as the one described in the "Slow SPIR" thread.
However, I think my problem is related to VGPR usage: with SPIR, many more VGPRs are used. How can I investigate this and work around it? I can provide one executable built from OpenCL C source and one built from SPIR. I also wonder whether SPIR's roughly 2x slower performance, which has persisted for so many years, has ever been investigated. I don't think plain OpenCL without SPIR can be used in commercial apps.