I'm trying to compile a rather large kernel and I'm getting the following error after 20-40 seconds of compile time, both in the runtime and under CodeXL:
Shader compiler had memory allocation problem
Error: HSAIL program is not finalized successfully.
Codegen phase failed compilation.
Error: BRIG finalization to ISA failed.
========== Build completed for 1 devices: 0 succeeded, 1 failed. ==========
I've attached a simplified version of the code, but the problem cannot be reproduced with the simplified version alone. If needed, I can attach the full code. I've traced the problem down to the slide function, and I think it pushes the kernel complexity over some limit in the AMD CL compiler.
What I found is that if I remove just a bit of the slide function's complexity (e.g. remove the innermost for loop or remove the break), the code compiles (though it still takes 15+ seconds). How can I work around this problem?
This happens with both the Windows and the Linux runtime. On other platforms (Nvidia and Intel), the code compiles and works fine.
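To illustrate the kind of construct mentioned above (nested for loops with an early break), here is a minimal sketch. It is a hypothetical stand-in, not the actual slide function: the name `slide_sketch`, the `WINDOW` size, the pattern-matching logic, and the assumption that the break sits inside the innermost loop are all made up for illustration. It is written as plain C so it can be compiled on the host.

```c
/* Hypothetical stand-in, NOT the real slide function.
 * Only the shape (nested for loops, innermost one with a break)
 * follows the description in the post; everything else is assumed. */
#define WINDOW 4 /* assumed window size, chosen for illustration */

/* Slides a WINDOW-sized pattern across data[0..n-1].
 * Returns the first offset where all WINDOW elements match, or -1. */
int slide_sketch(const int *data, int n, const int *pattern)
{
    for (int offset = 0; offset + WINDOW <= n; ++offset) {
        int k;
        for (k = 0; k < WINDOW; ++k) {          /* innermost for loop */
            if (data[offset + k] != pattern[k])
                break;                          /* early exit on mismatch */
        }
        if (k == WINDOW)                        /* loop ran to completion */
            return offset;
    }
    return -1;                                  /* no match found */
}
```

Removing either the innermost loop or the break from a structure like this is the kind of simplification that lets the full kernel compile.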