
corry
Adept III

Compiler Time Limits still present?

So, we have an IL kernel. We tried some optimizations and got slower speeds; we took the code out entirely and still got slower speeds. Then we checked the ISA code. For some reason, taking code out or optimizing it to use less code resulted in scratch register usage! Not only did it result in scratch register usage, it resulted in scratch register usage *all over the place*: I mean scratch-register texture blocks for every 5 or so lines of ALU code. The IL code appears to use maybe 70-80 registers. To me this looks like the old compiler behavior of giving up on optimizing and dumping everything to scratch registers. Am I correct? Is this still the behavior of the system? I had thought people complained so incessantly about it around 11.4 that it was at least going to be removed. Was I mistaken? I hadn't run into this yet, so I assumed it was gone, since I already have some pretty awful compile times on some kernels. Granted, this one takes the cake (though it's not that complex).

We won't remove the pass that generates scratch register usage, simply because doing so would not allow us to generate valid programs. Scratch register generation occurs after the program fails to compile with normal optimizations because its register usage is too large for the requested work-group size. If you are using 70-80 registers, you should request a work-group size of 128 or smaller (preferably 64) so that it can be compiled without scratch usage.
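The relationship between register usage and work-group size can be sketched as a back-of-the-envelope calculation. This is a simplified model under loudly stated assumptions, not AMD's actual allocator: it assumes a pre-GCN (VLIW) SIMD whose register file of 16,384 128-bit GPRs (an assumed figure) is shared by all work-items of one work-group, and it ignores clause temporaries and any per-work-item hardware cap, so its budgets are upper bounds. The names `gprs_per_work_item` and `would_spill` are hypothetical.

```python
# Rough model of the VLIW (pre-GCN) register budget. ASSUMPTIONS:
#   * all work-items of one work-group share ASSUMED_GPRS_PER_SIMD 128-bit GPRs
#     (16,384 is an assumed figure, not a quoted specification);
#   * clause temporaries and per-work-item hardware caps are ignored,
#     so the budgets below are upper bounds.
ASSUMED_GPRS_PER_SIMD = 16384


def gprs_per_work_item(work_group_size: int, register_file: int = ASSUMED_GPRS_PER_SIMD) -> int:
    """Approximate upper bound on GPRs one work-item may use without spilling."""
    if work_group_size <= 0:
        raise ValueError("work-group size must be positive")
    return register_file // work_group_size


def would_spill(gprs_used: int, work_group_size: int) -> bool:
    """True if the kernel's register demand exceeds the per-work-item budget."""
    return gprs_used > gprs_per_work_item(work_group_size)


if __name__ == "__main__":
    # A kernel using ~80 GPRs, roughly the figure from the original question.
    for wg in (256, 128, 64):
        print(f"work-group {wg:3d}: budget {gprs_per_work_item(wg):3d} GPRs, "
              f"spills at 80 GPRs: {would_spill(80, wg)}")
```

Under these assumptions, an 80-register kernel exceeds the budget at a work-group size of 256 (64 GPRs each) but fits at 128 or 64, which is consistent with the advice above.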

We're still looking into things, but couldn't you allow an offline compilation mode where the compiler can use all the RAM in the machine (x64 mode) and as much time as it wants? I know there comes a point where it gets ludicrous to continue, but some of the threads I read implied kernels that would be running for months or more. If one day were spent compiling a "final version", who cares! I know CAL is deprecated, but perhaps whatever replaces CAL could offer this? Perhaps it will be moot with GCN? If only we were targeting GCN.

This is something we are looking into, though I can't give time frames on when it will be available.

Note that on GCN, you can't avoid spilling by reducing the workgroup size.  With 256 threads per workgroup, you have just as many registers available (256 scalar registers) as if you had 64 threads per workgroup.  See the APP SDK documentation for more info.

Jeff
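Jeff's point about GCN can be illustrated with a similarly rough model. It rests on assumptions not stated in the thread: a GCN compute unit has 4 SIMDs, each with enough vector registers for 256 VGPRs per work-item of a single 64-wide wavefront, and a work-group's wavefronts are spread evenly across those SIMDs; allocation granularity, SGPR and LDS limits are ignored. The function name `gcn_vgprs_per_work_item` is hypothetical.

```python
import math

# Rough model of GCN per-work-item vector-register budgets. ASSUMPTIONS:
#   * a compute unit has SIMDS_PER_CU SIMDs, each able to give a single
#     64-wide wavefront VGPRS_PER_LANE_PER_SIMD registers per work-item;
#   * a work-group's wavefronts are distributed evenly across the SIMDs;
#   * allocation granularity, SGPR and LDS limits are ignored.
WAVEFRONT_SIZE = 64
SIMDS_PER_CU = 4
VGPRS_PER_LANE_PER_SIMD = 256


def gcn_vgprs_per_work_item(work_group_size: int) -> int:
    """Approximate VGPRs available to each work-item for a given work-group size."""
    if work_group_size <= 0:
        raise ValueError("work-group size must be positive")
    wavefronts = math.ceil(work_group_size / WAVEFRONT_SIZE)
    wavefronts_per_simd = math.ceil(wavefronts / SIMDS_PER_CU)
    # Wavefronts sharing one SIMD split its register file between them.
    return VGPRS_PER_LANE_PER_SIMD // wavefronts_per_simd


if __name__ == "__main__":
    for wg in (64, 128, 256):
        print(f"work-group {wg:3d}: {gcn_vgprs_per_work_item(wg)} VGPRs per work-item")
```

In this model, work-groups of 64, 128 and 256 all see the same per-work-item budget, because each wavefront lands on its own SIMD; shrinking the work-group below 256 therefore does not free up registers, unlike on the VLIW parts.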