I have just installed SDK 2.2, and strangely the CAL IL compiler is behaving differently. I thought the compiler was part of the drivers, and hence wouldn't change with the installation of a new OpenCL SDK?
Anyway, the first issue is that one of my kernels gives the following error during disassembly (in SKA 1.6, or using the CAL disassembly routines):
Error: trans operation cannot take LDS source
t: MOV R3.y, QA.pop
This disassembled without issue before I installed the new SDK.
The other issue is a pair of similar IL kernels which are now compiled to ASM using scratch registers. Before installing SDK 2.2 the compiled result used 18 GPRs; now, with SDK 2.2, the kernels are twice as long and use 12 scratch registers and 15 GPRs. What changed to make the compiler decide to spill these registers? Can I do anything to stop that?
Another curiosity is R123 which seems to keep popping up in many of the disassembly results from my IL kernels. I have also found it in the results from compiling an OpenCL kernel with SKA 1.6.
In all cases, the number of registers actually used in the kernel is much lower than 124...
Usually I find it used as the destination for a four arg instruction such as MULADD or CNDE_INT. In these cases it is never actually read from (PV?? or PS?? is used instead). The instruction should actually be encoded with ____ as the destination in these cases...
In the OpenCL case, the register is used several times as a source register without ever being written to... Strange huh?
Any ideas or explanations from AMD guys would be gratefully accepted.
It seems the scratch register spilling was caused by me changing the maximum number of threads per group. I've changed it to a lower value, and that seems to have solved that issue.
I was beginning to suspect this (after looking at the ASM docs and finding no special encoding for temporaries). So this seems to indicate a bug in the disassembler then (low priority I'm sure), since it doesn't recognize them as temporaries?
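For anyone else who hits this, here is a rough sketch of why a larger group size can force spilling. It assumes a wavefront of 64 threads and a register file of 256 GPRs per lane shared by all wavefronts of a group; both numbers are my assumptions and not something stated in this thread, and the sketch ignores clause temporaries and any other reservation the compiler makes.

```python
# Rough per-thread GPR budget as a function of work-group size.
# ASSUMPTIONS (not from this thread): 64 threads per wavefront, and
# 256 GPRs per lane shared by every wavefront of one work-group.
WAVEFRONT_SIZE = 64
GPRS_PER_LANE = 256


def max_gprs_per_thread(threads_per_group: int) -> int:
    """Return the largest GPR count per thread that lets the whole group fit."""
    # Round up: a partially filled wavefront still occupies a full slot.
    wavefronts = -(-threads_per_group // WAVEFRONT_SIZE)
    return GPRS_PER_LANE // wavefronts


def must_spill(gprs_needed: int, threads_per_group: int) -> bool:
    """True if a kernel needing gprs_needed registers exceeds the budget."""
    return gprs_needed > max_gprs_per_thread(threads_per_group)
```

Under these assumptions a kernel that wants 18 GPRs fits comfortably at 256 threads per group (budget 64) but not at 1024 threads per group (budget 16), in which case the compiler would have to move some values to scratch.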
Out of curiosity, how is a ____ destination encoded?
Also, any ideas about the 'trans operation cannot take LDS source' error? Is this just an issue with the disassembler, or is it an actual restriction that the assembler is not obeying? I've managed to tweak my code so that it no longer occurs in my kernel, but it might be worth investigating for you guys in case it crops up again.
____ doesn't need to be encoded as a register, because the result is read through the "previous vector" (PV.x, PV.y, PV.z or PV.w) or "previous scalar" (PS) operand codes in the following instruction bundle.
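To illustrate the idea, here is a toy model of how a result can be forwarded to the next bundle without ever landing in a GPR. It is only a sketch: the bundle layout, the `read` and `run_bundles` helpers and the register names are made up for illustration and say nothing about the actual instruction encoding.

```python
import operator


def read(src, gprs, pv, ps):
    """Resolve a source operand: PS, PV.<slot>, or a named GPR component."""
    if src == "PS":
        return ps
    if src.startswith("PV."):
        return pv[src[3:]]
    return gprs[src]


def run_bundles(bundles, gprs):
    """Run bundles in order; each bundle maps slot -> (op, sources, dest).

    dest=None plays the role of ____: the result is not written to a GPR
    and is only visible as PV.<slot> (or PS for slot 't') in the next bundle.
    """
    pv, ps = {}, None  # results produced by the previous bundle
    for bundle in bundles:
        new_pv, new_ps, writes = {}, None, {}
        for slot, (op, sources, dest) in bundle.items():
            result = op(*[read(s, gprs, pv, ps) for s in sources])
            if slot == "t":
                new_ps = result
            else:
                new_pv[slot] = result
            if dest is not None:
                writes[dest] = result
        gprs.update(writes)      # commit GPR writes after all slots have read
        pv, ps = new_pv, new_ps  # only the immediately preceding bundle is visible
    return gprs


program = [
    {"x": (operator.mul, ["R0.x", "R1.x"], None)},    # x: MUL ____, R0.x, R1.x
    {"x": (operator.add, ["PV.x", "R0.y"], "R2.x")},  # x: ADD R2.x, PV.x, R0.y
]
result = run_bundles(program, {"R0.x": 2.0, "R0.y": 3.0, "R1.x": 4.0})
```

In this model the product from the first bundle never occupies a register; the second bundle picks it up as PV.x, which is the pattern I would expect instead of a write to R123.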