ALU packing (Evergreen)

Discussion created by frankas on Dec 26, 2009
Latest reply on Jan 4, 2010 by empty_knapsack

I was pleased to see that the Evergreen instruction set was published with the 2.0 release. But try as I might, I can't find any documentation on how to pack the ALU instructions optimally.

For instance, I assume that the cosine instruction can only be issued in the t unit, although this is not stated. What's more, the IL specification describes the cosine instruction as operating on a vector (xyzw of a register), which seems to conflict with the microcode operating on a single 32-bit register.

The kind of documentation I am looking for would cover:

How many and which instructions can be co-issued in one VLIW instruction.

Which instructions are only legal in the xyzw units.

Which instructions are only legal in the t unit.

Which instructions can be issued to any unit.


In short, the information needed to get fuller utilization of the stream cores in the ALU clauses. Currently my kernels very often use four or fewer of the five units (80% or less), even when there is no data dependency, and I am trying to understand which changes I can make to get closer to 100% utilization.
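
To be clear about what I mean by utilization: it is simply ALU slots filled divided by ALU slots available. Here is a minimal sketch of that arithmetic, assuming five units per VLIW instruction (x, y, z, w and t) and per-instruction slot counts read by hand from a disassembly listing. The function name and the example counts are made up for illustration, not taken from any tool.

```python
# Hypothetical helper: estimate ALU slot utilization from per-bundle slot counts.
SLOTS_PER_BUNDLE = 5  # assumption: x, y, z, w and t units per VLIW instruction


def alu_utilization(slots_used_per_bundle):
    """Return the fraction of ALU slots filled across all VLIW bundles."""
    if not slots_used_per_bundle:
        raise ValueError("need at least one bundle")
    for used in slots_used_per_bundle:
        if not 0 <= used <= SLOTS_PER_BUNDLE:
            raise ValueError(f"slot count out of range: {used}")
    total_slots = SLOTS_PER_BUNDLE * len(slots_used_per_bundle)
    return sum(slots_used_per_bundle) / total_slots


# Made-up counts for four consecutive VLIW instructions in an ALU clause.
example_counts = [4, 3, 5, 4]
print(f"{alu_utilization(example_counts):.0%}")  # 16 of 20 slots -> 80%
```

A kernel whose instructions all fill five slots would report 100%; the figure says nothing about why a slot stayed empty, which is the part I am missing documentation for.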


Any pointers will be much appreciated.