I'm looking at the Cayman and Cypress ISA for one of my kernels. Inside a loop there is a single memory read near the beginning, which I can easily identify. After it, however, a number of additional TEX clauses containing VFETCH instructions appear, and I don't understand where they come from.
The kernel also reads from __constant buffers. As far as I understand, those show up as an ALU clause that locks a range of a constant buffer (something like KCACHE0(CB5:0-15) at the start of the clause, with operands such as KC0[2].z), not as fetches.
What are these fetches? What is the significance of fc130 and FETCH_TYPE(NO_INDEX_OFFSET)? How can I prevent them, both to avoid the extra clause switches and to avoid the possibly unnecessary fetches themselves?
    // earlier read which I do understand
    08 TEX: ADDR(5780) CNT(1)
          41  VFETCH R13, R0.w, fc175  FORMAT(32_32_32_32_FLOAT) FETCH_TYPE(NO_INDEX_OFFSET)

    // later fetches which I don't understand look like this
    13 TEX: ADDR(5782) CNT(1)
          150  VFETCH R1, R1.z, fc130  FETCH_TYPE(NO_INDEX_OFFSET)

    // On Cypress they look like this
    12 TEX: ADDR(5798) CNT(1)
          146  VFETCH R2, R2.w, fc130  MEGA(16) FETCH_TYPE(NO_INDEX_OFFSET)
I don't think I have any dynamic indexing here. The closest thing to it is an unrolled loop, and the number of these clauses does look consistent with the number of unrolled iterations times the number of uses. Is this an optimization that isn't applied to unrolled loops? Or could it have something to do with reading from a constant array of structs?
What I have looks like this:
    // a is a kernel argument: __constant SomeStruct* a
    for (...) {
        #pragma unroll N
        for (int i = 0; i < N; ++i) {
            // operations involving a[i].structfield1, a[i].structfield2
        }
    }
I figured it out: the fetches were coming from inside the default exp() function.