The scatter_IL and scratch_buffer_IL examples both work fine on their own. However, combining the two features seems problematic.
The test IL kernel is:
il_ps_2_0
dcl_indexed_temp_array x0[2]
dcl_input vObjIndex0
mov x0[vObjIndex0.x], 1
mov r0, x0[0]
mov g[0], r0
endmain
I am using SKA 1.1 and CAL 9.1 to compile it into RV770 assembly. The result is:
; -------- Disassembly --------------------
00 ALU: ADDR(32) CNT(4)
      0  x: MOV R1.x, R1.x
         y: MOV R1.y, R1.y
         z: MOV R1.z, R1.z
         w: MOV R1.w, R1.w
01 ALU: ADDR(36) CNT(5)
      1  x: MOVA_INT ____, R0.x
      2  x: MOV R4[A0.x].x, R1.x
         y: MOV R4[A0.x].y, R1.y
         z: MOV R4[A0.x].z, R1.z
         w: MOV R4[A0.x].w, R1.w
02 ALU: ADDR(41) CNT(13)
      3  x: MOVA_INT ____, 0.0f
      4  x: MOV R0.x, R4[A0.x].x
         y: MOV R0.y, R4[A0.x].y
         z: MOV R0.z, R4[A0.x].z
         w: MOV R0.w, R4[A0.x].w
      5  x: MOV R4.x, R0.x
         y: MOV R4.y, R0.y
         z: MOV R4.z, R0.z
         w: MOV R4.w, R0.w
      6  x: MOV R0.x, 0.0f
         y: MOV R0.y, 0.0f
         z: MOV R0.z, 0.0f
         w: MOV R0.w, 0.0f
03 EXP_DONE: PIX0, R0
END_OF_PROGRAM
; -------- End of Disassembly --------------------
It seems that x0[] and g[] have become the same thing (both appear to be mapped to R4), and the kernel contains no MEM_EXPORT_WRITE instruction, so it never writes to the global buffer.
However, renaming every x0[] to x1[] in the IL kernel (including the declaration) solves the problem. The disassembly now reads:
; -------- Disassembly --------------------
00 ALU: ADDR(32) CNT(4)
      0  x: MOV R1.x, R1.x
         y: MOV R1.y, R1.y
         z: MOV R1.z, R1.z
         w: MOV R1.w, R1.w
01 MEM_SCRATCH_WRITE_IND_ACK: VEC_PTR[0+R0.x], R1, ARRAY_SIZE(1) ELEM_SIZE(3)
02 WAIT_ACK: Outstanding_acks <= 0
03 VTX: ADDR(48) CNT(1)
      1  RD_SCRATCH R0, VEC_PTR[0], ARRAY_SIZE(1) ELEM_SIZE(3) UNCACHED BURST_CNT(0)
04 MEM_EXPORT_WRITE: DWORD_PTR[0], R0, ELEM_SIZE(3)
05 ALU: ADDR(36) CNT(4)
      2  x: MOV R0.x, 0.0f
         y: MOV R0.y, 0.0f
         z: MOV R0.z, 0.0f
         w: MOV R0.w, 0.0f
06 EXP_DONE: PIX0, R0
END_OF_PROGRAM
; -------- End of Disassembly --------------------
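For reference, the modified kernel behind this second disassembly is just the original with x0 renamed to x1 throughout. The listing below is reconstructed from that description rather than copied from the actual file, so treat it as a sketch:

```
il_ps_2_0
; same kernel as before, only the indexed temp array is x1 instead of x0
dcl_indexed_temp_array x1[2]
dcl_input vObjIndex0
mov x1[vObjIndex0.x], 1
mov r0, x1[0]
mov g[0], r0
endmain
```

Nothing else differs between the two versions, so the different code generation comes down to the array name alone.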
But this version uses the scratch buffer instead of indexed registers. If I understand correctly, the scratch buffer resides in RAM and is much slower than registers.
I am wondering whether it is a bug that x0 and x1 behave differently. At the very least, the disassembly of the first version is not what the IL kernel is supposed to produce.