
Problems with indexed array and global buffer in IL

Discussion created by the729 on Mar 16, 2009
Latest reply on Mar 18, 2009 by MicahVillmow

Both the scatter_IL and scratch_buffer_IL examples work fine on their own. However, combining the two features seems problematic.

The test IL kernel is:

il_ps_2_0
dcl_indexed_temp_array x0[2]
dcl_input vObjIndex0
mov x0[vObjIndex0.x], 1
mov r0, x0[0]
mov g[0], r0
endmain

I am using SKA1.1 and CAL 9.1 to compile it to RV770 assembly. The disassembly reads:

; --------  Disassembly --------------------
00 ALU: ADDR(32) CNT(4) 
      0  x: MOV         R1.x,  R1.x      
         y: MOV         R1.y,  R1.y      
         z: MOV         R1.z,  R1.z      
         w: MOV         R1.w,  R1.w      
01 ALU: ADDR(36) CNT(5) 
      1  x: MOVA_INT    ____,  R0.x      
      2  x: MOV         R4[A0.x].x,  R1.x      
         y: MOV         R4[A0.x].y,  R1.y      
         z: MOV         R4[A0.x].z,  R1.z      
         w: MOV         R4[A0.x].w,  R1.w      
02 ALU: ADDR(41) CNT(13) 
      3  x: MOVA_INT    ____,  0.0f      
      4  x: MOV         R0.x,  R4[A0.x].x      
         y: MOV         R0.y,  R4[A0.x].y      
         z: MOV         R0.z,  R4[A0.x].z      
         w: MOV         R0.w,  R4[A0.x].w      
      5  x: MOV         R4.x,  R0.x      
         y: MOV         R4.y,  R0.y      
         z: MOV         R4.z,  R0.z      
         w: MOV         R4.w,  R0.w      
      6  x: MOV         R0.x,  0.0f      
         y: MOV         R0.y,  0.0f      
         z: MOV         R0.z,  0.0f      
         w: MOV         R0.w,  0.0f      
03 EXP_DONE: PIX0, R0
END_OF_PROGRAM

; -------- End of Disassembly --------------------


It seems x0[] and g[] end up mapped to the same register (R4 in the disassembly above), and the kernel contains no MEM_EXPORT_WRITE instruction, so it never writes to the global buffer.

However, renaming every x0[] to x1[] (including the declaration) in the IL kernel solves the problem. The disassembly now reads:

; --------  Disassembly --------------------
00 ALU: ADDR(32) CNT(4) 
      0  x: MOV         R1.x,  R1.x      
         y: MOV         R1.y,  R1.y      
         z: MOV         R1.z,  R1.z      
         w: MOV         R1.w,  R1.w      
01 MEM_SCRATCH_WRITE_IND_ACK: VEC_PTR[0+R0.x], R1, ARRAY_SIZE(1) ELEM_SIZE(3) 
02 WAIT_ACK:  Outstanding_acks <= 0 
03 VTX: ADDR(48) CNT(1) 
      1  RD_SCRATCH R0, VEC_PTR[0], ARRAY_SIZE(1) ELEM_SIZE(3) UNCACHED BURST_CNT(0) 
04 MEM_EXPORT_WRITE: DWORD_PTR[0], R0, ELEM_SIZE(3) 
05 ALU: ADDR(36) CNT(4) 
      2  x: MOV         R0.x,  0.0f      
         y: MOV         R0.y,  0.0f      
         z: MOV         R0.z,  0.0f      
         w: MOV         R0.w,  0.0f      
06 EXP_DONE: PIX0, R0
END_OF_PROGRAM

; -------- End of Disassembly --------------------
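
For reference, the modified kernel behind the disassembly above should look like the following. This is a reconstruction from the description (x0 renamed to x1 everywhere, nothing else changed), not a copy of the actual test file, and the source operand of the first mov is kept exactly as written in the original kernel.

```
il_ps_2_0
; indexed temp array renamed from x0 to x1
dcl_indexed_temp_array x1[2]
dcl_input vObjIndex0
; dynamically indexed write into the array
mov x1[vObjIndex0.x], 1
; read element 0 back and store it to the global buffer
mov r0, x1[0]
mov g[0], r0
endmain
```

The only difference from the first kernel is the array name, which is why the change in generated code (scratch memory plus MEM_EXPORT_WRITE instead of indexed registers) is surprising.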

 

But this version uses the scratch buffer instead of indexed registers. If I understand correctly, the scratch buffer is located in RAM and is much slower than registers.

I am wondering whether it is a bug that x0 and x1 are treated differently. At the very least, the disassembly of the first version is not what the IL kernel is supposed to compile to.
