corry

Bursting...What am I doing wrong?

Discussion created by corry on Oct 24, 2011
Latest reply on Oct 25, 2011 by corry

See the code. As I said before, I'm writing a kernel that does block processing. When I limited the amount of data to 12 bytes/thread, performance was about what I expected. I then scaled up to 60 bytes, not even half my block size, and performance is abysmal! So I went to the handy dandy disassembly, and to my shock and horror, despite reading consecutive addresses, I see no bursting (see the attached code). For reference, l8.x = 16: as far as I can tell, UAV addresses are byte addresses, so +16 should advance by one full GPR. And indeed, when I print my data, I see my sequential data in the order I expect. r25.x is where I'm storing the address, and l120.x happens to correspond to 60, the size of the buffer I'm reading in.

Everything I can see says this should burst. I'm reading sequential values into sequential registers, so what gives? I believe this is with Catalyst 11.10 preview 2.

 

iadd r32.x, r25.x, l8.x
iadd r32.y, r32.x, l8.x
iadd r32.z, r32.y, l8.x
iadd r32.w, r32.z, l8.x
iadd r33.x, r32.w, l8.x
iadd r33.y, r33.x, l8.x
iadd r33.z, r33.y, l8.x
iadd r33.w, r33.z, l8.x
iadd r34.x, r33.w, l8.x
iadd r34.y, r34.x, l8.x
iadd r34.z, r34.y, l8.x
iadd r34.w, r34.z, l8.x
iadd r35.x, r34.w, l8.x
iadd r35.y, r35.x, l8.x
uav_raw_load_id(8) r6, r25.x
uav_raw_load_id(8) r7, r32.x
uav_raw_load_id(8) r8, r32.y
uav_raw_load_id(8) r9, r32.z
uav_raw_load_id(8) r10, r32.w
uav_raw_load_id(8) r11, r33.x
uav_raw_load_id(8) r12, r33.y
uav_raw_load_id(8) r13, r33.z
uav_raw_load_id(8) r14, r33.w
uav_raw_load_id(8) r15, r34.x
uav_raw_load_id(8) r16, r34.y
uav_raw_load_id(8) r17, r34.z
uav_raw_load_id(8) r18, r34.w
uav_raw_load_id(8) r19, r35.x
uav_raw_load_id(8) r20, r35.y
iadd r25.x, r25.x, l120.x

/////END IL....

132 TEX: ADDR(22084) CNT(15)
    299 VFETCH R9, R0.y, fc170 FORMAT(32_32_32_32_FLOAT) FETCH_TYPE(NO_INDEX_OFFSET)
    300 VFETCH R38, R0.x, fc170 FORMAT(32_32_32_32_FLOAT) FETCH_TYPE(NO_INDEX_OFFSET)
    301 VFETCH R10, R0.z, fc170 FORMAT(32_32_32_32_FLOAT) FETCH_TYPE(NO_INDEX_OFFSET)
    302 VFETCH R11, R0.w, fc170 FORMAT(32_32_32_32_FLOAT) FETCH_TYPE(NO_INDEX_OFFSET)
    303 VFETCH R12, R1.y, fc170 FORMAT(32_32_32_32_FLOAT) FETCH_TYPE(NO_INDEX_OFFSET)
    304 VFETCH R13, R1.x, fc170 FORMAT(32_32_32_32_FLOAT) FETCH_TYPE(NO_INDEX_OFFSET)
    305 VFETCH R14, R1.z, fc170 FORMAT(32_32_32_32_FLOAT) FETCH_TYPE(NO_INDEX_OFFSET)
    306 VFETCH R15, R1.w, fc170 FORMAT(32_32_32_32_FLOAT) FETCH_TYPE(NO_INDEX_OFFSET)
    307 VFETCH R16, R2.y, fc170 FORMAT(32_32_32_32_FLOAT) FETCH_TYPE(NO_INDEX_OFFSET)
    308 VFETCH R17, R2.x, fc170 FORMAT(32_32_32_32_FLOAT) FETCH_TYPE(NO_INDEX_OFFSET)
    309 VFETCH R18, R2.z, fc170 FORMAT(32_32_32_32_FLOAT) FETCH_TYPE(NO_INDEX_OFFSET)
    310 VFETCH R19, R2.w, fc170 FORMAT(32_32_32_32_FLOAT) FETCH_TYPE(NO_INDEX_OFFSET)
    311 VFETCH R23, R3.y, fc170 FORMAT(32_32_32_32_FLOAT) FETCH_TYPE(NO_INDEX_OFFSET)
    312 VFETCH R21, R3.x, fc170 FORMAT(32_32_32_32_FLOAT) FETCH_TYPE(NO_INDEX_OFFSET)
    313 VFETCH R20, R3.z, fc170 FORMAT(32_32_32_32_FLOAT) FETCH_TYPE(NO_INDEX_OFFSET)
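
As a sanity check on the addressing only, the iadd chain in the IL can be modeled in a few lines of Python. This is a minimal sketch, not part of the kernel: the function names are made up for illustration, the 16-byte stride is the l8.x value quoted in the post, and the count of 15 is the number of uav_raw_load_id instructions in the listing. It says nothing about why the compiler does or does not emit a burst; it only confirms that the addresses the IL computes are back-to-back 16-byte reads.

```python
STRIDE_BYTES = 16  # l8.x in the post: one 4 x 32-bit GPR worth of data
NUM_LOADS = 15     # uav_raw_load_id instructions in the IL listing


def load_addresses(base, stride=STRIDE_BYTES, count=NUM_LOADS):
    """Mimic the iadd chain: each address is the previous one plus stride."""
    addrs = [base]
    for _ in range(count - 1):
        addrs.append(addrs[-1] + stride)
    return addrs


def is_contiguous(addrs, width=STRIDE_BYTES):
    """True if every load starts exactly where the previous one ended."""
    return all(b - a == width for a, b in zip(addrs, addrs[1:]))


if __name__ == "__main__":
    addrs = load_addresses(base=0)  # base stands in for r25.x
    print(addrs)
    print("contiguous:", is_contiguous(addrs))
```

With any base address the check should report contiguous reads, which matches the expectation in the post that the loads are candidates for bursting.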
