Hello,
I have a question about the behavior of the T_BUFFER_LOAD_FORMAT_X instruction on Southern Islands. The SI ISA Reference Guide states that the OFFSET field of MTBUF instructions is used only when OFFEN=0. However, I am seeing some code compiled with APP SDK 2.8 that produces the following instruction:
t_buffer_load_format_x v2, v2, s[16:19], 0 offen offset:4 format:[BUF_DATA_FORMAT_32, BUF_NUM_FMT_FLOAT] // 00000080: EBA01004 80040202
In this case, OFFEN=1, but OFFSET=4. Matching the assembly to the source code, it seems that the offset of 4 is in fact intended to be added, even though OFFEN=1. I tried to test this out a little: hex-editing the OFFSET field to 0 should have no effect according to the spec (since OFFEN=1), but it does in fact cause incorrect execution on my Radeon 7970. To further verify this, I ran the code in Multi2Sim, and the kernel does not execute correctly there either.
Is this in fact a documentation bug? The source code, compiler, and hardware all seem to agree that OFFSET should still be added even when OFFEN=1, but I'd like to be sure.
Thanks,
Dan
---
The full code is the following (from ViennaCL):
const char * const compressed_matrix_align1_vec_mul =
"__kernel void vec_mul(\n"
" __global const unsigned int * row_indices,\n"
" __global const unsigned int * column_indices, \n"
" __global const float * elements,\n"
" __global const float * vector, \n"
" __global float * result,\n"
" unsigned int size) \n"
"{ \n"
" for (unsigned int row = get_global_id(0); row < size; row += get_global_size(0))\n"
" {\n"
" float dot_prod = 0.0f;\n"
" unsigned int row_end = row_indices[row+1];\n"
" for (unsigned int i = row_indices[row]; i < row_end; ++i)\n"
" dot_prod += elements * vector[column_indices];\n"
" result[row] = dot_prod;\n"
" }\n"
"}\n"
; //compressed_matrix_align1_vec_mul
...
// at the beginning of this snippet, v2 = &row_indices[row];
t_buffer_load_format_x v3, v2, s[16:19], 0 offen format:[BUF_DATA_FORMAT_32, BUF_NUM_FMT_FLOAT] // 00000078: EBA01000 80040302
t_buffer_load_format_x v2, v2, s[16:19], 0 offen offset:4 format:[BUF_DATA_FORMAT_32, BUF_NUM_FMT_FLOAT] // 00000080: EBA01004 80040202
s_waitcnt vmcnt(0) // 00000088: BF8C1F70
v_cmp_gt_u32 s[36:37], v2, v3 // 0000008C: D1880024 00020702
s_and_saveexec_b64 s[36:37], s[36:37] // 00000094: BEA42424
s_cbranch_execz label_003F // 00000098: BF880018
...
(if OFFSET was not meant to be added, then the two load instructions would load from the same address, which wouldn't make sense here.)