AnsweredAssumed Answered

SI TBUFFER_LOAD_FORMAT_X offen/offset fields

Question asked by dlustig on Dec 6, 2012
Latest reply on Jul 20, 2013 by olivea19

Hello,

 

I have a question about the behavior of the T_BUFFER_LOAD_FORMAT_X instruction on Southern Islands.  The SI ISA Reference Guide states that the OFFSET field of MTBUF instructions is used only when OFFEN=0.  However, I am seeing some code compiled with APP SDK 2.8 that produces the following instruction:

t_buffer_load_format_x  v2, v2, s[16:19], 0 offen offset:4 format:[BUF_DATA_FORMAT_32, BUF_NUM_FMT_FLOAT] // 00000080: EBA01004 80040202

In this case, OFFEN=1, but OFFSET=4.  Matching the assembly to the source code, it seems that the offset of 4 is in fact intended to be added, even though OFFEN=1.  I tried to test this out a little: hex-editing the OFFSET field to 0 should have no effect according to the spec (since OFFEN=1), but it does in fact cause incorrect execution on my Radeon 7970.  To further verify this, I ran the code in Multi2Sim, and the kernel does not execute correctly there either.

 

Is this in fact a documentation bug?  The source code, compiler, and hardware all seem to agree that OFFSET should still be added even when OFFEN=1, but I'd like to be sure.

 

Thanks,

Dan

 

---

 

The full code is the following (from ViennaCL):

 

const char * const compressed_matrix_align1_vec_mul =

"__kernel void vec_mul(\n"

"          __global const unsigned int * row_indices,\n"

"          __global const unsigned int * column_indices, \n"

"          __global const float * elements,\n"

"          __global const float * vector,  \n"

"          __global float * result,\n"

"          unsigned int size) \n"

"{ \n"

"  for (unsigned int row = get_global_id(0); row < size; row += get_global_size(0))\n"

"  {\n"

"    float dot_prod = 0.0f;\n"

"    unsigned int row_end = row_indices[row+1];\n"

"    for (unsigned int i = row_indices[row]; i < row_end; ++i)\n"

"      dot_prod += elements[i] * vector[column_indices[i]];\n"

"    result[row] = dot_prod;\n"

"  }\n"

"}\n"

; //compressed_matrix_align1_vec_mul

 

...

// at the beginning of this snippet, v2 = &row_indices[row];

t_buffer_load_format_x  v3, v2, s[16:19], 0 offen format:[BUF_DATA_FORMAT_32, BUF_NUM_FMT_FLOAT] // 00000078: EBA01000 80040302

t_buffer_load_format_x  v2, v2, s[16:19], 0 offen offset:4 format:[BUF_DATA_FORMAT_32, BUF_NUM_FMT_FLOAT] // 00000080: EBA01004 80040202

s_waitcnt     vmcnt(0)                                     // 00000088: BF8C1F70

v_cmp_gt_u32  s[36:37], v2, v3                             // 0000008C: D1880024 00020702

s_and_saveexec_b64  s[36:37], s[36:37]                     // 00000094: BEA42424

s_cbranch_execz  label_003F                                // 00000098: BF880018

...

(if OFFSET was not meant to be added, then the two load instructions would load from the same address, which wouldn't make sense here.)

Outcomes