2 Replies Latest reply on Jul 20, 2013 6:44 AM by olivea19

    SI TBUFFER_LOAD_FORMAT_X offen/offset fields

    dlustig

      Hello,

       

      I have a question about the behavior of the T_BUFFER_LOAD_FORMAT_X instruction on Southern Islands.  The SI ISA Reference Guide states that the OFFSET field of MTBUF instructions is used only when OFFEN=0.  However, I am seeing some code compiled with APP SDK 2.8 that produces the following instruction:

      t_buffer_load_format_x  v2, v2, s[16:19], 0 offen offset:4 format:[BUF_DATA_FORMAT_32, BUF_NUM_FMT_FLOAT] // 00000080: EBA01004 80040202

      In this case, OFFEN=1, but OFFSET=4.  Matching the assembly to the source code, it seems that the offset of 4 is in fact intended to be added, even though OFFEN=1.  I tried to test this out a little: hex-editing the OFFSET field to 0 should have no effect according to the spec (since OFFEN=1), but it does in fact cause incorrect execution on my Radeon 7970.  To further verify this, I ran the code in Multi2Sim, and the kernel does not execute correctly there either.

       

      Is this in fact a documentation bug?  The source code, compiler, and hardware all seem to agree that OFFSET should still be added even when OFFEN=1, but I'd like to be sure.

       

      Thanks,

      Dan

       

      ---

       

      The full code is the following (from ViennaCL):

       

      const char * const compressed_matrix_align1_vec_mul =

      "__kernel void vec_mul(\n"

      "          __global const unsigned int * row_indices,\n"

      "          __global const unsigned int * column_indices, \n"

      "          __global const float * elements,\n"

      "          __global const float * vector,  \n"

      "          __global float * result,\n"

      "          unsigned int size) \n"

      "{ \n"

      "  for (unsigned int row = get_global_id(0); row < size; row += get_global_size(0))\n"

      "  {\n"

      "    float dot_prod = 0.0f;\n"

      "    unsigned int row_end = row_indices[row+1];\n"

      "    for (unsigned int i = row_indices[row]; i < row_end; ++i)\n"

      "      dot_prod += elements[i] * vector[column_indices[i]];\n"

      "    result[row] = dot_prod;\n"

      "  }\n"

      "}\n"

      ; //compressed_matrix_align1_vec_mul

       

      ...

      // at the beginning of this snippet, v2 = &row_indices[row];

      t_buffer_load_format_x  v3, v2, s[16:19], 0 offen format:[BUF_DATA_FORMAT_32, BUF_NUM_FMT_FLOAT] // 00000078: EBA01000 80040302

      t_buffer_load_format_x  v2, v2, s[16:19], 0 offen offset:4 format:[BUF_DATA_FORMAT_32, BUF_NUM_FMT_FLOAT] // 00000080: EBA01004 80040202

      s_waitcnt     vmcnt(0)                                     // 00000088: BF8C1F70

      v_cmp_gt_u32  s[36:37], v2, v3                             // 0000008C: D1880024 00020702

      s_and_saveexec_b64  s[36:37], s[36:37]                     // 00000094: BEA42424

      s_cbranch_execz  label_003F                                // 00000098: BF880018

      ...

      (if OFFSET was not meant to be added, then the two load instructions would load from the same address, which wouldn't make sense here.)