2 Replies Latest reply on Jan 13, 2009 9:20 PM by shamsundar@uh.edu

    ACML bug (or undocumented feature?) : 32 bit versions for GFortran, PGI Fortran

    shamsundar@uh.edu
      Many routines in the 32-bit ACML libraries contain the REP prefix for X87 instructions

      While debugging some code, I found that the ACML 32-bit libraries for GFortran (4.2.0, downloaded today) and PGI Fortran (3.6.0) for Linux contained code sequences such as

      f3 dd 06 repz fldl (%esi)
      f3 dd 07 repz fldl (%edi)
      f3 d8 c9 repz fmul %st(1),%st

      in routines such as "dmmkern30x87_".

      Valgrind flags these instructions as invalid and raises a SIGILL exception.

      The AMD CPU instruction set descriptions state that "rep", "repz" and "repnz" prefixes apply only to string operations.

      Is there a gap in my understanding of this issue?

      I am running openSUSE 11.0-x64, on an HP PC with an Athlon-X2 CPU, 4G RAM.

      Thanks.

      N. Shamsundar
      University of Houston
        • ACML bug (or undocumented feature?) : 32 bit versions for GFortran, PGI Fortran

          Hello, Mr. Shamsundar,

          This looks like a bug in Valgrind's disassembler.

          While those instruction code bytes are indeed very odd looking, they are valid combinations.

          You are correct that the REP prefixes only cause repeats on the string instructions.  When they are applied to an instruction like fmul, they have no effect.  That does not make the code byte sequence invalid, however.  Any x86 processor from AMD, Intel or another manufacturer will simply go ahead and execute the fmul instruction, ignoring the nonsense prefix.

          As an aside, the codes F3 and F2 have completely different uses when they are part of an SSE instruction.  When I first saw your post, I thought that might be the case here, but in SSE instructions those code bytes always appear as ... F3 0F ... or ... F2 0F ..., and there is no 0F byte in your sequences.
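To make the distinction above concrete, here is a rough Python sketch that looks at the byte following an F2/F3 prefix and guesses which of the three cases applies. The function name `classify_rep_prefix` and the result strings are made up for illustration, and the check is deliberately simplistic: it assumes 32-bit code, assumes the F2/F3 byte is the first byte of the instruction with no other prefixes in between, and does not handle special encodings such as F3 90 (PAUSE).

```python
# Opcodes of the x86 string instructions that REP/REPZ/REPNZ actually repeat:
# INS (6C/6D), OUTS (6E/6F), MOVS (A4/A5), CMPS (A6/A7),
# STOS (AA/AB), LODS (AC/AD), SCAS (AE/AF).
STRING_OPCODES = {
    0x6C, 0x6D, 0x6E, 0x6F,
    0xA4, 0xA5, 0xA6, 0xA7,
    0xAA, 0xAB, 0xAC, 0xAD, 0xAE, 0xAF,
}


def classify_rep_prefix(code: bytes) -> str:
    """Guess the role of a leading F2/F3 byte (hypothetical helper).

    Simplifications: no other prefixes are skipped, and special cases
    such as F3 90 (PAUSE) are not recognised.
    """
    if len(code) < 2 or code[0] not in (0xF2, 0xF3):
        return "no leading F2/F3 prefix"
    following = code[1]
    if following == 0x0F:
        # Two-byte opcode: here F2/F3 can be part of an SSE encoding.
        return "possible SSE encoding"
    if following in STRING_OPCODES:
        return "string instruction, prefix repeats it"
    return "prefix has no effect (padding)"


if __name__ == "__main__":
    samples = {
        "repz fldl (%esi)": bytes.fromhex("f3dd06"),
        "repz fmul %st(1),%st": bytes.fromhex("f3d8c9"),
        "rep movsb": bytes.fromhex("f3a4"),
        "movss %xmm1,%xmm0": bytes.fromhex("f30f10c1"),
    }
    for text, raw in samples.items():
        print(f"{raw.hex(' '):12} {text:24} {classify_rep_prefix(raw)}")
```

With the byte sequences from the original post, the x87 loads and the fmul fall into the last category, which matches the reading that the prefix is only there as padding.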

          In this case, it looks like the REP prefixes were inserted to pad the instructions and make them longer.   This was probably someone tuning the code to get the best possible performance on a particular processor.  In some cases, simply changing the alignment of the instructions in memory might have made the difference between the instruction decoder dispatching three instructions in a cycle instead of just two.  It's quite possible that this is an old optimization for an obsolete processor, but it hasn't been removed because it has no performance effect on other processors.

          Some compilers will insert nonsense REP prefixes in order to align code, like aligning the top of a loop to a cache-line boundary.  You can align instruction codes by inserting NOPs, too, but even NOPs have to be treated as instructions by the CPU's instruction decoder.
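As a small illustration of the alignment arithmetic, the sketch below computes how many filler bytes (redundant prefixes or NOPs) would be needed to push the next instruction onto a boundary. The helper name `padding_needed`, the example addresses, and the 16- and 64-byte boundaries are assumptions chosen for the example; nothing here is taken from the ACML sources.

```python
def padding_needed(address: int, alignment: int = 16) -> int:
    """Bytes to add so that `address` becomes a multiple of `alignment`.

    `alignment` must be a positive power of two (hypothetical helper).
    """
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # (-address) mod alignment is the distance up to the next boundary,
    # and 0 when the address is already aligned.
    return (-address) % alignment


if __name__ == "__main__":
    # Made-up loop-top addresses, purely for illustration.
    for addr, align in [(0x1005, 16), (0x1010, 16), (0x103D, 64)]:
        pad = padding_needed(addr, align)
        print(f"address {addr:#x}, boundary {align}: {pad} padding byte(s)")
```

Each redundant one-byte prefix lengthens an existing instruction by one byte, so a few of them can absorb a small gap without adding any extra instructions for the decoder to handle.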

          The language in the AMD Architecture manuals, "should only be used with such string instructions", is probably misleading.

          Jim Conyngham

          • ACML bug (or undocumented feature?) : 32 bit versions for GFortran, PGI Fortran
            shamsundar@uh.edu
            Thanks. Your response clears it up for me.

            N. Shamsundar