valerioa
Adept II

error: invalid instruction mnemonic 'vfmaddsd' while running FFT sample on Linux x86_64

I'm trying the FFT sample on Ubuntu 11.04 with an FX-8150 and four FirePro 3D V5800 cards.

My hardware:

Motherboard: MSI FX990-GD80

CPU: AMD FX 8150 3.6 GHz

RAM: 32 GB

4x FirePro 3D V5800

OS Ubuntu 11.04

Drivers and SDK:

FirePro_8.911.3.1_Linux_X32X64_132092.zip

AMD-APP-SDK-v2.6-lnx64.tgz

The FFT sample runs fine on the GPUs:

valerio@hotshot:~$ /opt/AMDAPP/samples/opencl/bin/x86_64/FFT --device gpu

Platform 0 : Advanced Micro Devices, Inc.

Original Input Real

15.3732 201.81 51.9855 89.2322 92.572 34.4675 96.2478 66.3863 11.345 225.168

Original Input Img

0.0600514 0.788318 0.203068 0.348563 0.361609 0.134639 0.375968 0.259322 0.0443163 0.879562

Platform found : Advanced Micro Devices, Inc.

Selected Platform Vendor : Advanced Micro Devices, Inc.

Device 0 : Juniper Device ID is 0x20fd5e0

Device 1 : Juniper Device ID is 0x28a5e30

Device 2 : Juniper Device ID is 0x2aced60

Device 3 : Juniper Device ID is 0x2a0a670

Executing kernel for 1 iterations

-------------------------------------------

Output real

131643 -1085.95 -997.15 -1791.52 532.118 1659.74 -166.271 969.692 1189.76 -862.707

Output img

514.23 2289.84 936.489 -603.839 699.7 1018.18 1900.06 795.439 -1328.03 -293.334

But it crashes on the CPU:

valerio@hotshot:~$ /opt/AMDAPP/samples/opencl/bin/x86_64/FFT --device cpu

Platform 0 : Advanced Micro Devices, Inc.

Original Input Real

15.3732 201.81 51.9855 89.2322 92.572 34.4675 96.2478 66.3863 11.345 225.168

Original Input Img

0.0600514 0.788318 0.203068 0.348563 0.361609 0.134639 0.375968 0.259322 0.0443163 0.879562

Platform found : Advanced Micro Devices, Inc.

Selected Platform Vendor : Advanced Micro Devices, Inc.

Device 0 : AMD FX(tm)-8150 Eight-Core Processor            Device ID is 0x1c0c650

<inline asm>:1:2: error: invalid instruction mnemonic 'vfmaddsd'

    vfmaddsd %xmm2, %xmm0, %xmm1, %xmm0

    ^

LLVM ERROR: Error parsing inline asm

It seems that the LLVM compiler, when run on a Bulldozer CPU, generates a valid instruction that the assembler does not recognize.

Is this a known problem? Am I doing something wrong?

Thanks,

Valerio


0 Likes
qneill
Staff

Hi @valerioa,
Can you determine what assembler is being used?  The vfmaddsd instruction is an FMA4 instruction, and the assembler you are using must support that.  FMA4 support has been in binutils since 2.19.51.0.12 in July of 2009: http://gcc.gnu.org/ml/gcc/2009-07/msg00323.html
You could try something like:
find /opt/AMDAPP/samples/opencl/bin/x86_64 -name as -print
I tested the default assembler on Ubuntu 10.10 and it supports FMA4:
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.10
DISTRIB_CODENAME=maverick
DISTRIB_DESCRIPTION="Ubuntu 10.10"

$ uname -a
Linux gcc0 2.6.35-22-generic #33-Ubuntu SMP Sun Sep 19 20:32:27 UTC 2010 x86_64 GNU/Linux

$ echo '.text
_start:
    vfmaddsd %xmm0, %xmm1, %xmm2, %xmm3

' > tst.s

$ /usr/bin/as --version
GNU assembler (GNU Binutils for Ubuntu) 2.20.51-system.20100908
Copyright 2010 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or later.
This program has absolutely no warranty.
This assembler was configured for a target of `x86_64-linux-gnu'.

$ /usr/bin/as -o tst.o tst.s

$ /usr/bin/objdump -d tst.o

tst.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <_start>:
   0:   c4 e3 e9 6b d8 10       vfmaddsd %xmm0,%xmm1,%xmm2,%xmm3

0 Likes

Hi @valerioa,

I see that the AMD-APP-SDK-v2.6-lnx64.tar doesn't come with an assembler, so you'll have to determine what assembler is being called.

Perhaps your PATH is picking up an older assembler or you are missing a development package?

$ which as

/usr/bin/as

$ /usr/bin/as --version

GNU assembler (GNU Binutils for Ubuntu) 2.20.51-system.20100908

Copyright 2010 Free Software Foundation, Inc.

This program is free software; you may redistribute it under the terms of

the GNU General Public License version 3 or later.

This program has absolutely no warranty.

This assembler was configured for a target of `x86_64-linux-gnu'.

$ dpkg-query --search /usr/bin/as

binutils: /usr/bin/as

$

$ dpkg-query --list binutils

Desired=Unknown/Install/Remove/Purge/Hold

| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend

|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)

||/ Name                        Version                     Description

+++-===========================-===========================-======================================================================

ii  binutils                    2.20.51.20100908-0ubuntu2   The GNU assembler, linker and binary utilities
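
To see every candidate rather than only the first hit that which reports, the PATH walk can be sketched in a few lines of POSIX shell. The function name list_on_path is made up for this illustration; it only mimics the lookup order and is not something the SDK provides.

```shell
# List every executable called "$1" on a search path, in lookup order.
# The optional second argument overrides $PATH (hypothetical helper, for illustration only).
list_on_path() (
    name=$1
    search=${2:-$PATH}
    IFS=:      # split the search path on colons
    set -f     # no glob expansion while splitting
    for dir in $search; do
        [ -n "$dir" ] || dir=.    # an empty PATH entry means the current directory
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
)
```

For example, "list_on_path as" prints each as in the order a PATH search would find it, so an outdated copy sitting ahead of /usr/bin/as stands out.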

0 Likes

Hi qneill,

thank you for your replies. I believe that with the AMD-APP-SDK-v2.6-lnx64.tar version of the SDK, the assembler is embedded in one of the OpenCL libraries. You can check it with:

strace -f -e trace=execve  -v /opt/AMDAPP/samples/opencl/bin/x86_64/FFT --device cpu

execve("/opt/AMDAPP/samples/opencl/bin/x86_64/FFT", ["/opt/AMDAPP/samples/opencl/bin/x"..., "--device", "cpu"], ["TERM=xterm", "SHELL=/bin/bash", "XDG_SESSION_COOKIE=aba4fc779fbe0"..., "SSH_CLIENT=10.22.0.100 62778 22", "SSH_TTY=/dev/pts/0", "USER=valerio", "LS_COLORS=rs=0:di=01;34:ln=01;36"..., "LD_LIBRARY_PATH=:/opt/AMDAPP/lib"..., "MAIL=/var/mail/valerio", "PATH=/usr/local/sbin:/usr/local/"..., "AMDAPPSDKROOT=/opt/AMDAPP", "PWD=/home/valerio/log", "LANG=en_US.UTF-8", "SHLVL=1", "HOME=/home/valerio", "LOGNAME=valerio", "SSH_CONNECTION=10.22.0.100 62778"..., "LESSOPEN=| /usr/bin/lesspipe %s", "LESSCLOSE=/usr/bin/lesspipe %s %"..., "_=/usr/bin/strace", "OLDPWD=/home/valerio"]) = 0

Platform 0 : Advanced Micro Devices, Inc.

Original Input Real

15.3732 201.81 51.9855 89.2322 92.572 34.4675 96.2478 66.3863 11.345 225.168

Original Input Img

0.0600514 0.788318 0.203068 0.348563 0.361609 0.134639 0.375968 0.259322 0.0443163 0.879562

Platform found : Advanced Micro Devices, Inc.

Selected Platform Vendor : Advanced Micro Devices, Inc.

Device 0 : AMD FX(tm)-8150 Eight-Core Processor            Device ID is 0x3878f00

Process 5015 attached

Process 5016 attached

Process 5017 attached

Process 5018 attached

Process 5019 attached (waiting for parent)

Process 5019 resumed (parent 5015 ready)

Process 5020 attached

Process 5021 attached

Process 5022 attached (waiting for parent)

Process 5022 resumed (parent 5015 ready)

Process 5023 attached (waiting for parent)

Process 5023 resumed (parent 5015 ready)

<inline asm>:1:2: error: invalid instruction mnemonic 'vfmaddsd'

    vfmaddsd %xmm2, %xmm0, %xmm1, %xmm0

    ^

LLVM ERROR: Error parsing inline asm

)                                       = ? <unavailable>

No process is invoked with execve, so I suppose the assembler is embedded with the library.  I was hoping the AMD APP SDK developers team would chime in.

You can also check it with:

strace -o log -ff /opt/AMDAPP/samples/opencl/bin/x86_64/FFT --device cpu

The GNU as on Ubuntu 11.04 supports that instruction, as you showed for 10.10.
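
When the trace is long, it can help to reduce it to the programs that were actually started. A small sketch, assuming strace's usual line format in which a successful call ends in ") = 0" and a failed PATH probe ends in "= -1 ENOENT ..."; the function name successful_execs is hypothetical:

```shell
# Read strace output on stdin and print the path of every program
# whose execve call succeeded (hypothetical helper, for illustration only).
successful_execs() {
    # Keep only lines that end in ") = 0" and extract the first execve argument.
    sed -n 's/^.*execve("\([^"]*\)".*) = 0$/\1/p'
}
```

Usage: strace -f -e trace=execve /opt/AMDAPP/samples/opencl/bin/x86_64/FFT --device cpu 2>&1 | successful_execs. If no line ending in /as shows up, no external assembler was started.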

0 Likes

Hi

Hmm, when I do that, I see execve searching around in my PATH for 'as' like this...

$ strace -f -e trace=execve  -v /opt/AMDAPP/samples/opencl/bin/x86_64/FFT --device cpu 2>&1 | cut -c1-80
execve("/opt/AMDAPP/samples/opencl/bin/x86_64/FFT", ["/opt/AMDAPP/samples/opencl
Platform 0 : Advanced Micro Devices, Inc.

Original Input Real
15.3732 201.81 51.9855 89.2322 92.572 34.4675 96.2478 66.3863 11.345 225.168


Original Input Img
0.0600514 0.788318 0.203068 0.348563 0.361609 0.134639 0.375968 0.259322 0.04431

Platform found : Advanced Micro Devices, Inc.

Selected Platform Vendor : Advanced Micro Devices, Inc.
Device 0 : AMD Engineering Sample Device ID is 0x1056250
Process 22988 attached (waiting for parent)
Process 22988 resumed (parent 22987 ready)
Process 22989 attached
Process 22990 attached (waiting for parent)
Process 22990 resumed (parent 22988 ready)
Process 22991 attached (waiting for parent)
Process 22991 resumed (parent 22988 ready)
Process 22992 attached (waiting for parent)
Process 22992 resumed (parent 22988 ready)
Process 22993 attached (waiting for parent)
Process 22993 resumed (parent 22987 ready)
Process 22987 suspended
[pid 22993] execve("/home/qneill/bin/as", ["as", "--64", "/tmp/OCL0C9tsr.s", "-o
[pid 22993] execve("/usr/local/sbin/as", ["as", "--64", "/tmp/OCL0C9tsr.s", "-o"
[pid 22993] execve("/usr/bin/as", ["as", "--64", "/tmp/OCL0C9tsr.s", "-o", "/tmp
Process 22987 resumed
Process 22993 detached
[pid 22987] --- SIGCHLD (Child exited) @ 0 (0) ---
Process 22994 attached (waiting for parent)
Process 22994 resumed (parent 22987 ready)
Process 22987 suspended
[pid 22994] execve("/home/qneill/bin/ld", ["ld", "-m", "elf_x86_64", "-shared",
[pid 22994] execve("/usr/local/sbin/ld", ["ld", "-m", "elf_x86_64", "-shared", "
[pid 22994] execve("/usr/bin/ld", ["ld", "-m", "elf_x86_64", "-shared", "/tmp/OC
Process 22987 resumed
Process 22994 detached
[pid 22987] --- SIGCHLD (Child exited) @ 0 (0) ---
Executing kernel for 1 iterations
-------------------------------------------
Process 22989 detached
Process 22990 detached
Process 22991 detached
Process 22992 detached
Process 22988 detached

Output real
131643 -1085.95 -997.15 -1791.52 532.119 1659.74 -166.27 969.692 1189.76 -862.70


Output img
514.23 2289.84 936.49 -603.839 699.7 1018.18 1900.06 795.439 -1328.03 -293.334

0 Likes

qneill,

Which CPU do you run this on? Is "AMD Engineering Sample Device ID is 0x1056250" an Athlon?

When I run it on an Intel CPU or a K8-family AMD, I can see the execve calls happening as well. But when I run it on an FX 8150, I don't see them anymore. I suspect that somewhere in clBuildProgram() there must be something like:

// Speculative pseudocode: a guess at the dispatch logic, not actual library source
if (cpu == "Intel") {
     execve(...)
} else if (cpu == "AMD-k8") {
     execve(...)
} else if (cpu == "AMD-k10") {
     execve(...)
} else if (cpu == "AMD-15h") {
     run_embedded_llvm_compiler_and_as(...)
}

0 Likes

I ran it on an FX-8120 and saw the same behavior (execve of /usr/bin/as after trying other entries in PATH):

# head /proc/cpuinfo

processor       : 0

vendor_id       : AuthenticAMD

cpu family      : 21

model           : 1

model name      : AMD FX(tm)-8120 Eight-Core Processor

I reached out to the OpenCL team inside AMD, and to @MicahVillmow on the forums. We should hear from them soon.

Message was edited by: Quentin Neill - added @MicahVillmow to the thread.

0 Likes

Using the advanced editor didn't seem to bring the mention in automatically... doing it again.

0 Likes

valerioa,

I got a response from an engineer internally. The CPU runtime uses the system assembler and the one on your system is not new enough to support FMA4 instructions. So, there are two options.

1) Update the system assembler to a newer version.

2) Pass the compiler option "-disable-avx" to clBuildProgram to disable AVX code generation.

Micah,

thank you for your help. Would you kindly put me in touch with that engineer? The reason I'm asking is that the as I have actually does assemble these AVX/FMA4 instructions:

valerio@hotshot:~$ /usr/bin/as --version

GNU assembler (GNU Binutils for Ubuntu) 2.21.0.20110327

Copyright 2011 Free Software Foundation, Inc.

This program is free software; you may redistribute it under the terms of

the GNU General Public License version 3 or later.

This program has absolutely no warranty.

This assembler was configured for a target of `x86_64-linux-gnu'.

valerio@hotshot:~$ cat test.s

_start:

    vfmaddsd %xmm0, %xmm1, %xmm2, %xmm3

valerio@hotshot:~$ /usr/bin/as -o test.o test.s

valerio@hotshot:~$ objdump -D test.o

test.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <_start>:

   0:    c4 e3 e9 6b d8 10        vfmaddsd %xmm0,%xmm1,%xmm2,%xmm3

And when I run one of the samples under ltrace or strace, I see no assembler being execve'd during the process. It almost appears as if there is an llvm-as embedded in the library?

0 Likes

Hi @valerioa,

Can you try "strace -f" without the -e option and see if an assembler is being called via some system call other than execve()? Perhaps FFT is using another variant like plain exec().

Alternatively, can you copy /usr/bin/as to the current directory, set PATH=.:$PATH, and see if the problem goes away?

Just some thoughts.

0 Likes

Hello qneill,

and thank you very much for your willingness to help. I appreciate it very much. Some extra information:

1) I've installed the most recent binutils package from the GNU FTP site (binutils-2.22). The newest as is in the PATH:

valerio@hotshot:~$ which as

/usr/local/pkg/binutils-2.22/bin/as

but the net result is the same:

valerio@hotshot:~$ /opt/AMDAPP/samples/opencl/bin/x86_64/FFT --device cpu

Platform 0 : Advanced Micro Devices, Inc.

Original Input Real

15.3732 201.81 51.9855 89.2322 92.572 34.4675 96.2478 66.3863 11.345 225.168

Original Input Img

0.0600514 0.788318 0.203068 0.348563 0.361609 0.134639 0.375968 0.259322 0.0443163 0.879562

Platform found : Advanced Micro Devices, Inc.

Selected Platform Vendor : Advanced Micro Devices, Inc.

Device 0 : AMD FX(tm)-8150 Eight-Core Processor            Device ID is 0x2069580

<inline asm>:1:2: error: invalid instruction mnemonic 'vfmaddsd'

    vfmaddsd %xmm2, %xmm0, %xmm1, %xmm0

    ^

LLVM ERROR: Error parsing inline asm

2) The error string "invalid instruction mnemonic '%s'" does not come from any as. If you put an invalid mnemonic in your test.s file, such as

valerio@hotshot:~$ cat test.s

_start:

    vfmaddsdX %xmm0, %xmm1, %xmm2, %xmm3

the error coming from as looks very different:

test.s: Assembler messages:

test.s:2: Error: no such instruction: `vfmaddsdx %xmm0,%xmm1,%xmm2,%xmm3'

3) The string 'invalid instruction mnemonic' can only be found in /usr/lib/fglrx/libamdocl64.so (or its 32-bit counterpart), so I suspect that whatever attempts to assemble that code is in libamdocl64.so.

4) I did a full strace. I see no other fork or exec calls that would run as.
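
The search in point 3 can be repeated on any set of libraries. A minimal sketch, assuming a grep that accepts -q and -F; the function name files_containing is invented for this example:

```shell
# Print each file among the arguments (after the first) that contains
# the literal string "$1" (hypothetical helper, for illustration only).
files_containing() (
    needle=$1
    shift
    for f in "$@"; do
        # -F matches the string literally, -q only sets the exit status.
        if [ -f "$f" ] && grep -qF -- "$needle" "$f"; then
            printf '%s\n' "$f"
        fi
    done
)
```

Usage: files_containing 'invalid instruction mnemonic' /usr/lib/fglrx/*.so /opt/AMDAPP/lib/x86_64/*.so. Any file it prints carries the error text, which points to where the failing assembler lives.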

Maybe I have the wrong libamdocl64.so ?

Mine is:

valerio@hotshot:~$ ls -l /usr/lib/fglrx/libamdocl64.so

-rw-r--r-- 1 root root 22287280 2012-03-06 14:54 /usr/lib/fglrx/libamdocl64.so

valerio@hotshot:~$ md5sum /usr/lib/fglrx/libamdocl64.so

8ab626fcdd8fa44bdeec995f4034ecd7  /usr/lib/fglrx/libamdocl64.so

qneill, would you mind checking which library is loaded on your FX 8120? You can do that by running

strace -o log /opt/AMDAPP/samples/opencl/bin/x86_64/FFT --device cpu

and finding in log which libamdocl64.so library is loaded.

There should be a line with

open("/xxx/yyy/zzzz/libamdocl64.so", O_RDONLY) = <a positive integer number>

That's the library that is actually loaded.
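
Picking those lines out of the log by hand is tedious, so here is a small filter as a sketch. It assumes strace's usual format, in which a successful open or openat returns a non-negative descriptor and a failed probe returns -1; the function name loaded_ocl_libs is hypothetical:

```shell
# Read strace output on stdin and print the path of every libamdocl64.so
# that was opened successfully (hypothetical helper, for illustration only).
loaded_ocl_libs() {
    # Match open(...) or openat(...) lines whose result is a plain non-negative number.
    sed -n 's/.*open[a-z]*(.*"\([^"]*libamdocl64\.so\)".*) = [0-9][0-9]*$/\1/p'
}
```

Usage: loaded_ocl_libs < log. If the path it prints is not the SDK's /opt/AMDAPP/lib/x86_64/ copy, another library is being picked up first.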

Thanks,


0 Likes

Yep, that was it. I restored the library  libamdocl64.so  from AMD-APP-SDK-v2.6-RC3-lnx64 to

/opt/AMDAPP/lib/x86_64/

and the problem went away.

There must be an old version of libamdocl64.so that comes with the Ubuntu open source package fglrx that uses llvm/clang to compile the code. That library was getting in the way.
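
Since the fix came down to two different copies of the same library on one machine, a quick way to spot that situation is to compare the copies byte for byte. A sketch using cmp; the function name all_identical is made up for this example:

```shell
# Exit 0 only if every existing file named on the command line has identical
# contents; otherwise report the first mismatch and exit 1
# (hypothetical helper, for illustration only).
all_identical() (
    first=
    for f in "$@"; do
        [ -f "$f" ] || continue      # skip paths that do not exist
        if [ -z "$first" ]; then
            first=$f
        elif ! cmp -s "$first" "$f"; then
            printf 'differs: %s vs %s\n' "$first" "$f"
            exit 1
        fi
    done
    exit 0
)
```

Usage: all_identical /usr/lib/fglrx/libamdocl64.so /opt/AMDAPP/lib/x86_64/libamdocl64.so. A "differs" line means the driver package and the SDK ship different builds of the library.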

qneill, thank you very much for your help and ideas,

Valerio

0 Likes

No problem.

An interesting interaction, and one the OpenCL team should probably be tracking anyway.

Cheers,

--

Quentin

0 Likes

qneill,

you are right. As a matter of fact, I was wrong in thinking that the library /usr/lib/fglrx/libamdocl64.so came from the open source fglrx package. It actually comes from the Debian package generated from the fglrx-8.911.3.1 driver off the AMD website.

In the past, there must have been a version of libamdocl64.so from AMD that used the LLVM compiler for Bulldozer CPUs.

Details:

root@hotshot:~/ati# md5sum /usr/lib/fglrx/libamdocl64.so

8ab626fcdd8fa44bdeec995f4034ecd7  /usr/lib/fglrx/libamdocl64.so

root@hotshot:~/ati# dpkg -S /usr/lib/fglrx/libamdocl64.so

fglrx: /usr/lib/fglrx/libamdocl64.so

root@hotshot:~/ati# dpkg -l |grep fglrx

ii  fglrx                                 2:8.911-0ubuntu1                           Video driver for the AMD graphics accelerators

ii  fglrx-amdcccle                        2:8.911-0ubuntu1                           Catalyst Control Center for the AMD graphics accelerators

ii  fglrx-dev                             2:8.911-0ubuntu1                           Video driver for the AMD graphics accelerators (devel files)

0 Likes

Nice work.  This will help others who might see similar issues, thanks for posting.

0 Likes