I'm getting some strange behavior with the new system I just built. I've narrowed it down to exactly the GPUs by swapping them out with an older ATI Radeon HD 5970.
So here's the walkthrough. Clean install of CentOS 6.2 x86_64 (also tried OpenSUSE 12.1 and Fedora 16, but figured the best support out of these three would be CentOS). I could install with the 7950s in the machine, but to boot I had to use the 5970. So using the 5970 I updated all the packages and installed kernel-devel, gcc, gcc-c++, and gcc-gfortran. I restarted, then installed the AMD Catalyst 12.2 Linux driver. After a reboot the system worked like a charm. I powered off and swapped the 7950s back in. On boot, even before the login screen, the kernel panics (see pictures below).
So I booted up using the install DVD and went into recovery mode, then updated the xorg.conf file for each device to reflect their PCI bus numbers (which I got from "/sbin/lspci | grep VGA"). Rebooted and still no luck. I kept rebooting and after the sixth or so time things actually went well, but after another reboot it panicked again, and it has kept panicking since.
Please help with tips or suggestions. I only have one monitor attached to the primary (or master if you prefer) GPU. Could this be related to using those Crossfire connectors (I have two connectors attached between the two GPUs)? Or an improper xorg.conf (although I've used similar ones for other dual graphics configurations)? I tried the xorg.conf as shown below, both with things commented as shown and uncommented.
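One thing worth double-checking when copying addresses from lspci into xorg.conf: lspci prints the bus, device, and function numbers in hexadecimal, while the BusID option in xorg.conf expects them in decimal. Below is a minimal Python sketch of that conversion. The helper name lspci_to_busid and the sample lines are made up for illustration; they are not output from this machine.

```python
import re

# Matches the address at the start of an lspci line, e.g. "04:00.0 VGA ..."
# or, with a PCI domain prefix, "0000:04:00.0 VGA ...".
_LSPCI_ADDR = re.compile(
    r"^(?:[0-9a-fA-F]{4}:)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])\s"
)


def lspci_to_busid(line):
    """Convert one lspci output line into an xorg.conf BusID string.

    lspci reports bus/device/function in hex; xorg.conf wants decimal.
    (Hypothetical helper name, for illustration only.)
    """
    match = _LSPCI_ADDR.match(line)
    if match is None:
        raise ValueError("no PCI address at start of line: %r" % line)
    bus, device, function = (int(part, 16) for part in match.groups())
    return "PCI:%d:%d:%d" % (bus, device, function)


if __name__ == "__main__":
    # Hypothetical sample lines; substitute the real output of
    # "/sbin/lspci | grep VGA".
    samples = [
        "04:00.0 VGA compatible controller: example device A",
        "0a:00.0 VGA compatible controller: example device B",
    ]
    for sample in samples:
        print(lspci_to_busid(sample))
```

For bus numbers below 0a the hex and decimal values are identical, so a mistake here only shows up on systems where a card sits on a higher-numbered bus.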
And should Crossfire be enabled or disabled if I want to develop and run OpenCL programs? When Crossfire is disabled I can only get one GPU to appear in clinfo.
Screen 0 "Screen0" 0 0
# Screen 1 "Screen1" 0 0
Option "DPMS" "true"
# Identifier "Monitor1"
# Option "DPMS" "true"
VendorName "Advanced Micro Devices, Inc."
Option "NoLogo" "True"
# Identifier "Device1"
# Driver "fglrx"
# VendorName "Advanced Micro Devices, Inc."
# BusID "PCI:4:0:0"
# Option "NoLogo" "True"
Viewport 0 0
# Identifier "Screen1"
# Monitor "Monitor1"
# Device "Device1"
# DefaultDepth 24
# SubSection "Display"
# Viewport 0 0
# Depth 24
I set DISPLAY=:0, but not COMPUTE. I checked with the 5970 and it has crossfire enabled (with the bridge between the dual GPUs onboard somewhere internal to the card). First thing Monday in lab (it's already Saturday here) I'll try disabling it and removing the bridges on the 7950s, but since it works on the 5970 without issue I'm not getting my hopes up.
I tried aticonfig --initial --adapter=all and that had the same problem with booting and OpenCL. Nothing so far has fixed the booting issue, but turning Crossfire on strangely seems to have solved the OpenCL issue. This runs counter to the installation notes that say Crossfire should be turned off in most cases.
This leads me to a new question. If OpenCL and OpenGL can interact and share buffer objects, is it possible to use Crossfire to accelerate rendering and then explicitly call OpenCL kernels on each device in the Crossfire setup? Most information I've read leads me to believe that Crossfire and APP are completely disjoint technologies. Is that correct, or am I missing something? Having to disable (and later re-enable) Crossfire every time an OpenCL application runs doesn't seem like a good solution for developers and clients.
Thanks for your feedback!
Disable Crossfire and remove any bridge between the cards. Do you set COMPUTE=:0?
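For anyone wanting to try the OpenCL tools with and without COMPUTE set, without touching the login shell's environment, here is a small Python sketch. The helper name opencl_env is hypothetical, and the sketch only prepares the environment variables mentioned in this thread (DISPLAY and COMPUTE); it assumes nothing about what the driver does with them.

```python
import os


def opencl_env(display=":0", compute=None, base=None):
    """Return a copy of an environment with DISPLAY and COMPUTE adjusted.

    display: value for DISPLAY (":0" as used in this thread).
    compute: value for COMPUTE, or None to make sure it is unset.
    base:    mapping to start from; defaults to the current os.environ.
    (Hypothetical helper, for illustration only.)
    """
    env = dict(os.environ if base is None else base)
    env["DISPLAY"] = display
    if compute is None:
        # Drop any inherited value so the "unset" case is really unset.
        env.pop("COMPUTE", None)
    else:
        env["COMPUTE"] = compute
    return env


if __name__ == "__main__":
    # Show the two variants that are being compared in this thread.
    for compute in (None, ":0"):
        env = opencl_env(compute=compute)
        print("DISPLAY=%s COMPUTE=%s" % (env["DISPLAY"], env.get("COMPUTE")))
```

The returned dictionary can be passed as the env argument of subprocess.run when launching clinfo, so each variant runs in its own clean environment.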
I disabled crossfire and removed all bridges between the cards, but I received the same error before login. I also removed one of the graphics cards and tested it, but still received the same error. This does not appear to be crossfire or 2x graphics cards related. This narrows down the issue to be something in the driver. What information (and what command or directory do I use to find it) should I provide so AMD can try to recreate this error on their side?
I have forwarded this issue on to the relevant team. As for Crossfire with OpenGL/CL interaction, this won't work. You will need to disable Crossfire for the OpenGL part because the current software stack is not capable of determining where the surfaces are located in Crossfire mode when using OpenCL. This is a known issue, and I'll let our Linux team know that customers are requesting this support.