cgorac
Journeyman III

clinfo listing Radeon HD 7990 only if run as root

The operating system is CentOS 6.  There are 2 NVIDIA GeForce cards installed on the machine, as well as a Radeon HD 7990.  The machine is run in headless mode, i.e. X server is not running.

I've installed the corresponding drivers by building an RPM package with the driver installation script and then installing that RPM.  The driver version, according to the generated RPM file name, seems to be 12.104; later I also tried the latest beta of the driver, version 13.20, but the results were the same.  After installing the driver, I installed the AMD APP SDK, version 2.8.1.0.

CUDA SDK 5.5 is installed on this machine too.  The CUDA version of libOpenCL.so gets used, as verified for example by "ldd /usr/bin/clinfo".  I also tried setting LD_LIBRARY_PATH so that the AMD version of libOpenCL.so is used, and the end results are practically the same as described below.

I'm accessing this machine through SSH (I tried both "ssh machinename" and "ssh -Y machinename").  The problem is that clinfo does not list the Radeon GPUs (this card is dual-GPU) when run as an ordinary user.  When run as "sudo clinfo", it properly lists the Radeon GPUs.  More precisely, clinfo actually crashes when run this way, with the error message "clinfo: relocation error: clinfo: symbol clRetainDevice, version OPENCL_1.2 not defined in file libOpenCL.so.1 with link time reference" (when the AMD version of libOpenCL.so is used, it just crashes without any error message), but at least it properly shows that there are 3 devices (CPU and 2 GPUs) for the AMD platform.  I also have a little program of my own that lists devices, and it properly lists all AMD and NVIDIA GPUs, as well as the CPU device, present on this machine when run under sudo.  However, when run as an ordinary user, only the NVIDIA GPU devices and the CPU device get listed.

I read several threads on this forum about similar problems, and tried some of the solutions proposed there.  I tried adding:

xhost +

chmod uog+rw /dev/ati/card*

to /etc/gdm/Init/Default, and then restarting gdm ("sudo killall gdm-binary").  This had no effect, in the sense that the permissions on the /dev/ati/card* files were not changed.  I tried changing the permissions manually, but clinfo still would not list the Radeon GPUs unless run under sudo.

Any further suggestions here?  Is it absolutely necessary to have an X server running on the machine in order for ordinary users to see the Radeon GPUs when running OpenCL code?  How should X be configured in this case?  Why is clinfo crashing when listing devices, with the error message above?

0 Likes
15 Replies
nou
Exemplar

Yes, it is necessary to run an X server to be able to access an AMD GPU with OpenCL.

The clinfo crash comes from a different version of libOpenCL.so. NVIDIA provides only version 1.1, but clinfo needs version 1.2. The crash with the AMD libOpenCL.so is most likely caused by making a 1.2 call on an NVIDIA device. This can come from using the C++ OpenCL bindings.
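One way to check the version mismatch described above is to ask whether the libOpenCL.so that clinfo actually loads exports the OpenCL 1.2 entry point named in the error message. The following is only a sketch: the helper name `exports_symbol` is made up for illustration, the library path in the usage comment is an assumption, and it relies on `nm` from binutils being installed.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the shared library exports the given symbol.
# exports_symbol is a hypothetical helper name, not part of any SDK.
exports_symbol() {
    lib="$1"
    sym="$2"
    # nm -D lists the dynamic symbol table; --defined-only skips imports.
    # grep -w avoids matching longer names that merely contain $sym.
    nm -D --defined-only "$lib" 2>/dev/null | grep -qw "$sym"
}

# Usage (the path is an assumption; take the real one from
# "ldd /usr/bin/clinfo"):
#   exports_symbol /usr/lib64/libOpenCL.so.1 clRetainDevice && echo "1.2 loader"
```

If the check fails for the library reported by "ldd /usr/bin/clinfo", that library predates OpenCL 1.2 and a clinfo built against 1.2 cannot resolve clRetainDevice from it.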

0 Likes
himanshu_gautam
Grandmaster

clinfo crashes because NVIDIA devices are OpenCL 1.1 and AMD is 1.2.

This is a known issue and must have been fixed by now.

There was an issue in cl.hpp which was fixed on the Khronos site.

Looks like clinfo was not recompiled with latest cl.hpp.

Not sure why this is not fixed yet. I will go check.

I doubt 13.20. It must actually be 13.10 Beta. Please confirm.

It really does not matter which OpenCL runtime you link your application against.

You can still list ALL platforms installed.

This is made available through a mechanism called ICD - Installable Client Driver.

Please read OpenCL spec for more details.

Regarding the X server requirement, AMD was working on a fix for this a long time ago.

I thought this must be fixed by now.

I have asked around.

Will get you more details in a day or two.

Thanks for your patience,

Best,

Bruhaspati

0 Likes

IIRC, Catalyst 13.10 beta generated a deb package for me with 13.20 as the version number.

Oh.. Thanks for that information, Good friend Nou...

Cgorac,

If you have used only 13.10 Beta (as pointed out by Nou), then it's fine.

I have asked around for this information. If there is any update, I will keep you posted.

-

Bruhaspati

0 Likes
cgorac
Journeyman III

Yes, same here: I said "13.20" because the RPM package generated by the driver installer (downloaded from here: http://support.amd.com/en-us/kb-articles/Pages/latest-linux-beta-driver.aspx - listed as the 13.11 Beta V1 driver at the moment) mentions 13.20 in the filename.

clinfo lists both the AMD and NVIDIA platforms, no matter whether I'm using the NVIDIA or the AMD version of libOpenCL.so; thus the ICD files are properly set up.  However, the problem is, as explained in my first message, that only 1 device (CPU) is reported for the AMD platform when clinfo is run as an ordinary user, and 3 devices (CPU and 2 GPUs) are reported when clinfo is run as root.

I have to correct my statement about clinfo crashing: it crashes only when the NVIDIA version of libOpenCL.so is used; when LD_LIBRARY_PATH is set so that the AMD version of libOpenCL.so is used, it does not crash.  I'm not sure why it crashed for me yesterday even with the AMD version of libOpenCL.so, but I re-installed everything this morning, the driver first and then the AMD SDK, and now it works fine with the AMD version of libOpenCL.so.  The results are still as described above, though (AMD GPUs not reported when clinfo is run as an ordinary user).

I have to ask again, as I received two opposite answers: is a running X server necessary to enable AMD GPUs for OpenCL for an ordinary user logged in remotely?  If so, how should xorg.conf be set up?  As we have multiple GPUs on this machine, is it important that the X server runs on the AMD GPUs, or does it not matter?
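For readers who want to confirm the ICD registration directly rather than inferring it from clinfo output, the sketch below prints each registration file and the library it names. It assumes the conventional Linux location /etc/OpenCL/vendors, where each .icd file holds the name or path of a vendor library; the function name `list_icds` is made up for illustration.

```shell
#!/bin/sh
# Print "<file>.icd -> <library>" for every ICD registration file.
# list_icds is a hypothetical helper name; the default directory is the
# conventional location read by ICD loaders on Linux.
list_icds() {
    dir="${1:-/etc/OpenCL/vendors}"
    for f in "$dir"/*.icd; do
        # If the glob matched nothing, $f holds the literal pattern: skip it.
        [ -e "$f" ] || continue
        printf '%s -> %s\n' "$(basename "$f")" "$(head -n 1 "$f")"
    done
}

# Usage: list_icds            (inspects /etc/OpenCL/vendors)
```

Seeing one entry per vendor here only shows that both platforms are registered; it says nothing about which devices each platform can open, which is the part that differs between root and an ordinary user in this thread.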

0 Likes

OpenCL with no X-server on Linux

Check this announcement from one of the AMD employees.

Root was needed in April 2013.

I have asked around to see if this is the case even now..

And, are there any improvements for non-root users on the way..

So, at the moment go with your X-server configuration.

Refer to CentOS or Red Hat forums to first set up the X-server..

And then do the xhost + thing, and then you will be ready to go...

Best

Bruhaspati

0 Likes

I've configured X ("aticonfig --initial") and also changed /etc/gdm/Init/Default; then I rebooted the machine.  However, the AMD GPUs are still not listed when clinfo is run by a remote user, and they are listed when it is run by the same user through "sudo clinfo".  Any further suggestions?

0 Likes

you must export DISPLAY=:0 when you run through ssh.

0 Likes
cgorac
Journeyman III

It seems that "export DISPLAY=:0" doesn't help - same behavior as without it.  Is there any way I can create some logs, to see further what the problem could be here?

Also: since I switched to the xorg.conf generated by "aticonfig --initial", the Xorg process ends up in uninterruptible sleep ("D" state as shown in "ps auxw" output).  So, does anyone have a working xorg.conf to show as an example?

0 Likes


Are you running gdm (or) lightdm?

You should edit the conf files (for xhost +) for the correct display manager...

Try running "xhost +" manually on the console.

And then try connecting remotely and see if that works.

0 Likes

There is a vanilla CentOS 6.4 setup on this machine, and gdm is running.  As mentioned above, I've changed the /etc/gdm/Init/Default file to add "xhost +" and "chmod uog+rw /dev/ati/card*" just before "exit 0" at the end of the script.  I can verify that the permissions for /dev/ati/card0 are 666 after this change, while they were 600 before it (it's somewhat strange that /dev/ati/card1 doesn't exist; however, when clinfo is run through sudo, it lists two AMD GPU devices).  I don't know how to verify the effect of "xhost +".  Finally, as mentioned in my previous message, I tried "export DISPLAY=:0" in my SSH session before running clinfo, but this doesn't help either - clinfo does not show the AMD GPU devices unless run through sudo.

Again: are there any logs that I can acquire, or any other way to further debug this issue?
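The permission check described above can be repeated after each reboot or gdm restart with a small helper. This is a sketch only: the function name `show_node_perms` is made up for illustration, and it assumes GNU `stat` (as shipped on CentOS) for the `-c` format option.

```shell
#!/bin/sh
# Print octal mode, owner:group and path for each card* node in a directory.
# show_node_perms is a hypothetical helper name; /dev/ati is the directory
# mentioned in this thread.
show_node_perms() {
    dir="${1:-/dev/ati}"
    for n in "$dir"/card*; do
        # If the glob matched nothing, $n holds the literal pattern: skip it.
        [ -e "$n" ] || continue
        stat -c '%a %U:%G %n' "$n"
    done
}

# Usage: show_node_perms      (inspects /dev/ati)
```

A line starting with 666 means every user can open the node for reading and writing; a node that is missing entirely, like /dev/ati/card1 here, simply produces no line.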

0 Likes


Run xhost + manually as I suggested before... that way, you can be sure.

Also check this:

http://developer.amd.com/wordpress/media/2012/10/App_Note-Running_AMD_APP_Apps_Remotely.pdf

Best,

Bruhaspati

0 Likes

I've re-read the application note that you mentioned above several times - this is what I started from.

I just re-tried the whole procedure, after uninstalling both the driver and SDK:

I tried first with the 13.4 driver.  After installing this driver, "aticonfig --initial" reports "No supported adapters detected".  So I removed this driver and installed 13.11 beta instead.  After installing it, "aticonfig --initial" was able to generate the attached xorg.conf file.  I then restarted the machine.  As mentioned previously, CentOS 6.4 is installed on this machine, so it would run Gnome if X were properly configured; the /etc/gdm/Init/Default file is changed according to the application note you mentioned above.  The X log file, as found after the machine rebooted, is also attached.

For some reason, the Xorg process gets put into uninterruptible sleep after reboot; here is the output of "ps auxw | grep Xorg" (note the "D" in "Ds+"):

root      4681  0.4  0.0  80220  7832 tty1     Ds+  08:13   0:00 /usr/bin/Xorg -nr -verbose -audit 4 -auth /var/run/gdm/auth-for-gdm-IzkwGB/database -nolisten tcp vt1

If I try to run now:

  export DISPLAY=:0

  xhost +

then it just sits there and does not return the shell prompt - I guess that happens because Xorg is asleep.  If I check /dev/ati/card0, it has 666 permissions, which means my changes in /etc/gdm/Init/Default take effect; however, I don't know how to check whether the "xhost +" placed there had any effect.

If I then run clinfo, the output is as in the attached clinfo-user.txt - AMD GPUs are not listed.  However, if I run it as "sudo clinfo", the output is as in the attached clinfo-root.txt, and AMD GPUs are listed.

I'm doing all of this because I have an OpenCL application that I have to profile.  NVIDIA dropped OpenCL profiler support some time ago, and some of the alternative profilers do not seem to be helpful when the application is run on NVIDIA GPUs.  I tried gDEBugger, and it was crashing; then I changed my code to use the LTPV profiler (https://code.google.com/p/ltpv/).  In that case I was able to get some profiling results, but after additional testing it doesn't seem that what LTPV pinpointed is really the cause of the slow execution times, so I'd like to try profiling with CodeXL.  Obviously, I can try running CodeXL through sudo, in the hope that it will find the AMD GPUs this way; but for security and other reasons, I'd really like to set up the driver and whatever else is needed on this machine so that the AMD GPUs can be used from OpenCL code run by a user logged in remotely, just as is possible with NVIDIA GPUs.
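Because a client such as xhost can block indefinitely when the server is stuck, it may help to probe the display under a time limit before trying anything else. The sketch below assumes that `timeout` (coreutils) and `xdpyinfo` are installed; the function name `x_reachable` is made up for illustration.

```shell
#!/bin/sh
# Succeeds if an X server answers on the given display within the limit.
# x_reachable is a hypothetical helper name. Defaults: display :0, 5 seconds.
x_reachable() {
    display="${1:-:0}"
    limit="${2:-5}"
    # timeout kills xdpyinfo if the server never responds, so the shell
    # prompt always comes back.
    timeout "$limit" xdpyinfo -display "$display" >/dev/null 2>&1
}

# Usage:
#   x_reachable :0 && DISPLAY=:0 xhost
# Running xhost with no arguments prints the current access control state,
# which is one way to see whether an earlier "xhost +" took effect.
```

A non-zero result covers both "no server on that display" and "server did not answer in time", so it cannot by itself distinguish a hung Xorg from one that was never started.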

0 Likes

Hi cgorac,

I am not an Xpert. So, I am hoping somebody else in the forum would help you.

Your best bet is to ask on other internet forums about your X issues.

Once you sort that out, I believe things should start working sanely.

If not, come back to us (with X working fine). We will be happy to help you.

Sorry....

Best Regards,

Bruhaspati

0 Likes

In the meantime, I reverted my xorg.conf file to the original version, which basically does not list any screens, and removed my changes in /etc/gdm/Init/Default; after rebooting, the effect is that X is not started at all.  However, the clinfo results are unchanged: it lists 2 GPUs and a CPU device for the AMD platform when run under sudo, and only the CPU device when run without sudo.  Then I tried strace in order to find what is making the difference, and I came to the same conclusion as discussed in the following thread: http://devgurus.amd.com/thread/160292.  Namely, the /dev/ati/card[01] devices appear only after I first run clinfo as root; then I have to change the permissions of these devices to 666 in order for clinfo to be able to proceed when run as a regular user.  The final obstacle I've encountered is that "ioctl(5, 0xc0586450, 0x7fff12b14840)" returns EACCES when clinfo is run as an ordinary user.  As mentioned in the thread linked above, using "setcap cap_sys_admin+ep /usr/bin/clinfo" makes it possible for clinfo to finally list the AMD GPUs when run as an ordinary user; but this is certainly not a general/acceptable solution.
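The strace comparison described above can be narrowed down by filtering a saved trace for calls that failed with a permission error. This is a sketch, not the exact procedure used in the post: the function name `permission_failures` and the trace file name are made up for illustration.

```shell
#!/bin/sh
# Print the syscalls in an strace log that failed with EACCES or EPERM.
# permission_failures is a hypothetical helper name.
permission_failures() {
    grep -E '= -1 (EACCES|EPERM) ' "$1"
}

# Usage (clinfo-user.trace is an illustrative file name):
#   strace -f -o clinfo-user.trace clinfo
#   permission_failures clinfo-user.trace
```

Running the same two steps under sudo and comparing the results shows which opens and ioctls succeed only for root, which is how a failing call such as the ioctl quoted above stands out.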

0 Likes