
jtrudeau
Staff

OpenCL not recognizing the 295x2 properly in my system

techuvise

Posting on behalf of a member...

I have an issue similar to https://community.amd.com/thread/166307.

I have the issue with OpenCL not recognizing the 295x2 properly in my system.
$ aticonfig --lsa
* 0. 05:00.0 Supported device 67B9
  1. 06:00.0 Supported device 67B9
  2. 83:00.0 Supported device 67B9
  3. 84:00.0 Supported device 67B9

* - Default adapter

$ clinfo | grep "Board name\|Device Topology"
  Board name:                     AMD Radeon R9 200 Series
  Device Topology:                 PCI[ B#5, D#0, F#0 ]
  Board name:                    
  Device Topology:                 PCI[ B#5, D#0, F#0 ]
  Board name:                    
  Device Topology:                 PCI[ B#5, D#0, F#0 ]
  Board name:                    

My system is RHEL 6.5 on a dual Xeon ASUS Z10PE-D8 WS motherboard.

As you can see, aticonfig recognizes four GPUs (buses 0x05, 0x06, 0x83, 0x84), which are the two 295x2 cards in the system. clinfo recognizes only three, and it shows the same PCI bus ID (B#5) for all of them.
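The mismatch between the two tools can also be checked programmatically. The following is a minimal sketch, assuming the `aticonfig --lsa` and `clinfo` line formats shown above and assuming that clinfo prints bus numbers in decimal while aticonfig prints them in hex; the function names are illustrative only.

```python
import re
from collections import Counter

# Matches PCI addresses such as "05:00.0" (hex bus:device.function).
ATICONFIG_RE = re.compile(r"\b([0-9A-Fa-f]{2}):([0-9A-Fa-f]{2})\.([0-7])\b")
# Matches clinfo topology entries such as "PCI[ B#5, D#0, F#0 ]".
CLINFO_RE = re.compile(r"PCI\[\s*B#(\d+),\s*D#(\d+),\s*F#(\d+)\s*\]")


def aticonfig_buses(text):
    """Return the PCI bus numbers (as ints) listed by `aticonfig --lsa`."""
    return [int(m.group(1), 16) for m in ATICONFIG_RE.finditer(text)]


def clinfo_buses(text):
    """Return the PCI bus numbers (as ints) from clinfo 'Device Topology' lines."""
    return [int(m.group(1)) for m in CLINFO_RE.finditer(text)]


def compare_bus_ids(aticonfig_text, clinfo_text):
    """Summarize which buses clinfo repeats and which ones it never reports."""
    expected = aticonfig_buses(aticonfig_text)
    reported = clinfo_buses(clinfo_text)
    duplicates = sorted(bus for bus, n in Counter(reported).items() if n > 1)
    missing = sorted(set(expected) - set(reported))
    return {
        "expected": expected,
        "reported": reported,
        "duplicates": duplicates,
        "missing": missing,
    }
```

Fed with the two outputs above, `compare_bus_ids` reports bus 5 as duplicated and buses 6, 131 (0x83) and 132 (0x84) as missing from clinfo.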

I'm having an issue with multi-GPU OpenCL, and I'm hoping that the issue above (OpenCL not recognizing the cards properly) is the cause of my multi-GPU issue.

I have a beta test coming up next week and this issue is holding me up.
Thank you in advance for your prompt reply.

Additional info: I've tried this with both fglrx-15.504 and fglrx-15.5. OpenCL is using the v3.0 beta SDK.

0 Likes
9 Replies
jtrudeau
Staff

techuvise

FYI: We are looking into it, the engineers are trying to reproduce.

0 Likes
dipak
Big Boss

@techuvise

We are trying to prepare a multi-GPU setup similar to yours. I want to make sure that we are using the same driver package. Did you download the Catalyst driver from here (Desktop)? If not, please try that version and share the link to the version you used. Also, please provide the complete clinfo output.

One more point: I hope both cards were attached during the installation of the driver, that is, that you didn't change any hardware after the installation.

Regards.

0 Likes

Yes.  I got the new driver (15.5) from the same location you indicated.  The name of the file downloaded is 'amd-catalyst-omega-15.5-linux-run-installers.zip'.

The old driver file (14.12) I used was 'amd-catalyst-omega-14.12-linux-run-installers.zip'.

See attached file for the complete clinfo output.

Sometimes it shows all 4 GPU devices and sometimes it does not. Regardless, it always shows 'Device Topology: PCI[ B#5, D#0, F#0 ]' for all GPU devices. I also noticed that only the 1st GPU device shows 'Device OpenCL C version: OpenCL C 2.0' while the other GPU devices show 'Device OpenCL C version: OpenCL C 1.2'. There are also differences in 'Global memory size:' and 'Max memory allocation:'.

In the motherboard BIOS I had to enable 'Above 4G Decoding' to get it to work: Advanced->PCI Subsystem Settings->Above 4G Decoding->Enable.

When booting up, it only shows 'Executing PCI Option ROM - Display Controller PCI B:05...' for the 1st 295x2.  I don't see the same message for the 2nd 295x2 at PCI B:83.  Is this normal?

I installed the 1st 295x2 card in PCI slot 1 and the 2nd 295x2 card in PCI slot 3.

I used 'aticonfig -f --adapter=all --initial' to set up the xorg.conf file.

Yes, both cards were attached during driver installation.  No, I did not change the hardware after installation.

Thank you so much for your prompt attention to this issue.

0 Likes

Here is the output from lspci.

$ lspci | grep AMD
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vesuvius
05:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aac8
06:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Vesuvius
83:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vesuvius
83:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aac8
84:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Vesuvius

0 Likes

Thanks for confirming the driver version and sharing the clinfo output. We'll try to reproduce it at our end and share our observations. Meanwhile, if you want, you may also try this driver version: http://support.amd.com/en-us/download/desktop?os=RHEL%20x86%2064. I'm not sure whether it will help.

Sometimes it shows all 4 GPU devices and sometimes it does not.

Do you observe any pattern in when it detects them and when it does not? I mean, after any change or modification?

I also noticed that only the 1st GPU device shows 'Device OpenCL C version: OpenCL C 2.0' while the other GPU devices show 'Device OpenCL C version: OpenCL C 1.2'.

Currently, on a multi-GPU platform, only one device is detected as an OpenCL 2.0 device, though the other devices may support OpenCL 2.0. This is a limitation of the current drivers.

There are also differences in 'Global memory size:' and 'Max memory allocation:'.

The difference is due to the different address space support (see the "Address bits" parameter in clinfo). By default, 64-bit address space is enabled for OpenCL 2.0 devices.
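To see these per-device differences side by side, the relevant clinfo fields can be collected for each device. This is a rough sketch, assuming clinfo prints each field as `Name: value` on its own line and prints each of these fields exactly once per device, so that the i-th occurrence belongs to the i-th device; the function names are illustrative only.

```python
# Field names as they are quoted in this thread.
FIELDS = (
    "Address bits",
    "Global memory size",
    "Max memory allocation",
    "Device OpenCL C version",
)


def per_device_fields(clinfo_text, fields=FIELDS):
    """Return one dict per device holding the selected clinfo fields.

    Assumption: every device prints each field once, in device order.
    """
    columns = {name: [] for name in fields}
    for line in clinfo_text.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        if sep and name in columns:
            columns[name].append(value.strip())
    count = max((len(values) for values in columns.values()), default=0)
    return [
        {name: (values[i] if i < len(values) else None)
         for name, values in columns.items()}
        for i in range(count)
    ]


def address_bits_differ(devices):
    """True if the devices do not all report the same 'Address bits' value."""
    return len({d.get("Address bits") for d in devices}) > 1
```

If `address_bits_differ` returns True for the full clinfo output, differing memory sizes between the devices are to be expected for the reason given above.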

I don't see the same message for the 2nd 295x2 at PCI B:83.  Is this normal?

I'm not sure about the BIOS setting. I need to check with some other folks.

Regards,

0 Likes

Sometimes it shows all 4 GPU devices and sometimes it does not.

Do you observe any pattern in when it detects them and when it does not? I mean, after any change or modification?

I cannot conclusively determine when it will detect all 4 GPU devices and when only 3.

I did play around with 'export COMPUTE=:0'.  When I set it, the GPU count goes down to 1, and it goes back up to 3 when I unset it.

Nothing changes when I do the same with 'export DISPLAY=:0'.

0 Likes

There are also differences in 'Global memory size:' and 'Max memory allocation:'.

The difference is due to the different address space support (see the "Address bits" parameter in clinfo). By default, 64-bit address space is enabled for OpenCL 2.0 devices.

Thank you so much for pointing out that the 2nd and later GPUs default to 32-bit addressing.

I did an 'export GPU_FORCE_64BIT_PTR=1' to force the 2nd and later GPUs to 64-bit.
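For anyone who prefers to set the variable only for a single program rather than exporting it in the shell, here is a minimal sketch. The helper name is made up for illustration; only the `GPU_FORCE_64BIT_PTR` variable itself comes from this thread.

```python
import os
import subprocess


def run_with_64bit_pointers(command):
    """Run `command` with GPU_FORCE_64BIT_PTR=1 added to its environment.

    The current shell environment is left untouched; only the child
    process sees the variable.
    """
    env = dict(os.environ, GPU_FORCE_64BIT_PTR="1")
    return subprocess.run(
        command, env=env, capture_output=True, text=True, check=False
    )
```

For example, `run_with_64bit_pointers(["clinfo"]).stdout` would hold the clinfo output produced with the variable set, assuming clinfo is on the PATH.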

As it turns out, this solves the issue from my original post:

I'm having an issue with multi-GPU OpenCL, and I'm hoping that the issue above (OpenCL not recognizing the cards properly) is the cause of my multi-GPU issue.

Thanks a million for helping me solve this problem.

I still need to resolve the issues that only 3 GPUs show up and that the PCI bus addresses are all the same.

0 Likes

Is there any status on reproducing this problem on your end?

I still need a resolution for why only 3 GPUs show up.

Thanks

0 Likes

My apologies for this late reply.

We were able to reproduce the issue (i.e., not all GPUs are shown when two 295x2 cards are attached). A bug report has been filed and our engineering team is working on it. If I get any further updates, I'll share them with you.

Regards,

0 Likes