Posting on behalf of a member...
FYI: We are looking into it; the engineers are trying to reproduce the issue.
We are trying to prepare a multi-GPU setup similar to yours, and I want to make sure that we are using the same driver package. Did you download the Catalyst driver from here (Desktop)? If not, please try this version once and share the link to your version. Also, please provide the complete clinfo output.
One more point: I hope both cards were attached during the driver installation, that is, that you didn't change any hardware after the installation.
Regards.
Yes. I got the new driver (15.5) from the same location you indicated. The name of the file downloaded is 'amd-catalyst-omega-15.5-linux-run-installers.zip'.
The old driver file (14.12) I used was 'amd-catalyst-omega-14.12-linux-run-installers.zip'.
See attached file for the complete clinfo output.
Sometimes it shows all 4 GPU devices and sometimes it does not. Regardless, it always shows 'Device Topology: PCI[ B#5, D#0, F#0 ]' for all GPU devices. I also noticed that only the 1st GPU device shows 'Device OpenCL C version: OpenCL C 2.0', while the other GPU devices show 'Device OpenCL C version: OpenCL C 1.2'. There are also differences in the 'Global memory size:' and 'Max memory allocation:' values.
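For anyone who wants to check a saved clinfo dump for the repeated topology line described above, here is a minimal Python sketch. It assumes the line format quoted in this post and that the B#, D# and F# values are printed in decimal. The helper name `find_duplicate_topologies` and the sample text are made up for illustration; the sample is not the attached clinfo output.

```python
import re
from collections import Counter

# Matches lines such as: Device Topology: PCI[ B#5, D#0, F#0 ]
# Assumption: the three numbers are decimal, as in the line quoted above.
TOPOLOGY_RE = re.compile(
    r"Device Topology:\s*PCI\[\s*B#(\d+),\s*D#(\d+),\s*F#(\d+)\s*\]"
)


def find_duplicate_topologies(clinfo_text):
    """Return {(bus, device, function): count} for addresses seen more than once."""
    addresses = [
        tuple(int(n) for n in match.groups())
        for match in TOPOLOGY_RE.finditer(clinfo_text)
    ]
    counts = Counter(addresses)
    return {addr: n for addr, n in counts.items() if n > 1}


# Illustrative input only: three devices that all report the same address.
sample = "\n".join(["  Device Topology:  PCI[ B#5, D#0, F#0 ]"] * 3)
duplicates = find_duplicate_topologies(sample)
print(duplicates)
```

An empty result would mean every device reported a distinct PCI address. A non-empty one lists each repeated address together with how many devices reported it.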
In the motherboard BIOS I had to enable 'Above 4G Decoding' to get it to work. Advanced->PCI Subsystem Settings->Above 4G Decoding->Enable.
When booting up, it only shows 'Executing PCI Option ROM - Display Controller PCI B:05...' for the 1st 295x2. I don't see the same message for the 2nd 295x2 at PCI B:83. Is this normal?
I installed the 1st 295x2 card in PCI slot 1 and the 2nd 295x2 card in PCI slot 3.
I used 'aticonfig -f --adapter=all --initial' to set up the xorg.conf file.
Yes, I did attach both cards during driver installation. No, I did not change the hardware after installation.
Thank you so much for your prompt attention to this issue.
Here is the output from lspci.
$ lspci |grep AMD
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vesuvius
05:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aac8
06:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Vesuvius
83:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vesuvius
83:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aac8
84:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Vesuvius
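As a cross-check on how many GPUs the PCI bus itself reports, the lspci listing above can be counted with a short Python sketch. It treats every 'VGA compatible controller' or 'Display controller' entry as one GPU, which is an assumption that happens to fit this listing. The function name `gpu_addresses` is made up for illustration.

```python
# The lspci output exactly as posted above.
LSPCI_OUTPUT = """\
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vesuvius
05:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aac8
06:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Vesuvius
83:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vesuvius
83:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aac8
84:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Vesuvius
"""

# Assumption: these two device classes correspond to GPUs; audio functions do not.
GPU_CLASSES = ("VGA compatible controller", "Display controller")


def gpu_addresses(lspci_text):
    """Return the PCI addresses of all entries whose class looks like a GPU."""
    addresses = []
    for line in lspci_text.splitlines():
        address, _, rest = line.partition(" ")
        device_class = rest.split(":", 1)[0]
        if device_class in GPU_CLASSES:
            addresses.append(address)
    return addresses


gpus = gpu_addresses(LSPCI_OUTPUT)
print(len(gpus), gpus)
```

For this listing it finds four entries, at 05:00.0, 06:00.0, 83:00.0 and 84:00.0, so all four GPUs are visible on the PCI bus even when clinfo shows only three.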
Thanks for confirming the driver version and sharing the clinfo output. We'll try to reproduce it on our end and share our observations. Meanwhile, if you want, you may also try this driver version: http://support.amd.com/en-us/download/desktop?os=RHEL%20x86%2064. I'm not sure if it helps.
Sometimes it shows all 4 GPU devices and sometimes it does not.
Do you observe any pattern in when it detects them and when it does not? I mean, after any change or modification?
I also noticed that only the 1st GPU device shows 'Device OpenCL C version: OpenCL C 2.0', while the other GPU devices show 'Device OpenCL C version: OpenCL C 1.2'.
Currently, on a multi-GPU platform, only one device is detected as an OpenCL 2.0 device, even though the other devices may support OpenCL 2.0. This is a limitation of the current drivers.
There are also differences in the 'Global memory size:' and 'Max memory allocation:' values.
The difference is due to the different address space support (see the "Address bits" parameter in clinfo). By default, the 64-bit address space is enabled for OpenCL 2.0 devices.
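To see why the address width matters, here is a small Python sketch of the generic upper bound that a pointer width places on byte-addressable memory. This is plain arithmetic, not the formula the driver uses to compute the values clinfo reports, and the function name `addressable_gib` is made up for illustration.

```python
def addressable_gib(address_bits):
    """Upper bound on byte-addressable memory, in GiB, for a given pointer width."""
    return 2 ** address_bits / 2 ** 30


# A 32-bit address space can cover at most 2**32 bytes.
print(addressable_gib(32))  # 4.0
```

So a device that reports 32 address bits cannot expose more than 4 GiB through a single address space, while a device that reports 64 address bits has no such practical ceiling.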
I don't see the same message for the 2nd 295x2 at PCI B:83. Is this normal?
I'm not sure about the BIOS setting. I need to check with some other folks.
Regards,
Sometimes it shows all 4 GPU devices and sometimes it does not.
Do you observe any pattern in when it detects them and when it does not? I mean, after any change or modification?
I cannot conclusively determine when it will detect all 4 GPU devices and when it will detect only 3.
I did play around with 'export COMPUTE=:0'. When I set it, the GPU count goes down to 1, and it goes back up to 3 when I unset it.
There is no change when I do the same with 'export DISPLAY=:0'.
There are also differences in the 'Global memory size:' and 'Max memory allocation:' values.
The difference is due to the different address space support (see the "Address bits" parameter in clinfo). By default, the 64-bit address space is enabled for OpenCL 2.0 devices.
Thank you so much for pointing out that the 2nd and later GPUs default to 32-bit addressing.
I did an 'export GPU_FORCE_64BIT_PTR=1' to force the 2nd and later GPUs to 64-bit addressing.
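If the OpenCL program is started from a script rather than from a shell where the variable was exported, the same setting can be passed to the child process. The Python sketch below assumes only the variable name and value from the export above; both helper names are made up for illustration.

```python
import os
import subprocess


def env_with_64bit_pointers(base_env=None):
    """Return a copy of the environment with GPU_FORCE_64BIT_PTR set to "1"."""
    env = dict(os.environ if base_env is None else base_env)
    env["GPU_FORCE_64BIT_PTR"] = "1"
    return env


def run_with_64bit_pointers(command):
    """Run a command (a list of arguments) with the variable set for that process only."""
    return subprocess.run(
        command,
        env=env_with_64bit_pointers(),
        capture_output=True,
        text=True,
    )
```

For example, `run_with_64bit_pointers(["clinfo"]).stdout` would return the clinfo output with the variable in effect, assuming clinfo is on the PATH. The variable has to be set before the process that loads the OpenCL runtime starts, which is why it is passed through `env` instead of being set afterwards.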
As it turns out, this solves the issue from my original post:
I'm having an issue with multi-GPU OpenCL, and I'm hoping that the issue above (OpenCL somehow not being recognized properly) is the cause of my multi-GPU issue.
Thanks a million for helping me solve this problem.
I still need to resolve the issues that only 3 GPUs show up and that the PCI bus addresses are all the same.
Any status on reproducing this problem on your end?
I still need a resolution for why only 3 GPUs show up.
Thanks
My apologies for this late reply.
We are able to reproduce the issue (i.e., not all the GPUs are shown when two 295x2 cards are attached). A bug report has been filed for it and our engineering team is working on it. If I get any further updates, I'll share them with you.
Regards,