9 Replies Latest reply on Jul 1, 2015 2:18 AM by dipak

    OpenCL not recognizing the 295x2 properly in my system

      techuvise

       

      Posting on behalf of a member...

       

      I have an issue similar to https://community.amd.com/thread/166307.

       

      I have the issue with OpenCL not recognizing the 295x2 properly in my system.
      $ aticonfig --lsa
      * 0. 05:00.0 Supported device 67B9
        1. 06:00.0 Supported device 67B9
        2. 83:00.0 Supported device 67B9
        3. 84:00.0 Supported device 67B9

       

      * - Default adapter

       

      $ clinfo | grep "Board name\|Device Topology"
        Board name:                     AMD Radeon R9 200 Series
        Device Topology:                 PCI[ B#5, D#0, F#0 ]
        Board name:                    
        Device Topology:                 PCI[ B#5, D#0, F#0 ]
        Board name:                    
        Device Topology:                 PCI[ B#5, D#0, F#0 ]
        Board name:                    

       

      My system is RHEL 6.5 on a dual Xeon ASUS Z10PE-D8 WS motherboard.

       

      As you can see, aticonfig recognizes 4 GPUs (0x05, 0x06, 0x83, 0x84), which are the two GPUs on each of the two 295x2 cards in the system. clinfo recognizes only three, and all of them show the same PCI bus ID (B#5).
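      A minimal sketch for repeating this comparison after each driver or BIOS change. The function names are hypothetical helpers, not part of aticonfig or clinfo; they read saved command output on stdin and assume the output formats shown above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of any AMD tool). Each one reads saved
# command output on stdin, so the check can be rerun after every change.

# Count adapters listed by `aticonfig --lsa` (lines like "05:00.0 Supported device 67B9").
count_aticonfig_adapters() {
    grep -cE '[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7].*Supported device'
}

# Count OpenCL GPU devices in clinfo output.
# Assumes clinfo prints "CL_DEVICE_TYPE_GPU" exactly once per GPU device.
count_clinfo_gpus() {
    grep -c 'CL_DEVICE_TYPE_GPU'
}

# Print each distinct "Device Topology" value with the number of devices
# that reported it; a count above 1 means several devices share one bus ID.
clinfo_topology_counts() {
    grep 'Device Topology:' |
        sed 's/.*Device Topology:[[:space:]]*//; s/[[:space:]]*$//' |
        sort | uniq -c
}
```

      Usage: `aticonfig --lsa | count_aticonfig_adapters` and `clinfo | count_clinfo_gpus` should agree when every GPU is visible to OpenCL, and `clinfo | clinfo_topology_counts` should list one line per bus with a count of 1. With the symptom described above, the last command would instead show a single `PCI[ B#5, D#0, F#0 ]` line with a count of 3.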

       

      I'm having an issue with multi-GPU OpenCL, and I'm hoping that the recognition problem above is the cause of my multi-GPU issue.

       

      I have a beta test coming up next week and this issue is holding me up.
      Thank you in advance for your prompt reply.

       

      Additional info: I've tried this with both fglrx-15.504 and fglrx-15.5. OpenCL is using the v3.0 beta SDK.

       

        • Re: OpenCL not recognizing the 295x2 properly in my system

          techuvise

           

          FYI: We are looking into it, the engineers are trying to reproduce.

          • Re: OpenCL not recognizing the 295x2 properly in my system
            dipak

            @techuvise

             

            We are trying to prepare a multi-GPU setup similar to yours, and I want to make sure that we are using the same driver package. Did you download the Catalyst driver from here: Desktop? If not, please try that version and share the link to the version you are using. Also, please provide the complete clinfo output.

             

            Another point: I assume both cards were installed when you installed the driver, i.e., you didn't change any hardware after the driver installation.

             

            Regards.

              • Re: OpenCL not recognizing the 295x2 properly in my system
                techuvise

                Yes.  I got the new driver (15.5) from the same location you indicated.  The name of the file downloaded is 'amd-catalyst-omega-15.5-linux-run-installers.zip'.

                The old driver file (14.12) I used was 'amd-catalyst-omega-14.12-linux-run-installers.zip'.

                 

                See the attached file for the complete clinfo output.

                 

                Sometimes it shows all 4 GPU devices and sometimes it does not. Regardless, it always shows 'Device Topology: PCI[ B#5, D#0, F#0 ]' for all GPU devices. I also noticed that only the 1st GPU device shows 'Device OpenCL C version: OpenCL C 2.0', while the other GPU devices show 'Device OpenCL C version: OpenCL C 1.2'. The devices also differ in 'Global memory size:' and 'Max memory allocation:'.
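                To compare these fields side by side, a small awk sketch can turn the clinfo output into one row per device. `clinfo_device_table` is a hypothetical helper; it assumes each device section in clinfo's output starts with a "Device Type:" line and that fields use the "Name:   value" layout seen above.

```shell
#!/bin/sh
# Hypothetical helper: one summary row per device from saved clinfo output.
# Assumes every device section begins with a "Device Type:" line.
clinfo_device_table() {
    awk '
        # Text after the first colon, with surrounding blanks/tabs trimmed.
        function val(   s) { s = $0; sub(/^[^:]*:[ \t]*/, "", s); sub(/[ \t]+$/, "", s); return s }
        /Device Type:/             { n++; dtype[n] = val() }
        /Device OpenCL C version:/ { cver[n] = val() }
        /Address bits:/            { abits[n] = val() }
        /Global memory size:/      { gmem[n] = val() }
        /Max memory allocation:/   { maxalloc[n] = val() }
        END {
            for (i = 1; i <= n; i++)
                printf "device %d: type=%s, C version=%s, address bits=%s, global mem=%s, max alloc=%s\n",
                       i, dtype[i], cver[i], abits[i], gmem[i], maxalloc[i]
        }'
}
```

                Usage: `clinfo | clinfo_device_table`. If the AMD platform also exposes a CPU device, it shows up as its own row with its own type value.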

                 

                In the motherboard BIOS, I had to enable 'Above 4G Decoding' to get it to work: Advanced -> PCI Subsystem Settings -> Above 4G Decoding -> Enable.

                When booting up, it only shows 'Executing PCI Option ROM - Display Controller PCI B:05...' for the 1st 295x2. I don't see the same message for the 2nd 295x2 at PCI B:83. Is this normal?

                I installed the 1st 295x2 card in PCI slot 1 and the 2nd 295x2 card in PCI slot 3.

                 

                I used 'aticonfig -f --adapter=all --initial' to setup the xorg.conf file.

                 

                Yes, both cards were installed during the driver installation. No, I did not change the hardware after installation.

                 

                Thank you so much for your prompt attention to this issue.

                  • Re: OpenCL not recognizing the 295x2 properly in my system
                    techuvise

                    Here is the output from lspci.

                    $ lspci | grep AMD
                    05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vesuvius
                    05:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aac8
                    06:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Vesuvius
                    83:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vesuvius
                    83:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aac8
                    84:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Vesuvius
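                    To line these PCI addresses up against clinfo's "Device Topology" B# values, the sketch below lists the GPU functions with their bus number in both hex (as lspci prints it) and decimal, since the output above does not make clear which base clinfo uses. `lspci_gpu_buses` is a hypothetical helper that reads lspci output on stdin.

```shell
#!/bin/sh
# Hypothetical helper: list GPU functions from lspci output with the bus
# number in hex and decimal, for comparison with clinfo's "B#" values.
lspci_gpu_buses() {
    grep -E 'VGA compatible controller|Display controller|3D controller' |
    while read -r bdf _rest; do
        bus=${bdf%%:*}   # "83:00.0" -> "83"
        printf '%s bus 0x%s (decimal %d)\n' "$bdf" "$bus" "0x$bus"
    done
}
```

                    Usage: `lspci | grep AMD | lspci_gpu_buses`. On this system it should list the four GPUs at buses 0x05, 0x06, 0x83 and 0x84, whereas clinfo reports B#5 for every device.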

                    • Re: OpenCL not recognizing the 295x2 properly in my system
                      dipak

                      Thanks for confirming the driver version and sharing the clinfo output. We'll try to reproduce this on our end and share our observations. Meanwhile, you may also want to try this driver version: http://support.amd.com/en-us/download/desktop?os=RHEL%20x86%2064. I'm not sure whether it will help.

                       

                      > Sometimes it shows all 4 GPU devices and sometimes it does not.

                      Do you observe any pattern in when it detects them and when it does not? I mean, after any change or modification?

                       

                      > I also noticed that only the 1st GPU device shows 'Device OpenCL C version: OpenCL C 2.0', while the other GPU devices show 'Device OpenCL C version: OpenCL C 1.2'.

                      Currently, on a multi-GPU platform, only one device is detected as an OpenCL 2.0 device, even though the other devices may also support OpenCL 2.0. This is a limitation of the current drivers.

                       

                      > The devices also differ in 'Global memory size:' and 'Max memory allocation:'.

                      The difference is due to the different address space support (see the "Address bits" parameter in clinfo). By default, the 64-bit address space is enabled only for OpenCL 2.0 devices.

                       

                       

                      > I don't see the same message for the 2nd 295x2 at PCI B:83. Is this normal?

                      I'm not sure about the BIOS setting. I need to check with some other folks.

                       

                      Regards,

                        • Re: OpenCL not recognizing the 295x2 properly in my system
                          techuvise
                          > Sometimes it shows all 4 GPU devices and sometimes it does not.

                          > Do you observe any pattern in when it detects them and when it does not? I mean, after any change or modification?

                           

                          I cannot conclusively determine when it will detect all 4 GPU devices and when it will detect only 3.

                           

                          I did experiment with 'export COMPUTE=:0'. When I set it, the GPU count drops to 1, and it goes back up to 3 when I unset it.

                          Doing the same with 'export DISPLAY=:0' makes no difference.
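                          A minimal sketch for making this experiment repeatable: it runs clinfo (or any other command given as arguments) once with COMPUTE unset, once with COMPUTE=:0 and once with only DISPLAY=:0, and counts GPU devices each time. The function names are hypothetical; it assumes GNU `env -u` (coreutils) and that clinfo prints "CL_DEVICE_TYPE_GPU" once per GPU device.

```shell
#!/bin/sh
# Count GPU devices in clinfo-style output (one CL_DEVICE_TYPE_GPU per GPU).
count_gpus() {
    grep -c 'CL_DEVICE_TYPE_GPU'
}

# Hypothetical helper: run the command (default: clinfo) under three
# environment settings and report the GPU count for each.
compare_compute_setting() {
    [ "$#" -gt 0 ] || set -- clinfo
    printf 'COMPUTE unset: %s GPU device(s)\n' "$(env -u COMPUTE "$@" | count_gpus)"
    printf 'COMPUTE=:0:    %s GPU device(s)\n' "$(COMPUTE=:0 "$@" | count_gpus)"
    printf 'DISPLAY=:0:    %s GPU device(s)\n' "$(env -u COMPUTE DISPLAY=:0 "$@" | count_gpus)"
}
```

                          Running `compare_compute_setting` a few times in a row may also help show whether the 3-versus-4 detection varies between runs under the same settings.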

                           

                          • Re: OpenCL not recognizing the 295x2 properly in my system
                            techuvise

                            > The devices also differ in 'Global memory size:' and 'Max memory allocation:'.

                            > The difference is due to the different address space support (see the "Address bits" parameter in clinfo). By default, the 64-bit address space is enabled only for OpenCL 2.0 devices.

                             

                            Thank you so much for pointing out that the 2nd and subsequent GPUs default to 32-bit addressing.

                            I ran 'export GPU_FORCE_64BIT_PTR=1' to force those GPUs to 64-bit addressing.
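                            A minimal sketch for confirming the effect of this variable: it tallies the "Address bits" values clinfo reports, and offers a wrapper that sets GPU_FORCE_64BIT_PTR=1 for a single command rather than exporting it for the whole shell session. Both function names are hypothetical; only the variable name comes from this thread.

```shell
#!/bin/sh
# Hypothetical helper: run one command with GPU_FORCE_64BIT_PTR=1 set only
# in that command's environment (the calling shell is left unchanged).
run_with_64bit_ptr() {
    GPU_FORCE_64BIT_PTR=1 "$@"
}

# Tally the "Address bits" values in clinfo output, one "count value" line each.
address_bits_summary() {
    grep 'Address bits:' |
        sed 's/.*Address bits:[[:space:]]*//; s/[[:space:]]*$//' |
        sort | uniq -c
}
```

                            Comparing `clinfo | address_bits_summary` with `run_with_64bit_ptr clinfo | address_bits_summary` should show whether the 32-bit devices move to 64-bit. The OpenCL application can be started the same way, e.g. `run_with_64bit_ptr ./your_app`, where `./your_app` is a placeholder. A CPU device, if present, is counted in the tally as well.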

                             

                            As it turns out, this solves the issue from my original post:

                            > I'm having an issue with multi-GPU OpenCL, and I'm hoping that the recognition problem above is the cause of my multi-GPU issue.

                             

                            Thanks a million on helping me to solve this problem.

                             

                            I still need to resolve the issues that only 3 GPUs show up and that the PCI bus addresses are all the same.