0 Replies Latest reply on May 16, 2011 3:24 AM by dorothy

    dual GPU x8 PCIe testing

    dorothy

       I have two GPUs configured here, each in an x8 slot, and it's a little confusing to tell whether the applications are seeing both. I'm not certain what the expected bandwidth should be for an ATI GPU degraded from x16 to x8 PCIe, but the measured figures appear to be a bit low.

       



      x16 expectations:
         Host to device : 4.95602 GB/s
         Device to host : 6.0957 GB/s

      x8 results:
         Host to device : 2.15459 GB/s
         Device to host : 2.35018 GB/s
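
To put "a bit low" into numbers, here is a rough Python sketch (not part of the SDK samples) that compares the x8 results with the x16 figures and with the theoretical x8 peak. It assumes a PCIe 2.0 link with 8b/10b encoding, which works out to 0.5 GB/s per lane in each direction; the link generation is an assumption, since none of the tools in this post report it. The names `LANE_GBPS`, `theoretical_gbps` and `summarize` are made up for this example.

```python
# Per-lane, per-direction PCIe payload rate in GB/s after 8b/10b encoding.
# Gen1: 2.5 GT/s * 8/10 / 8 bits = 0.25 GB/s; Gen2: 5 GT/s gives 0.5 GB/s.
LANE_GBPS = {1: 0.25, 2: 0.5}


def theoretical_gbps(gen, lanes):
    """Theoretical one-way bandwidth for a link of this generation and width."""
    return LANE_GBPS[gen] * lanes


# Figures quoted in this post (GB/s).
x16 = {"Host to device": 4.95602, "Device to host": 6.0957}
x8 = {"Host to device": 2.15459, "Device to host": 2.35018}


def summarize(gen=2):
    """Return (direction, x8/x16 ratio, x8 share of the theoretical x8 peak)."""
    rows = []
    for direction in x16:
        ratio = x8[direction] / x16[direction]
        share = x8[direction] / theoretical_gbps(gen, 8)
        rows.append((direction, ratio, share))
    return rows


for direction, ratio, share in summarize():
    print(f"{direction}: {ratio:.1%} of the x16 figure, "
          f"{share:.1%} of the assumed Gen2 x8 peak")
```

Under that assumption, the x8 transfers come out at roughly 43% (host to device) and 39% (device to host) of the x16 figures, so somewhat less than the half one would expect from halving the lane count.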


      [root@2-1 ~]# aticonfig --list-adapters
      * 0. 05:00.0 AMD FireStream 9350
         1. 04:00.0 AMD FireStream 9350

      aticonfig --adapter=all --initial

      [root@2-1 ~]# cat /etc/X11/xorg.conf
      Section "ServerLayout"
          Identifier     "aticonfig Layout"
          Screen      0  "aticonfig-Screen[0]-0" 0 0
          Screen         "aticonfig-Screen[1]-0" RightOf "aticonfig-Screen[0]-0"
      EndSection

      Section "Module"
      EndSection

      Section "Monitor"
          Identifier   "aticonfig-Monitor[0]-0"
          Option        "VendorName" "ATI Proprietary Driver"
          Option        "ModelName" "Generic Autodetecting Monitor"
          Option        "DPMS" "true"
      EndSection

      Section "Monitor"
          Identifier   "aticonfig-Monitor[1]-0"
          Option        "VendorName" "ATI Proprietary Driver"
          Option        "ModelName" "Generic Autodetecting Monitor"
          Option        "DPMS" "true"
      EndSection

      Section "Device"
          Identifier  "aticonfig-Device[0]-0"
          Driver      "fglrx"
          BusID       "PCI:5:0:0"
      EndSection

      Section "Device"
          Identifier  "aticonfig-Device[1]-0"
          Driver      "fglrx"
          BusID       "PCI:4:0:0"
      EndSection

      Section "Screen"
          Identifier "aticonfig-Screen[0]-0"
          Device     "aticonfig-Device[0]-0"
          Monitor    "aticonfig-Monitor[0]-0"
          DefaultDepth     24
          SubSection "Display"
              Viewport   0 0
              Depth     24
          EndSubSection
      EndSection

      Section "Screen"
          Identifier "aticonfig-Screen[1]-0"
          Device     "aticonfig-Device[1]-0"
          Monitor    "aticonfig-Monitor[1]-0"
          DefaultDepth     24
          SubSection "Display"
              Viewport   0 0
              Depth     24
          EndSubSection
      EndSection

      [root@2-1 ~]#


      [root@2-1 ~]# aticonfig --odgc --adapter=all

      Adapter 0 - AMD FireStream 9350
                                   Core (MHz)    Memory (MHz)
                  Current Clocks :    157           300
                    Current Peak :    700           1000
         Configurable Peak Range : [550-700]     [900-1000]
                        GPU load :    0%
      ERROR - Get clocks failed for Adapter 1 - AMD FireStream 9350
      [root@2-1 ~]#

      [root@2-1 ~]# aticonfig --odgt --adapter=all

      Adapter 0 - AMD FireStream 9350
                   Sensor 0: Temperature - 32.50 C
      ERROR - Get temperature failed for Adapter 1 - AMD FireStream 9350
      [root@2-1 ~]#


      /usr/local/AEE/AMD/2.3/AMD_Stream_SDK/samples/cal/bin/x86_64


      [root@2-1 x86_64]# ./FindNumDevices
      Supported CAL Runtime Version: 1.3.185
      Found CAL Runtime Version: 1.4.900
      Use -? for help
      CAL initialized.
      Finding out number of devices :-
      Device Count = 1
      CAL shutdown successful.

      /usr/local/AMD/2.3/AMD_Stream_SDK/samples/opencl/bin/x86_64


      [root@2-1 x86_64]# ./CLInfo
      Number of platforms:                 1
         Platform Profile:                 FULL_PROFILE
         Platform Version:                 OpenCL 1.1 ATI-Stream-v2.3 (451)
         Platform Name:                 ATI Stream
         Platform Vendor:                 Advanced Micro Devices, Inc.
         Platform Extensions:                 cl_khr_icd cl_amd_event_callback cl_amd_offline_devices


         Platform Name:                 ATI Stream
      Number of devices:                 2
         Device Type:                     CL_DEVICE_TYPE_GPU
         Device ID:                     4098
         Max compute units:                 18
         Max work items dimensions:             3
           Max work items[0]:                 256
           Max work items[1]:                 256
           Max work items[2]:                 256
         Max work group size:                 256
         Preferred vector width char:             16
         Preferred vector width short:             8
         Preferred vector width int:             4
         Preferred vector width long:             2
         Preferred vector width float:             4
         Preferred vector width double:         0
         Native vector width char:             0
         Native vector width short:             0
         Native vector width int:             0
         Native vector width long:             0
         Native vector width float:             0
         Native vector width double:             0
         Max clock frequency:                 0Mhz
         Address bits:                     32
         Max memory allocation:             268435456
         Image support:                 Yes
         Max number of images read arguments:         128
         Max number of images write arguments:         8
         Max image 2D width:                 8192
         Max image 2D height:                 8192
         Max image 3D width:                 2048
         Max image 3D height:                 2048
         Max image 3D depth:                 2048
         Max samplers within kernel:             16
         Max size of kernel argument:             1024
         Alignment (bits) of base address:         32768
         Minimum alignment (bytes) for any datatype:     128
         Single precision floating point capability
           Denorms:                     No
           Quiet NaNs:                     Yes
           Round to nearest even:             Yes
           Round to zero:                 Yes
           Round to +ve and infinity:             Yes
           IEEE754-2008 fused multiply-add:         Yes
         Cache type:                     None
         Cache line size:                 0
         Cache size:                     0
         Global memory size:                 1073741824
         Constant buffer size:                 65536
         Max number of constant args:             8
         Local memory type:                 Scratchpad
         Local memory size:                 32768
         Kernel Preferred work group size multiple:     64
         Error correction support:             0
         Unified memory for Host and Device:         0
         Profiling timer resolution:             1
         Device endianess:                 Little
         Available:                     Yes
         Compiler available:                 Yes
         Execution capabilities:               
           Execute OpenCL kernels:             Yes
           Execute native function:             No
         Queue properties:               
           Out-of-Order:                 No
           Profiling :                     Yes
         Platform ID:                     0x7ffe4bb8f880
         Name:                         Cypress
         Vendor:                     Advanced Micro Devices, Inc.
         Driver version:                 CAL 1.4.900
         Profile:                     FULL_PROFILE
         Version:                     OpenCL 1.1 ATI-Stream-v2.3 (451)
         Extensions:                     cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing
      cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_amd_popcnt


         Device Type:                     CL_DEVICE_TYPE_CPU
         Device ID:                     4098
         Max compute units:                 6
         Max work items dimensions:             3
           Max work items[0]:                 1024
           Max work items[1]:                 1024
           Max work items[2]:                 1024
         Max work group size:                 1024
         Preferred vector width char:             16
         Preferred vector width short:             8
         Preferred vector width int:             4
         Preferred vector width long:             2
         Preferred vector width float:             4
         Preferred vector width double:         0
         Native vector width char:             16
         Native vector width short:             8
         Native vector width int:             4
         Native vector width long:             2
         Native vector width float:             4
         Native vector width double:             0
         Max clock frequency:                 1400Mhz
         Address bits:                     64
         Max memory allocation:             1073741824
         Image support:                 No
         Max size of kernel argument:             4096
         Alignment (bits) of base address:         1024
         Minimum alignment (bytes) for any datatype:     128
         Single precision floating point capability
           Denorms:                     Yes
           Quiet NaNs:                     Yes
           Round to nearest even:             Yes
           Round to zero:                 Yes
           Round to +ve and infinity:             Yes
           IEEE754-2008 fused multiply-add:         No
         Cache type:                     Read/Write
         Cache line size:                 64
         Cache size:                     65536
         Global memory size:                 3221225472
         Constant buffer size:                 65536
         Max number of constant args:             8
         Local memory type:                 Global
         Local memory size:                 32768
         Kernel Preferred work group size multiple:     1
         Error correction support:             0
         Unified memory for Host and Device:         1
         Profiling timer resolution:             1
         Device endianess:                 Little
         Available:                     Yes
         Compiler available:                 Yes
         Execution capabilities:               
           Execute OpenCL kernels:             Yes
           Execute native function:             Yes
         Queue properties:               
           Out-of-Order:                 No
           Profiling :                     Yes
         Platform ID:                     0x7ffe4bb8f880
         Name:                         AMD Opteron(tm) Processor 4184
         Vendor:                     AuthenticAMD
         Driver version:                 2.0
         Profile:                     FULL_PROFILE
         Version:                     OpenCL 1.1 ATI-Stream-v2.3 (451)
         Extensions:                     cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics
      cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_media_ops cl_amd_popcnt cl_amd_printf


      [root@2-1 x86_64]#

      [root@2-1 x86_64]# ./PCIeBandwidth -x 10000000 -d 0

      Platform Vendor : Advanced Micro Devices, Inc.
      Device 0 : Cypress
      Host to device : 2.16979 GB/s
      Device to host : 2.39952 GB/s

      [root@2-1 x86_64]# ./PCIeBandwidth -x 10000000 -d 1

      Platform Vendor : Advanced Micro Devices, Inc.
      Device 0 : Cypress
      DeviceId should be < 1
      Error: sampleCommon::validateDeviceId() failed
      [root@2-1 x86_64]#

      [root@2-1 x86_64]# ./PCIeBandwidth -x 10000000 -d all

      Platform Vendor : Advanced Micro Devices, Inc.
      Device 0 : Cypress
      Host to device : 2.15459 GB/s
      Device to host : 2.35018 GB/s
      [root@2-1 x86_64]#
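
The CLInfo dump above is long enough that "Number of devices: 2" is easy to misread as two GPUs. A small Python sketch like the following tallies the "Device Type:" lines from saved CLInfo output. The helper name `count_device_types` is made up for this example, and the sample text is an abbreviated copy of the output in this post, not a fresh run.

```python
from collections import Counter


def count_device_types(clinfo_text):
    """Tally the 'Device Type:' lines found in CLInfo-style output."""
    counts = Counter()
    for line in clinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Device Type":
            counts[value.strip()] += 1
    return counts


# Abbreviated excerpt of the CLInfo output shown in this post.
sample = """\
Number of devices:                 2
   Device Type:                     CL_DEVICE_TYPE_GPU
   Name:                         Cypress
   Device Type:                     CL_DEVICE_TYPE_CPU
   Name:                         AMD Opteron(tm) Processor 4184
"""

counts = count_device_types(sample)
for device_type, n in counts.items():
    print(f"{device_type}: {n}")
```

On the output in this post, the two devices are one GPU (Cypress) and the CPU (Opteron 4184), so OpenCL is seeing only one of the two FireStream cards. That is consistent with FindNumDevices reporting "Device Count = 1" and PCIeBandwidth rejecting "-d 1".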