dorothy

dual GPU x8 PCIe testing

Discussion created by dorothy on May 16, 2011

I have two GPUs configured here, each in an x8 slot, and it's hard to tell whether the applications are seeing both. I'm not certain exactly what bandwidth to expect for an ATI GPU running at x8 PCIe instead of x16, but the measured numbers appear to be a bit low.

 



x16 expectations:
   Host to device : 4.95602 GB/s
   Device to host : 6.0957 GB/s

x8 results:
   Host to device : 2.15459 GB/s
   Device to host : 2.35018 GB/s
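
As a rough sanity check on the numbers above, the per-direction ceiling of a PCIe link can be estimated from the lane count and generation. The sketch below assumes the standard 8b/10b figures (2.5 GT/s per lane for Gen1, 5 GT/s per lane for Gen2) and that the sample reports decimal GB/s. The function name is only illustrative, and I have not confirmed which generation these slots actually negotiate.

```python
# Rough per-direction PCIe ceilings, assuming 8b/10b encoding (Gen1/Gen2).
# Assumption: the benchmark's "GB/s" means 1e9 bytes per second.
GT_PER_S = {1: 2.5, 2: 5.0}  # raw transfer rate per lane, in GT/s

def pcie_ceiling_gbs(gen, lanes):
    """Theoretical one-direction ceiling in GB/s, ignoring protocol overhead."""
    # 8b/10b: 8 data bits per 10 transferred bits, then 8 bits per byte.
    return GT_PER_S[gen] * lanes * (8 / 10) / 8

# Figures copied from the post
measured = {
    "x16 host to device": 4.95602,
    "x16 device to host": 6.0957,
    "x8 host to device": 2.15459,
    "x8 device to host": 2.35018,
}

for gen in (1, 2):
    for lanes in (8, 16):
        print(f"Gen{gen} x{lanes}: {pcie_ceiling_gbs(gen, lanes):.1f} GB/s ceiling")

# How far the x8 figures fall from the x16 expectations
h2d_ratio = measured["x8 host to device"] / measured["x16 host to device"]
d2h_ratio = measured["x8 device to host"] / measured["x16 device to host"]
print(f"host to device: x8 is {h2d_ratio:.0%} of x16")
print(f"device to host: x8 is {d2h_ratio:.0%} of x16")
```

With these inputs the x8 figures come out at roughly 43% and 39% of the x16 ones, so somewhat below the half one would naively expect. They also sit just above the 2.0 GB/s Gen1 x8 ceiling and well under the 4.0 GB/s Gen2 x8 ceiling.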


[root@2-1 ~]# aticonfig --list-adapters
* 0. 05:00.0 AMD FireStream 9350
   1. 04:00.0 AMD FireStream 9350

aticonfig --adapter=all --initial

[root@2-1 ~]# cat /etc/X11/xorg.conf
Section "ServerLayout"
    Identifier     "aticonfig Layout"
    Screen      0  "aticonfig-Screen[0]-0" 0 0
    Screen         "aticonfig-Screen[1]-0" RightOf "aticonfig-Screen[0]-0"
EndSection

Section "Module"
EndSection

Section "Monitor"
    Identifier   "aticonfig-Monitor[0]-0"
    Option        "VendorName" "ATI Proprietary Driver"
    Option        "ModelName" "Generic Autodetecting Monitor"
    Option        "DPMS" "true"
EndSection

Section "Monitor"
    Identifier   "aticonfig-Monitor[1]-0"
    Option        "VendorName" "ATI Proprietary Driver"
    Option        "ModelName" "Generic Autodetecting Monitor"
    Option        "DPMS" "true"
EndSection

Section "Device"
    Identifier  "aticonfig-Device[0]-0"
    Driver      "fglrx"
    BusID       "PCI:5:0:0"
EndSection

Section "Device"
    Identifier  "aticonfig-Device[1]-0"
    Driver      "fglrx"
    BusID       "PCI:4:0:0"
EndSection

Section "Screen"
    Identifier "aticonfig-Screen[0]-0"
    Device     "aticonfig-Device[0]-0"
    Monitor    "aticonfig-Monitor[0]-0"
    DefaultDepth     24
    SubSection "Display"
        Viewport   0 0
        Depth     24
    EndSubSection
EndSection

Section "Screen"
    Identifier "aticonfig-Screen[1]-0"
    Device     "aticonfig-Device[1]-0"
    Monitor    "aticonfig-Monitor[1]-0"
    DefaultDepth     24
    SubSection "Display"
        Viewport   0 0
        Depth     24
    EndSubSection
EndSection

[root@2-1 ~]#
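
The BusID entries above use the form PCI:bus:device:function. As far as I know, xorg.conf expects these numbers in decimal, while lspci-style addresses such as 05:00.0 are hexadecimal, so the two only look identical for values below 10. A small sketch of the conversion, assuming the aticonfig listing uses the same hex notation as lspci (the helper name is hypothetical):

```python
# Convert an lspci-style PCI address (hex "bus:device.function") to the
# decimal "PCI:bus:device:function" form used for BusID in xorg.conf.
# Assumption: the input has no PCI domain prefix (e.g. not "0000:05:00.0").

def to_xorg_busid(addr):
    bus, rest = addr.split(":")
    device, function = rest.split(".")
    return f"PCI:{int(bus, 16)}:{int(device, 16)}:{int(function, 16)}"

# The two adapters listed by aticonfig --list-adapters
for addr in ("05:00.0", "04:00.0"):
    print(addr, "->", to_xorg_busid(addr))
```

For the two adapters here the result matches what aticonfig wrote into the Device sections, so the BusID values themselves look consistent.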


[root@2-1 ~]# aticonfig --odgc --adapter=all

Adapter 0 - AMD FireStream 9350
                             Core (MHz)    Memory (MHz)
            Current Clocks :    157           300
              Current Peak :    700           1000
   Configurable Peak Range : [550-700]     [900-1000]
                  GPU load :    0%
ERROR - Get clocks failed for Adapter 1 - AMD FireStream 9350
[root@2-1 ~]#

[root@2-1 ~]# aticonfig --odgt --adapter=all

Adapter 0 - AMD FireStream 9350
             Sensor 0: Temperature - 32.50 C
ERROR - Get temperature failed for Adapter 1 - AMD FireStream 9350
[root@2-1 ~]#


/usr/local/AEE/AMD/2.3/AMD_Stream_SDK/samples/cal/bin/x86_64


[root@2-1 x86_64]# ./FindNumDevices
Supported CAL Runtime Version: 1.3.185
Found CAL Runtime Version: 1.4.900
Use -? for help
CAL initialized.
Finding out number of devices :-
Device Count = 1
CAL shutdown successful.

/usr/local/AMD/2.3/AMD_Stream_SDK/samples/opencl/bin/x86_64


[root@2-1 x86_64]# ./CLInfo
Number of platforms:                 1
   Platform Profile:                 FULL_PROFILE
   Platform Version:                 OpenCL 1.1 ATI-Stream-v2.3 (451)
   Platform Name:                 ATI Stream
   Platform Vendor:                 Advanced Micro Devices, Inc.
   Platform Extensions:                 cl_khr_icd cl_amd_event_callback cl_amd_offline_devices


   Platform Name:                 ATI Stream
Number of devices:                 2
   Device Type:                     CL_DEVICE_TYPE_GPU
   Device ID:                     4098
   Max compute units:                 18
   Max work items dimensions:             3
     Max work items[0]:                 256
     Max work items[1]:                 256
     Max work items[2]:                 256
   Max work group size:                 256
   Preferred vector width char:             16
   Preferred vector width short:             8
   Preferred vector width int:             4
   Preferred vector width long:             2
   Preferred vector width float:             4
   Preferred vector width double:         0
   Native vector width char:             0
   Native vector width short:             0
   Native vector width int:             0
   Native vector width long:             0
   Native vector width float:             0
   Native vector width double:             0
   Max clock frequency:                 0Mhz
   Address bits:                     32
   Max memory allocation:             268435456
   Image support:                 Yes
   Max number of images read arguments:         128
   Max number of images write arguments:         8
   Max image 2D width:                 8192
   Max image 2D height:                 8192
   Max image 3D width:                 2048
   Max image 3D height:                 2048
   Max image 3D depth:                 2048
   Max samplers within kernel:             16
   Max size of kernel argument:             1024
   Alignment (bits) of base address:         32768
   Minimum alignment (bytes) for any datatype:     128
   Single precision floating point capability
     Denorms:                     No
     Quiet NaNs:                     Yes
     Round to nearest even:             Yes
     Round to zero:                 Yes
     Round to +ve and infinity:             Yes
     IEEE754-2008 fused multiply-add:         Yes
   Cache type:                     None
   Cache line size:                 0
   Cache size:                     0
   Global memory size:                 1073741824
   Constant buffer size:                 65536
   Max number of constant args:             8
   Local memory type:                 Scratchpad
   Local memory size:                 32768
   Kernel Preferred work group size multiple:     64
   Error correction support:             0
   Unified memory for Host and Device:         0
   Profiling timer resolution:             1
   Device endianess:                 Little
   Available:                     Yes
   Compiler available:                 Yes
   Execution capabilities:               
     Execute OpenCL kernels:             Yes
     Execute native function:             No
   Queue properties:               
     Out-of-Order:                 No
     Profiling :                     Yes
   Platform ID:                     0x7ffe4bb8f880
   Name:                         Cypress
   Vendor:                     Advanced Micro Devices, Inc.
   Driver version:                 CAL 1.4.900
   Profile:                     FULL_PROFILE
   Version:                     OpenCL 1.1 ATI-Stream-v2.3 (451)
   Extensions:                     cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing
cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_amd_popcnt


   Device Type:                     CL_DEVICE_TYPE_CPU
   Device ID:                     4098
   Max compute units:                 6
   Max work items dimensions:             3
     Max work items[0]:                 1024
     Max work items[1]:                 1024
     Max work items[2]:                 1024
   Max work group size:                 1024
   Preferred vector width char:             16
   Preferred vector width short:             8
   Preferred vector width int:             4
   Preferred vector width long:             2
   Preferred vector width float:             4
   Preferred vector width double:         0
   Native vector width char:             16
   Native vector width short:             8
   Native vector width int:             4
   Native vector width long:             2
   Native vector width float:             4
   Native vector width double:             0
   Max clock frequency:                 1400Mhz
   Address bits:                     64
   Max memory allocation:             1073741824
   Image support:                 No
   Max size of kernel argument:             4096
   Alignment (bits) of base address:         1024
   Minimum alignment (bytes) for any datatype:     128
   Single precision floating point capability
     Denorms:                     Yes
     Quiet NaNs:                     Yes
     Round to nearest even:             Yes
     Round to zero:                 Yes
     Round to +ve and infinity:             Yes
     IEEE754-2008 fused multiply-add:         No
   Cache type:                     Read/Write
   Cache line size:                 64
   Cache size:                     65536
   Global memory size:                 3221225472
   Constant buffer size:                 65536
   Max number of constant args:             8
   Local memory type:                 Global
   Local memory size:                 32768
   Kernel Preferred work group size multiple:     1
   Error correction support:             0
   Unified memory for Host and Device:         1
   Profiling timer resolution:             1
   Device endianess:                 Little
   Available:                     Yes
   Compiler available:                 Yes
   Execution capabilities:               
     Execute OpenCL kernels:             Yes
     Execute native function:             Yes
   Queue properties:               
     Out-of-Order:                 No
     Profiling :                     Yes
   Platform ID:                     0x7ffe4bb8f880
   Name:                         AMD Opteron(tm) Processor 4184
   Vendor:                     AuthenticAMD
   Driver version:                 2.0
   Profile:                     FULL_PROFILE
   Version:                     OpenCL 1.1 ATI-Stream-v2.3 (451)
   Extensions:                     cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics
cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_media_ops cl_amd_popcnt cl_amd_printf


[root@2-1 x86_64]#
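
Note that the "Number of devices: 2" in the CLInfo output above covers one GPU (Cypress) and one CPU (the Opteron), so OpenCL is reporting only a single GPU. A minimal sketch for tallying device types from saved CLInfo output, assuming each device starts with a "Device Type:" line as it does here (the function name is just illustrative):

```python
# Count OpenCL devices by type in CLInfo-style text output.
# Assumption: each device begins with a "Device Type:" line, as in the dump above.
from collections import Counter

def count_device_types(clinfo_text):
    """Return a Counter mapping device type (e.g. CL_DEVICE_TYPE_GPU) to count."""
    counts = Counter()
    for line in clinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Device Type":
            counts[value.strip()] += 1
    return counts

# Abbreviated excerpt of the output above (most fields omitted)
sample = """\
Number of devices:                 2
   Device Type:                     CL_DEVICE_TYPE_GPU
   Name:                         Cypress
   Device Type:                     CL_DEVICE_TYPE_CPU
   Name:                         AMD Opteron(tm) Processor 4184
"""

counts = count_device_types(sample)
print(dict(counts))
```

Counting by type rather than reading the device total avoids mistaking the CPU device for the second GPU.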

[root@2-1 x86_64]# ./PCIeBandwidth -x 10000000 -d 0

Platform Vendor : Advanced Micro Devices, Inc.
Device 0 : Cypress
Host to device : 2.16979 GB/s
Device to host : 2.39952 GB/s

[root@2-1 x86_64]# ./PCIeBandwidth -x 10000000 -d 1

Platform Vendor : Advanced Micro Devices, Inc.
Device 0 : Cypress
DeviceId should be < 1
Error: sampleCommon::validateDeviceId() failed
[root@2-1 x86_64]#

[root@2-1 x86_64]# ./PCIeBandwidth -x 10000000 -d all

Platform Vendor : Advanced Micro Devices, Inc.
Device 0 : Cypress
Host to device : 2.15459 GB/s
Device to host : 2.35018 GB/s
[root@2-1 x86_64]#
