2 Replies Latest reply on Jun 3, 2011 4:33 AM by rollyng

    AMD ATI HD 6990 problems

    g1zm0
      multiGPU problems

      Hi all,

      I'm testing SimpleMultiDevice sample on ATI HD6990 (Ubuntu 10.10) with the following results:

      ----------------------------------------------------------
      CPU + GPU Test 1 : Single context Single Thread
      ----------------------------------------------------------
      Total time : 51
      Time of CPU : 50.7218
      Time of GPU : 2.694
      ----------------------------------------------------------
      CPU + GPU Test 2 : Multiple context Single Thread
      ----------------------------------------------------------
      Total time : 51
      Time of CPU : 53.2404
      Time of GPU : 2.25078
      ----------------------------------------------------------
      CPU + GPU Test 3 : Multiple context Multiple Thread
      ----------------------------------------------------------
      Total time : 52
      Time of CPU : 52.0495
      Time of GPU : 2.25067
      ----------------------------------------------------------
      Multi GPU Test 1 : Single context Single Thread
      ----------------------------------------------------------
      Total time : 3
      Time of GPU0 : 2.26233
      Time of GPU1 : 2.259
      ----------------------------------------------------------
      Multi GPU Test 2 : Multiple context Single Thread
      ----------------------------------------------------------
      Total time : 3
      Time of GPU0 : 2.262
      Time of GPU1 : 2.25044
      ----------------------------------------------------------
      Multi GPU Test 3 : Multiple context Multiple Thread
      ----------------------------------------------------------
      Total time : 3
      Time of GPU0 : 2.25078
      Time of GPU1 : 2.25267

      and the clinfo result is:

      Number of platforms:                 1
        Platform Profile:                 FULL_PROFILE
        Platform Version:                 OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
        Platform Name:                 AMD Accelerated Parallel Processing
        Platform Vendor:                 Advanced Micro Devices, Inc.
        Platform Extensions:                 cl_khr_icd cl_amd_event_callback cl_amd_offline_devices


        Platform Name:                 AMD Accelerated Parallel Processing
      Number of devices:                 3
        Device Type:                     CL_DEVICE_TYPE_GPU
        Device ID:                     4098
        Max compute units:                 24
        Max work items dimensions:             3
          Max work items[0]:                 256
          Max work items[1]:                 256
          Max work items[2]:                 256
        Max work group size:                 256
        Preferred vector width char:             16
        Preferred vector width short:             8
        Preferred vector width int:             4
        Preferred vector width long:             2
        Preferred vector width float:             4
        Preferred vector width double:         0
        Native vector width char:             16
        Native vector width short:             8
        Native vector width int:             4
        Native vector width long:             2
        Native vector width float:             4
        Native vector width double:             0
        Max clock frequency:                 830Mhz
        Address bits:                     32
        Max memory allocation:             268435456
        Image support:                 Yes
        Max number of images read arguments:         128
        Max number of images write arguments:         8
        Max image 2D width:                 8192
        Max image 2D height:                 8192
        Max image 3D width:                 2048
        Max image 3D height:                 2048
        Max image 3D depth:                 2048
        Max samplers within kernel:             16
        Max size of kernel argument:             1024
        Alignment (bits) of base address:         32768
        Minimum alignment (bytes) for any datatype:     128
        Single precision floating point capability
          Denorms:                     No
          Quiet NaNs:                     Yes
          Round to nearest even:             Yes
          Round to zero:                 Yes
          Round to +ve and infinity:             Yes
          IEEE754-2008 fused multiply-add:         Yes
        Cache type:                     None
        Cache line size:                 0
        Cache size:                     0
        Global memory size:                 1073741824
        Constant buffer size:                 65536
        Max number of constant args:             8
        Local memory type:                 Scratchpad
        Local memory size:                 32768
        Kernel Preferred work group size multiple:     64
        Error correction support:             0
        Unified memory for Host and Device:         0
        Profiling timer resolution:             1
        Device endianess:                 Little
        Available:                     Yes
        Compiler available:                 Yes
        Execution capabilities:                 
          Execute OpenCL kernels:             Yes
          Execute native function:             No
        Queue properties:                 
          Out-of-Order:                 No
          Profiling :                     Yes
        Platform ID:                     0x7f0c165d3800
        Name:                         Cayman
        Vendor:                     Advanced Micro Devices, Inc.
        Driver version:                 CAL 1.4.1385
        Profile:                     FULL_PROFILE
        Version:                     OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
        Extensions:                     cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_amd_popcnt


        Device Type:                     CL_DEVICE_TYPE_GPU
        Device ID:                     4098
        Max compute units:                 24
        Max work items dimensions:             3
          Max work items[0]:                 256
          Max work items[1]:                 256
          Max work items[2]:                 256
        Max work group size:                 256
        Preferred vector width char:             16
        Preferred vector width short:             8
        Preferred vector width int:             4
        Preferred vector width long:             2
        Preferred vector width float:             4
        Preferred vector width double:         0
        Native vector width char:             16
        Native vector width short:             8
        Native vector width int:             4
        Native vector width long:             2
        Native vector width float:             4
        Native vector width double:             0
        Max clock frequency:                 0Mhz
        Address bits:                     32
        Max memory allocation:             268435456
        Image support:                 Yes
        Max number of images read arguments:         128
        Max number of images write arguments:         8
        Max image 2D width:                 8192
        Max image 2D height:                 8192
        Max image 3D width:                 2048
        Max image 3D height:                 2048
        Max image 3D depth:                 2048
        Max samplers within kernel:             16
        Max size of kernel argument:             1024
        Alignment (bits) of base address:         32768
        Minimum alignment (bytes) for any datatype:     128
        Single precision floating point capability
          Denorms:                     No
          Quiet NaNs:                     Yes
          Round to nearest even:             Yes
          Round to zero:                 Yes
          Round to +ve and infinity:             Yes
          IEEE754-2008 fused multiply-add:         Yes
        Cache type:                     None
        Cache line size:                 0
        Cache size:                     0
        Global memory size:                 1073741824
        Constant buffer size:                 65536
        Max number of constant args:             8
        Local memory type:                 Scratchpad
        Local memory size:                 32768
        Kernel Preferred work group size multiple:     64
        Error correction support:             0
        Unified memory for Host and Device:         0
        Profiling timer resolution:             1
        Device endianess:                 Little
        Available:                     Yes
        Compiler available:                 Yes
        Execution capabilities:                 
          Execute OpenCL kernels:             Yes
          Execute native function:             No
        Queue properties:                 
          Out-of-Order:                 No
          Profiling :                     Yes
        Platform ID:                     0x7f0c165d3800
        Name:                         Cayman
        Vendor:                     Advanced Micro Devices, Inc.
        Driver version:                 CAL 1.4.1385
        Profile:                     FULL_PROFILE
        Version:                     OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
        Extensions:                     cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_amd_popcnt


        Device Type:                     CL_DEVICE_TYPE_CPU
        Device ID:                     4098
        Max compute units:                 4
        Max work items dimensions:             3
          Max work items[0]:                 1024
          Max work items[1]:                 1024
          Max work items[2]:                 1024
        Max work group size:                 1024
        Preferred vector width char:             16
        Preferred vector width short:             8
        Preferred vector width int:             4
        Preferred vector width long:             2
        Preferred vector width float:             4
        Preferred vector width double:         0
        Native vector width char:             16
        Native vector width short:             8
        Native vector width int:             4
        Native vector width long:             2
        Native vector width float:             4
        Native vector width double:             0
        Max clock frequency:                 1998Mhz
        Address bits:                     64
        Max memory allocation:             2147483648
        Image support:                 Yes
        Max number of images read arguments:         128
        Max number of images write arguments:         8
        Max image 2D width:                 8192
        Max image 2D height:                 8192
        Max image 3D width:                 2048
        Max image 3D height:                 2048
        Max image 3D depth:                 2048
        Max samplers within kernel:             16
        Max size of kernel argument:             4096
        Alignment (bits) of base address:         1024
        Minimum alignment (bytes) for any datatype:     128
        Single precision floating point capability
          Denorms:                     Yes
          Quiet NaNs:                     Yes
          Round to nearest even:             Yes
          Round to zero:                 Yes
          Round to +ve and infinity:             Yes
          IEEE754-2008 fused multiply-add:         No
        Cache type:                     Read/Write
        Cache line size:                 64
        Cache size:                     32768
        Global memory size:                 4156026880
        Constant buffer size:                 65536
        Max number of constant args:             8
        Local memory type:                 Global
        Local memory size:                 32768
        Kernel Preferred work group size multiple:     1
        Error correction support:             0
        Unified memory for Host and Device:         1
        Profiling timer resolution:             1
        Device endianess:                 Little
        Available:                     Yes
        Compiler available:                 Yes
        Execution capabilities:                 
          Execute OpenCL kernels:             Yes
          Execute native function:             Yes
        Queue properties:                 
          Out-of-Order:                 No
          Profiling :                     Yes
        Platform ID:                     0x7f0c165d3800
        Name:                         Intel(R) Core(TM)2 Quad  CPU   Q9450  @ 2.66GHz
        Vendor:                     GenuineIntel
        Driver version:                 2.0
        Profile:                     FULL_PROFILE
        Version:                     OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
        Extensions:                     cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_media_ops cl_amd_popcnt cl_amd_printf

      These results look very strange, at least to me as an OpenCL rookie, and I don't know whether or not this specific GPU can execute two kernels simultaneously.

       

      Thanks in advance!

       

        • AMD ATI HD 6990 problems
          himanshu.gautam

          Thanks for reporting it.

          • AMD ATI HD 6990 problems
            rollyng

             

            Originally posted by: g1zm0 Hi all,

             

            I'm testing SimpleMultiDevice sample on ATI HD6990 (Ubuntu 10.10) with the following results:

             

            ----------------------------------------------------------
            CPU + GPU Test 1 : Single context Single Thread
            ----------------------------------------------------------
            Total time : 51
            Time of CPU : 50.7218
            Time of GPU : 2.694
            ----------------------------------------------------------
            CPU + GPU Test 2 : Multiple context Single Thread
            ----------------------------------------------------------
            Total time : 51
            Time of CPU : 53.2404
            Time of GPU : 2.25078
            ----------------------------------------------------------
            CPU + GPU Test 3 : Multiple context Multiple Thread
            ----------------------------------------------------------
            Total time : 52
            Time of CPU : 52.0495
            Time of GPU : 2.25067
            ----------------------------------------------------------
            Multi GPU Test 1 : Single context Single Thread
            ----------------------------------------------------------
            Total time : 3
            Time of GPU0 : 2.26233
            Time of GPU1 : 2.259
            ----------------------------------------------------------
            Multi GPU Test 2 : Multiple context Single Thread
            ----------------------------------------------------------
            Total time : 3
            Time of GPU0 : 2.262
            Time of GPU1 : 2.25044
            ----------------------------------------------------------
            Multi GPU Test 3 : Multiple context Multiple Thread
            ----------------------------------------------------------
            Total time : 3
            Time of GPU0 : 2.25078
            Time of GPU1 : 2.25267

             

            ...

             

            These results look very strange, at least to me as an OpenCL rookie, and I don't know whether or not this specific GPU can execute two kernels simultaneously.

             

             

            Thanks in advance!

             

             

            Hi, maybe I'm missing something, but I don't quite get what is wrong with your results.

            Here is my output:

            ----------------------------------------------------------
            CPU + GPU Test 1 : Single context Single Thread
            ----------------------------------------------------------
            Total time : 29
            Time of CPU : 26.7108
            Time of GPU : 2.53355
            ----------------------------------------------------------
            CPU + GPU Test 2 : Multiple context Single Thread
            ----------------------------------------------------------
            Total time : 29
            Time of CPU : 26.8218
            Time of GPU : 2.25389
            ----------------------------------------------------------
            CPU + GPU Test 3 : Multiple context Multiple Thread
            ----------------------------------------------------------
            Total time : 29
            Time of CPU : 28.5875
            Time of GPU : 2.25489
            ----------------------------------------------------------
            Multi GPU Test 1 : Single context Single Thread
            ----------------------------------------------------------
            Total time : 23
            Time of GPU0 : 2.25433
            Time of GPU1 : 2.25067
            Time of GPU2 : 2.26367
            Time of GPU3 : 2.264
            Time of GPU4 : 2.254
            Time of GPU5 : 2.25478
            Time of GPU6 : 2.258
            Time of GPU7 : 2.25844
            ----------------------------------------------------------
            Multi GPU Test 2 : Multiple context Single Thread
            ----------------------------------------------------------
            Total time : 27
            Time of GPU0 : 2.25689
            Time of GPU1 : 2.25222
            Time of GPU2 : 2.26367
            Time of GPU3 : 2.25222
            Time of GPU4 : 2.25444
            Time of GPU5 : 2.25611
            Time of GPU6 : 2.265
            Time of GPU7 : 2.26222
            ----------------------------------------------------------
            Multi GPU Test 3 : Multiple context Multiple Thread
            ----------------------------------------------------------
            Total time : 21
            Time of GPU0 : 2.25333
            Time of GPU1 : 2.25211
            Time of GPU2 : 2.25444
            Time of GPU3 : 2.25344
            Time of GPU4 : 2.25411
            Time of GPU5 : 2.26578
            Time of GPU6 : 2.28556
            Time of GPU7 : 2.25478
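
One rough way to read either set of numbers is to compare each "Total time" with the sum and the maximum of the per-device times. A total near the maximum suggests the devices ran at the same time, and a total near (or above) the sum suggests they ran one after another. The sketch below is only an illustration in Python, not part of the SimpleMultiDevice sample. It assumes that the total and the per-device times are in the same unit and that the total is a rounded wall-clock figure, neither of which the output itself confirms. The function name `overlap_report` is made up for this example.

```python
def overlap_report(total, device_times):
    """Compare a reported total against the sum and max of per-device times."""
    serial = sum(device_times)    # estimate if the devices ran one after another
    parallel = max(device_times)  # estimate if the devices ran fully overlapped
    # Pick whichever estimate lies closer to the reported total.
    if abs(total - parallel) <= abs(total - serial):
        closer = "overlapped"
    else:
        closer = "serialized"
    return serial, parallel, closer


# "Multi GPU Test 1" figures copied from the two posts in this thread.
g1zm0 = (3, [2.26233, 2.259])
rollyng = (23, [2.25433, 2.25067, 2.26367, 2.264,
                2.254, 2.25478, 2.258, 2.25844])

for name, (total, times) in (("g1zm0", g1zm0), ("rollyng", rollyng)):
    serial, parallel, closer = overlap_report(total, times)
    print(f"{name}: total={total} sum={serial:.2f} max={parallel:.2f} -> {closer}")
```

Under those assumptions, the two-GPU total of 3 sits closer to the largest single-device time than to the sum, while the eight-GPU total of 23 sits closer to the sum. The comparison says nothing about the cause (driver, queueing, or host-side overhead), so treat it as a first sanity check rather than a diagnosis.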