g1zm0
Journeyman III

AMD ATI HD 6990 problems

multiGPU problems

Hi all,

I'm testing the SimpleMultiDevice sample on an ATI HD 6990 (Ubuntu 10.10), with the following results:

----------------------------------------------------------
CPU + GPU Test 1 : Single context Single Thread
----------------------------------------------------------
Total time : 51
Time of CPU : 50.7218
Time of GPU : 2.694
----------------------------------------------------------
CPU + GPU Test 2 : Multiple context Single Thread
----------------------------------------------------------
Total time : 51
Time of CPU : 53.2404
Time of GPU : 2.25078
----------------------------------------------------------
CPU + GPU Test 3 : Multiple context Multiple Thread
----------------------------------------------------------
Total time : 52
Time of CPU : 52.0495
Time of GPU : 2.25067
----------------------------------------------------------
Multi GPU Test 1 : Single context Single Thread
----------------------------------------------------------
Total time : 3
Time of GPU0 : 2.26233
Time of GPU1 : 2.259
----------------------------------------------------------
Multi GPU Test 2 : Multiple context Single Thread
----------------------------------------------------------
Total time : 3
Time of GPU0 : 2.262
Time of GPU1 : 2.25044
----------------------------------------------------------
Multi GPU Test 3 : Multiple context Multiple Thread
----------------------------------------------------------
Total time : 3
Time of GPU0 : 2.25078
Time of GPU1 : 2.25267

and the clinfo result is:

Number of platforms:                 1
  Platform Profile:                 FULL_PROFILE
  Platform Version:                 OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
  Platform Name:                 AMD Accelerated Parallel Processing
  Platform Vendor:                 Advanced Micro Devices, Inc.
  Platform Extensions:                 cl_khr_icd cl_amd_event_callback cl_amd_offline_devices


  Platform Name:                 AMD Accelerated Parallel Processing
Number of devices:                 3
  Device Type:                     CL_DEVICE_TYPE_GPU
  Device ID:                     4098
  Max compute units:                 24
  Max work items dimensions:             3
    Max work items[0]:                 256
    Max work items[1]:                 256
    Max work items[2]:                 256
  Max work group size:                 256
  Preferred vector width char:             16
  Preferred vector width short:             8
  Preferred vector width int:             4
  Preferred vector width long:             2
  Preferred vector width float:             4
  Preferred vector width double:         0
  Native vector width char:             16
  Native vector width short:             8
  Native vector width int:             4
  Native vector width long:             2
  Native vector width float:             4
  Native vector width double:             0
  Max clock frequency:                 830Mhz
  Address bits:                     32
  Max memory allocation:             268435456
  Image support:                 Yes
  Max number of images read arguments:         128
  Max number of images write arguments:         8
  Max image 2D width:                 8192
  Max image 2D height:                 8192
  Max image 3D width:                 2048
  Max image 3D height:                 2048
  Max image 3D depth:                 2048
  Max samplers within kernel:             16
  Max size of kernel argument:             1024
  Alignment (bits) of base address:         32768
  Minimum alignment (bytes) for any datatype:     128
  Single precision floating point capability
    Denorms:                     No
    Quiet NaNs:                     Yes
    Round to nearest even:             Yes
    Round to zero:                 Yes
    Round to +ve and infinity:             Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                     None
  Cache line size:                 0
  Cache size:                     0
  Global memory size:                 1073741824
  Constant buffer size:                 65536
  Max number of constant args:             8
  Local memory type:                 Scratchpad
  Local memory size:                 32768
  Kernel Preferred work group size multiple:     64
  Error correction support:             0
  Unified memory for Host and Device:         0
  Profiling timer resolution:             1
  Device endianess:                 Little
  Available:                     Yes
  Compiler available:                 Yes
  Execution capabilities:                 
    Execute OpenCL kernels:             Yes
    Execute native function:             No
  Queue properties:                 
    Out-of-Order:                 No
    Profiling :                     Yes
  Platform ID:                     0x7f0c165d3800
  Name:                         Cayman
  Vendor:                     Advanced Micro Devices, Inc.
  Driver version:                 CAL 1.4.1385
  Profile:                     FULL_PROFILE
  Version:                     OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
  Extensions:                     cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_amd_popcnt


  Device Type:                     CL_DEVICE_TYPE_GPU
  Device ID:                     4098
  Max compute units:                 24
  Max work items dimensions:             3
    Max work items[0]:                 256
    Max work items[1]:                 256
    Max work items[2]:                 256
  Max work group size:                 256
  Preferred vector width char:             16
  Preferred vector width short:             8
  Preferred vector width int:             4
  Preferred vector width long:             2
  Preferred vector width float:             4
  Preferred vector width double:         0
  Native vector width char:             16
  Native vector width short:             8
  Native vector width int:             4
  Native vector width long:             2
  Native vector width float:             4
  Native vector width double:             0
  Max clock frequency:                 0Mhz
  Address bits:                     32
  Max memory allocation:             268435456
  Image support:                 Yes
  Max number of images read arguments:         128
  Max number of images write arguments:         8
  Max image 2D width:                 8192
  Max image 2D height:                 8192
  Max image 3D width:                 2048
  Max image 3D height:                 2048
  Max image 3D depth:                 2048
  Max samplers within kernel:             16
  Max size of kernel argument:             1024
  Alignment (bits) of base address:         32768
  Minimum alignment (bytes) for any datatype:     128
  Single precision floating point capability
    Denorms:                     No
    Quiet NaNs:                     Yes
    Round to nearest even:             Yes
    Round to zero:                 Yes
    Round to +ve and infinity:             Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                     None
  Cache line size:                 0
  Cache size:                     0
  Global memory size:                 1073741824
  Constant buffer size:                 65536
  Max number of constant args:             8
  Local memory type:                 Scratchpad
  Local memory size:                 32768
  Kernel Preferred work group size multiple:     64
  Error correction support:             0
  Unified memory for Host and Device:         0
  Profiling timer resolution:             1
  Device endianess:                 Little
  Available:                     Yes
  Compiler available:                 Yes
  Execution capabilities:                 
    Execute OpenCL kernels:             Yes
    Execute native function:             No
  Queue properties:                 
    Out-of-Order:                 No
    Profiling :                     Yes
  Platform ID:                     0x7f0c165d3800
  Name:                         Cayman
  Vendor:                     Advanced Micro Devices, Inc.
  Driver version:                 CAL 1.4.1385
  Profile:                     FULL_PROFILE
  Version:                     OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
  Extensions:                     cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_amd_popcnt


  Device Type:                     CL_DEVICE_TYPE_CPU
  Device ID:                     4098
  Max compute units:                 4
  Max work items dimensions:             3
    Max work items[0]:                 1024
    Max work items[1]:                 1024
    Max work items[2]:                 1024
  Max work group size:                 1024
  Preferred vector width char:             16
  Preferred vector width short:             8
  Preferred vector width int:             4
  Preferred vector width long:             2
  Preferred vector width float:             4
  Preferred vector width double:         0
  Native vector width char:             16
  Native vector width short:             8
  Native vector width int:             4
  Native vector width long:             2
  Native vector width float:             4
  Native vector width double:             0
  Max clock frequency:                 1998Mhz
  Address bits:                     64
  Max memory allocation:             2147483648
  Image support:                 Yes
  Max number of images read arguments:         128
  Max number of images write arguments:         8
  Max image 2D width:                 8192
  Max image 2D height:                 8192
  Max image 3D width:                 2048
  Max image 3D height:                 2048
  Max image 3D depth:                 2048
  Max samplers within kernel:             16
  Max size of kernel argument:             4096
  Alignment (bits) of base address:         1024
  Minimum alignment (bytes) for any datatype:     128
  Single precision floating point capability
    Denorms:                     Yes
    Quiet NaNs:                     Yes
    Round to nearest even:             Yes
    Round to zero:                 Yes
    Round to +ve and infinity:             Yes
    IEEE754-2008 fused multiply-add:         No
  Cache type:                     Read/Write
  Cache line size:                 64
  Cache size:                     32768
  Global memory size:                 4156026880
  Constant buffer size:                 65536
  Max number of constant args:             8
  Local memory type:                 Global
  Local memory size:                 32768
  Kernel Preferred work group size multiple:     1
  Error correction support:             0
  Unified memory for Host and Device:         1
  Profiling timer resolution:             1
  Device endianess:                 Little
  Available:                     Yes
  Compiler available:                 Yes
  Execution capabilities:                 
    Execute OpenCL kernels:             Yes
    Execute native function:             Yes
  Queue properties:                 
    Out-of-Order:                 No
    Profiling :                     Yes
  Platform ID:                     0x7f0c165d3800
  Name:                         Intel(R) Core(TM)2 Quad  CPU   Q9450  @ 2.66GHz
  Vendor:                     GenuineIntel
  Driver version:                 2.0
  Profile:                     FULL_PROFILE
  Version:                     OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
  Extensions:                     cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_media_ops cl_amd_popcnt cl_amd_printf

These results seem very strange, at least to me as an OpenCL rookie, and I don't know whether this specific GPU can execute two kernels simultaneously or not.
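One rough way to read the numbers above is to check whether each "Total time" sits closer to the sum of the per-device times (which would suggest the devices ran one after another) or to the largest per-device time (which would suggest they overlapped). The sketch below does only that arithmetic. It assumes that "Total time" and "Time of ..." are reported in the same unit and that the total is a wall-clock figure; the sample output does not state either, so treat the result as a hint rather than a conclusion. The helper name `closer_to` is made up for this illustration.

```python
def closer_to(total, device_times):
    """Return 'max' if total is nearer the slowest device time, else 'sum'.

    Assumption: total and device_times share the same unit.
    """
    serial = sum(device_times)      # expected total if devices ran back to back
    overlapped = max(device_times)  # expected total if devices ran concurrently
    if abs(total - overlapped) <= abs(total - serial):
        return "max"
    return "sum"


# Figures copied from the output above: (test name, total, per-device times).
runs = [
    ("CPU + GPU Test 1", 51, [50.7218, 2.694]),
    ("Multi GPU Test 1", 3, [2.26233, 2.259]),
]

for name, total, times in runs:
    verdict = closer_to(total, times)
    print(f"{name}: total is closer to the {verdict} of the device times")
```

Note that the totals are printed as whole numbers, so they may be rounded or truncated, which limits how much can be read into small differences.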

 

Thanks in advance!

 

2 Replies
himanshu_gautam
Grandmaster

Thanks for reporting it.

rollyng
Journeyman III

Originally posted by: g1zm0

I'm testing the SimpleMultiDevice sample on an ATI HD 6990 (Ubuntu 10.10), with the following results:

...

These results seem very strange, at least to me as an OpenCL rookie, and I don't know whether this specific GPU can execute two kernels simultaneously or not.


Hi, maybe I'm missing something, but I don't quite see what is wrong with your results.

Here is my output:

----------------------------------------------------------
CPU + GPU Test 1 : Single context Single Thread
----------------------------------------------------------
Total time : 29
Time of CPU : 26.7108
Time of GPU : 2.53355
----------------------------------------------------------
CPU + GPU Test 2 : Multiple context Single Thread
----------------------------------------------------------
Total time : 29
Time of CPU : 26.8218
Time of GPU : 2.25389
----------------------------------------------------------
CPU + GPU Test 3 : Multiple context Multiple Thread
----------------------------------------------------------
Total time : 29
Time of CPU : 28.5875
Time of GPU : 2.25489
----------------------------------------------------------
Multi GPU Test 1 : Single context Single Thread
----------------------------------------------------------
Total time : 23
Time of GPU0 : 2.25433
Time of GPU1 : 2.25067
Time of GPU2 : 2.26367
Time of GPU3 : 2.264
Time of GPU4 : 2.254
Time of GPU5 : 2.25478
Time of GPU6 : 2.258
Time of GPU7 : 2.25844
----------------------------------------------------------
Multi GPU Test 2 : Multiple context Single Thread
----------------------------------------------------------
Total time : 27
Time of GPU0 : 2.25689
Time of GPU1 : 2.25222
Time of GPU2 : 2.26367
Time of GPU3 : 2.25222
Time of GPU4 : 2.25444
Time of GPU5 : 2.25611
Time of GPU6 : 2.265
Time of GPU7 : 2.26222
----------------------------------------------------------
Multi GPU Test 3 : Multiple context Multiple Thread
----------------------------------------------------------
Total time : 21
Time of GPU0 : 2.25333
Time of GPU1 : 2.25211
Time of GPU2 : 2.25444
Time of GPU3 : 2.25344
Time of GPU4 : 2.25411
Time of GPU5 : 2.26578
Time of GPU6 : 2.28556
Time of GPU7 : 2.25478
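
For comparison, the three multi-GPU totals in this 8-GPU output can be set against the sum and the maximum of the per-GPU times. This is a minimal sketch of that arithmetic only, under the same caveat as any reading of this output: it assumes "Total time" and "Time of GPUn" use the same unit, which the sample does not state. The dictionary name `multi_gpu_runs` and the helper `gap_over_sum` are made up for this illustration.

```python
# Figures copied from the 8-GPU output above: test name -> (total, per-GPU times).
multi_gpu_runs = {
    "Multi GPU Test 1": (23, [2.25433, 2.25067, 2.26367, 2.264,
                              2.254, 2.25478, 2.258, 2.25844]),
    "Multi GPU Test 2": (27, [2.25689, 2.25222, 2.26367, 2.25222,
                              2.25444, 2.25611, 2.265, 2.26222]),
    "Multi GPU Test 3": (21, [2.25333, 2.25211, 2.25444, 2.25344,
                              2.25411, 2.26578, 2.28556, 2.25478]),
}


def gap_over_sum(total, times):
    """Difference between the reported total and the sum of per-GPU times."""
    return total - sum(times)


for name, (total, times) in multi_gpu_runs.items():
    print(f"{name}: total={total}, sum={sum(times):.2f}, "
          f"max={max(times):.2f}, total-sum={gap_over_sum(total, times):.2f}")
```

If the unit assumption holds, a total near the maximum would point to overlapped execution and a total near or above the sum would point to the GPUs being driven one after another, plus whatever overhead the sample includes in its total.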
