Hi all,
I'm testing the SimpleMultiDevice sample on an ATI HD6990 (Ubuntu 10.10) and getting the following results:
----------------------------------------------------------
CPU + GPU Test 1 : Single context Single Thread
----------------------------------------------------------
Total time : 51
Time of CPU : 50.7218
Time of GPU : 2.694
----------------------------------------------------------
CPU + GPU Test 2 : Multiple context Single Thread
----------------------------------------------------------
Total time : 51
Time of CPU : 53.2404
Time of GPU : 2.25078
----------------------------------------------------------
CPU + GPU Test 3 : Multiple context Multiple Thread
----------------------------------------------------------
Total time : 52
Time of CPU : 52.0495
Time of GPU : 2.25067
----------------------------------------------------------
Multi GPU Test 1 : Single context Single Thread
----------------------------------------------------------
Total time : 3
Time of GPU0 : 2.26233
Time of GPU1 : 2.259
----------------------------------------------------------
Multi GPU Test 2 : Multiple context Single Thread
----------------------------------------------------------
Total time : 3
Time of GPU0 : 2.262
Time of GPU1 : 2.25044
----------------------------------------------------------
Multi GPU Test 3 : Multiple context Multiple Thread
----------------------------------------------------------
Total time : 3
Time of GPU0 : 2.25078
Time of GPU1 : 2.25267
And the clinfo output is:
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 3
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4098
Max compute units: 24
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 0
Max clock frequency: 830Mhz
Address bits: 32
Max memory allocation: 268435456
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 32768
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 1073741824
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x7f0c165d3800
Name: Cayman
Vendor: Advanced Micro Devices, Inc.
Driver version: CAL 1.4.1385
Profile: FULL_PROFILE
Version: OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
Extensions: cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_amd_popcnt
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4098
Max compute units: 24
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 0
Max clock frequency: 0Mhz
Address bits: 32
Max memory allocation: 268435456
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 32768
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 1073741824
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x7f0c165d3800
Name: Cayman
Vendor: Advanced Micro Devices, Inc.
Driver version: CAL 1.4.1385
Profile: FULL_PROFILE
Version: OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
Extensions: cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_amd_popcnt
Device Type: CL_DEVICE_TYPE_CPU
Device ID: 4098
Max compute units: 4
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 0
Max clock frequency: 1998Mhz
Address bits: 64
Max memory allocation: 2147483648
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: No
Cache type: Read/Write
Cache line size: 64
Cache size: 32768
Global memory size: 4156026880
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Kernel Preferred work group size multiple: 1
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x7f0c165d3800
Name: Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz
Vendor: GenuineIntel
Driver version: 2.0
Profile: FULL_PROFILE
Version: OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_media_ops cl_amd_popcnt cl_amd_printf
These results look very strange, at least to me as an OpenCL rookie, and I don't know whether or not this particular GPU can execute two kernels simultaneously.
Thanks in advance!
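The numbers above give a rough way to judge whether the two GPUs actually ran at the same time. If the kernels overlapped, the total should sit near the slowest device's time. If they were serialized, it should sit near the sum of the device times. Below is a minimal C sketch of that check. `classify`, `abs_diff` and `verdict_t` are hypothetical names, not part of the SDK sample, and the sketch assumes the total and the per-device times share one unit, which the sample output does not state.

```c
#include <assert.h>

/* Hypothetical helper, not part of the SDK sample. All inputs are assumed
   to share one unit, since the sample output prints none. */
typedef enum { LOOKS_OVERLAPPED, LOOKS_SERIALIZED } verdict_t;

double abs_diff(double a, double b) { return a > b ? a - b : b - a; }

/* Compares a measured total against two bounds:
   - the slowest single device (ideal overlap), and
   - the sum of all device times (fully serialized).
   Writes both bounds out and returns whichever one the total is closer to.
   A tie is reported as serialized. Requires n > 0. */
verdict_t classify(const double *times, int n, double total,
                   double *sum_out, double *max_out) {
    assert(n > 0);
    double sum = 0.0, max = times[0];
    for (int i = 0; i < n; ++i) {
        sum += times[i];
        if (times[i] > max)
            max = times[i];
    }
    *sum_out = sum;
    *max_out = max;
    return abs_diff(total, max) < abs_diff(total, sum) ? LOOKS_OVERLAPPED
                                                       : LOOKS_SERIALIZED;
}
```

With the Multi GPU Test 1 values (2.26233 and 2.259, total 3), the sum is about 4.52 and the maximum about 2.26, so the total is closer to the overlapped bound. The CPU + GPU Test 1 values (50.7218 and 2.694, total 51) also land closer to the maximum. The totals appear to be printed as whole numbers, so small differences should not be over-read.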
Thanks for reporting it.
Originally posted by: g1zm0
...
These results look very strange, at least to me as an OpenCL rookie, and I don't know whether or not this particular GPU can execute two kernels simultaneously.
Thanks in advance!
Hi, maybe I'm missing something, but I don't quite see what is wrong with your results.
Here is my output:
----------------------------------------------------------
CPU + GPU Test 1 : Single context Single Thread
----------------------------------------------------------
Total time : 29
Time of CPU : 26.7108
Time of GPU : 2.53355
----------------------------------------------------------
CPU + GPU Test 2 : Multiple context Single Thread
----------------------------------------------------------
Total time : 29
Time of CPU : 26.8218
Time of GPU : 2.25389
----------------------------------------------------------
CPU + GPU Test 3 : Multiple context Multiple Thread
----------------------------------------------------------
Total time : 29
Time of CPU : 28.5875
Time of GPU : 2.25489
----------------------------------------------------------
Multi GPU Test 1 : Single context Single Thread
----------------------------------------------------------
Total time : 23
Time of GPU0 : 2.25433
Time of GPU1 : 2.25067
Time of GPU2 : 2.26367
Time of GPU3 : 2.264
Time of GPU4 : 2.254
Time of GPU5 : 2.25478
Time of GPU6 : 2.258
Time of GPU7 : 2.25844
----------------------------------------------------------
Multi GPU Test 2 : Multiple context Single Thread
----------------------------------------------------------
Total time : 27
Time of GPU0 : 2.25689
Time of GPU1 : 2.25222
Time of GPU2 : 2.26367
Time of GPU3 : 2.25222
Time of GPU4 : 2.25444
Time of GPU5 : 2.25611
Time of GPU6 : 2.265
Time of GPU7 : 2.26222
----------------------------------------------------------
Multi GPU Test 3 : Multiple context Multiple Thread
----------------------------------------------------------
Total time : 21
Time of GPU0 : 2.25333
Time of GPU1 : 2.25211
Time of GPU2 : 2.25444
Time of GPU3 : 2.25344
Time of GPU4 : 2.25411
Time of GPU5 : 2.26578
Time of GPU6 : 2.28556
Time of GPU7 : 2.25478
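With eight devices, retyping the per-device figures is error-prone. They can be pulled straight out of the sample's text output and reduced to their sum (the fully serialized bound) and their maximum (the fully overlapped bound). Below is a minimal C sketch, assuming the `Time of <device> : <value>` line format shown above. `parse_device_times` and `summarize` are hypothetical helper names, not part of the SDK sample.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helpers, not part of the SDK sample. */

/* Scans SimpleMultiDevice-style output for entries of the form
   "Time of <device> : <value>" and stores up to cap values in out.
   Returns how many values were stored. "Total time" lines do not match,
   because their text is "Total time", not "Time of". */
int parse_device_times(const char *text, double *out, int cap) {
    int n = 0;
    const char *p = text;
    while (n < cap && (p = strstr(p, "Time of ")) != NULL) {
        const char *colon = strchr(p, ':');
        if (colon == NULL)
            break;                          /* malformed tail: stop scanning */
        char *end;
        double v = strtod(colon + 1, &end); /* strtod skips leading spaces */
        if (end != colon + 1)
            out[n++] = v;                   /* keep only successful parses */
        p = colon + 1;                      /* continue after this entry */
    }
    return n;
}

/* Sum (serialized bound) and maximum (overlapped bound) of n > 0 values. */
void summarize(const double *t, int n, double *sum, double *max) {
    assert(n > 0);
    *sum = 0.0;
    *max = t[0];
    for (int i = 0; i < n; ++i) {
        *sum += t[i];
        if (t[i] > *max)
            *max = t[i];
    }
}
```

Fed the Multi GPU Test 3 block above, `parse_device_times` returns 8 values, and `summarize` gives a maximum of 2.28556 and a sum of about 18.07, to set against the reported total of 21. If that total also covers host-side work such as setup or buffer transfers, neither bound is exact.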