

lyqqing
Journeyman III

Max compute units for my RV710 GPU is only 2

Hi,

I have an RV710 (HD 4550) GPU. While running the OpenCL samples, I find that "--device gpu" is not faster than "--device cpu".

 

CLInfo.exe shows that "Max compute units" for my GPU is only 2, but it should be more than that. I believe my driver is the latest.

 

Could anybody give some suggestions? Thanks in advance.

 

________________CLInfo.exe____________________

Number of platforms:                 1
  Platform Profile:                 FULL_PROFILE
  Platform Version:                 OpenCL 1.0 ATI-Stream-v2.1 (145)
  Platform Name:                     ATI Stream
  Platform Vendor:                 Advanced Micro Devices, Inc.
  Platform Extensions:             cl_khr_icd


  Platform Name:                     ATI Stream
Number of devices:                 2
  Device Type:                     CL_DEVICE_TYPE_CPU
  Device ID:                     4098
  Max compute units:                 2
  Max work items dimensions:             3
    Max work items[0]:                 1024
    Max work items[1]:                 1024
    Max work items[2]:                 1024
  Max work group size:                 1024
  Preferred vector width char:             16
  Preferred vector width short:             8
  Preferred vector width int:             4
  Preferred vector width long:             2
  Preferred vector width float:             4
  Preferred vector width double:         0
  Max clock frequency:                 2500Mhz
  Address bits:                     32
  Max memory allocation:             536870912
  Image support:                 No
  Max size of kernel argument:             4096
  Alignment (bits) of base address:         1024
  Minimum alignment (bytes) for any datatype:     128
  Single precision floating point capability
    Denorms:                     Yes
    Quiet NaNs:                     Yes
    Round to nearest even:             Yes
    Round to zero:                 No
    Round to +ve and infinity:             No
    IEEE754-2008 fused multiply-add:         No
  Cache type:                     Read/Write
  Cache line size:                 64
  Cache size:                     65536
  Global memory size:                 1073741824
  Constant buffer size:                 65536
  Max number of constant args:             8
  Local memory type:                 Global
  Local memory size:                 32768
  Profiling timer resolution:             1
  Device endianess:                 Little
  Available:                     Yes
  Compiler available:                 Yes
  Execution capabilities:                 
    Execute OpenCL kernels:             Yes
    Execute native function:             No
  Queue properties:                 
    Out-of-Order:                 No
    Profiling :                     Yes
  Platform ID:                     00E8946C
  Name:                         AMD Athlon(tm) 64 X2 Dual Core Processor 4800+
  Vendor:                     AuthenticAMD
  Driver version:                 1.1
  Profile:                     FULL_PROFILE
  Version:                     OpenCL 1.0 ATI-Stream-v2.1 (145)
  Extensions:                     cl_khr_icd cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_printf
  Device Type:                     CL_DEVICE_TYPE_GPU
  Device ID:                     4098
  Max compute units:                 2
  Max work items dimensions:             3
    Max work items[0]:                 128
    Max work items[1]:                 128
    Max work items[2]:                 128
  Max work group size:                 128
  Preferred vector width char:             16
  Preferred vector width short:             8
  Preferred vector width int:             4
  Preferred vector width long:             2
  Preferred vector width float:             4
  Preferred vector width double:         0
  Max clock frequency:                 600Mhz
  Address bits:                     32
  Max memory allocation:             134217728
  Image support:                 No
  Max size of kernel argument:             1024
  Alignment (bits) of base address:         32768
  Minimum alignment (bytes) for any datatype:     128
  Single precision floating point capability
    Denorms:                     No
    Quiet NaNs:                     Yes
    Round to nearest even:             Yes
    Round to zero:                 No
    Round to +ve and infinity:             No
    IEEE754-2008 fused multiply-add:         No
  Cache type:                     None
  Cache line size:                 0
  Cache size:                     0
  Global memory size:                 134217728
  Constant buffer size:                 65536
  Max number of constant args:             8
  Local memory type:                 Global
  Local memory size:                 16384
  Profiling timer resolution:             1
  Device endianess:                 Little
  Available:                     Yes
  Compiler available:                 Yes
  Execution capabilities:                 
    Execute OpenCL kernels:             Yes
    Execute native function:             No
  Queue properties:                 
    Out-of-Order:                 No
    Profiling :                     Yes
  Platform ID:                     00E8946C
  Name:                         ATI RV710
  Vendor:                     Advanced Micro Devices, Inc.
  Driver version:                 CAL 1.4.636
  Profile:                     FULL_PROFILE
  Version:                     OpenCL 1.0 ATI-Stream-v2.1 (145)
  Extensions:                     cl_khr_icd cl_khr_gl_sharing cl_amd_device_attribute_query


Passed!

omkaranathan
Adept I

 

I have an RV710 (HD 4550) GPU. While running the OpenCL samples, I find that "--device gpu" is not faster than "--device cpu".

 

Are you running the samples with the default sizes? Try running them with bigger input sizes. You'll see the difference.
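
One common explanation, offered here as an assumption and not as a measurement of these samples, is that a GPU run pays a fixed cost (kernel launch, data transfer) that a small input cannot amortize. Below is a toy model of that idea in Python. The function names, the rates, and the overhead value are all made up for illustration; they are not figures for the RV710 or the Athlon 64 X2.

```python
def cpu_time(n, cpu_rate):
    """Seconds to process n items on the CPU at cpu_rate items/second."""
    return n / cpu_rate


def gpu_time(n, gpu_rate, fixed_overhead):
    """Seconds on the GPU: a fixed startup cost plus n items at gpu_rate items/second."""
    return fixed_overhead + n / gpu_rate


def break_even_size(cpu_rate, gpu_rate, fixed_overhead):
    """Input size above which the GPU wins in this model, or None if it never does."""
    if gpu_rate <= cpu_rate:
        return None
    return fixed_overhead / (1.0 / cpu_rate - 1.0 / gpu_rate)


# Hypothetical numbers, chosen only to show the shape of the trade-off.
CPU_RATE = 1e6        # items per second
GPU_RATE = 5e6        # items per second
OVERHEAD = 0.01       # seconds of fixed GPU cost

for n in (1_000, 1_000_000):
    faster = "GPU" if gpu_time(n, GPU_RATE, OVERHEAD) < cpu_time(n, CPU_RATE) else "CPU"
    print(n, faster)
```

In this model the small input favors the CPU and the large one favors the GPU, which is the pattern to look for when re-running the samples with larger sizes.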

 

CLInfo.exe shows that "Max compute units" for my GPU is only 2, but it should be more than that. I believe my driver is the latest.

 

RV710 has 2 compute units. CLInfo is showing the correct information.

0 Likes

Thanks for your reply.

I think I mistook compute units for Stream Processing Units. (RV710 does have 80 SPs.)

 

 

0 Likes

Originally posted by: lyqqing Thanks for your reply.

 

I think I mistook compute units for Stream Processing Units. (RV710 does have 80 SPs.)

 

 

Each compute unit has 80 SPs, so RV710 has 160 SPs in total.

0 Likes

That is not true. RV710 has 2 compute units, and the total across both compute units is 80 scalar processors. Each compute unit has 8 stream cores and each stream core has 5 scalar processors, so 2 x 8 x 5 = 80 SPs.
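
To make the arithmetic explicit, here is a small Python sketch. The function and parameter names are my own, picked for illustration; the counts are the ones quoted above.

```python
def total_scalar_processors(compute_units, stream_cores_per_cu, sps_per_stream_core):
    """Multiply out the hierarchy: compute units -> stream cores -> scalar processors."""
    return compute_units * stream_cores_per_cu * sps_per_stream_core


# RV710 figures from this thread: 2 compute units, 8 stream cores each, 5 SPs per core.
rv710_total = total_scalar_processors(2, 8, 5)
print(rv710_total)  # 80
```

So the "Max compute units: 2" line in CLInfo and the 80 SP figure describe the same hardware at different levels of the hierarchy.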

Regards
