Hi,
I have an RV710 (HD 4550) GPU. When running the OpenCL samples, I find that "--device gpu" is no faster than "--device cpu".
CLInfo.exe shows that "Max compute units" for my GPU is only 2, but I expected more than that. I believe my driver is the latest.
Could anybody offer some suggestions? Thanks in advance.
________________CLInfo.exe____________________
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.0 ATI-Stream-v2.1 (145)
Platform Name: ATI Stream
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd
Platform Name: ATI Stream
Number of devices: 2
Device Type: CL_DEVICE_TYPE_CPU
Device ID: 4098
Max compute units: 2
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Max clock frequency: 2500Mhz
Address bits: 32
Max memory allocation: 536870912
Image support: No
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: No
Round to +ve and infinity: No
IEEE754-2008 fused multiply-add: No
Cache type: Read/Write
Cache line size: 64
Cache size: 65536
Global memory size: 1073741824
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 00E8946C
Name: AMD Athlon(tm) 64 X2 Dual Core Processor 4800+
Vendor: AuthenticAMD
Driver version: 1.1
Profile: FULL_PROFILE
Version: OpenCL 1.0 ATI-Stream-v2.1 (145)
Extensions: cl_khr_icd cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_printf
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4098
Max compute units: 2
Max work items dimensions: 3
Max work items[0]: 128
Max work items[1]: 128
Max work items[2]: 128
Max work group size: 128
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Max clock frequency: 600Mhz
Address bits: 32
Max memory allocation: 134217728
Image support: No
Max size of kernel argument: 1024
Alignment (bits) of base address: 32768
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: No
Round to +ve and infinity: No
IEEE754-2008 fused multiply-add: No
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 134217728
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 16384
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 00E8946C
Name: ATI RV710
Vendor: Advanced Micro Devices, Inc.
Driver version: CAL 1.4.636
Profile: FULL_PROFILE
Version: OpenCL 1.0 ATI-Stream-v2.1 (145)
Extensions: cl_khr_icd cl_khr_gl_sharing cl_amd_device_attribute_query
Passed!
> I have a rv710 (hd4550) gpu. While running opencl samples, I find "--device gpu" is not faster than "--device cpu".
Are you running the samples at their default sizes? Try running them with a larger input size and you should see the difference.
> CLInfo.exe shows that "max computer unit " of my gpu is only 2, but it should be more than that. I believe my driver is the latest.
RV710 has 2 compute units. CLInfo is showing the correct information.
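To illustrate why input size matters, here is a rough sketch of a simple timing model: each device pays a fixed per-run overhead (kernel launch, buffer transfers) plus a per-item cost. The function names and every number below are hypothetical, chosen only for illustration. They are not measurements from an RV710 or from the SDK samples.

```python
def run_time(n_items, fixed_overhead_s, items_per_second):
    """Modelled wall time: fixed overhead plus work divided by throughput."""
    return fixed_overhead_s + n_items / items_per_second


def break_even_items(cpu_rate, gpu_rate, gpu_overhead_s, cpu_overhead_s=0.0):
    """Input size at which the modelled GPU time equals the modelled CPU time.

    Returns None when the GPU is not faster per item, since in that case
    it never catches up in this model.
    """
    if gpu_rate <= cpu_rate:
        return None
    extra_overhead = gpu_overhead_s - cpu_overhead_s
    return extra_overhead / (1.0 / cpu_rate - 1.0 / gpu_rate)


# Hypothetical figures, for illustration only.
CPU_RATE = 1e6        # items per second on the CPU (assumed)
GPU_RATE = 4e6        # items per second on the GPU (assumed)
GPU_OVERHEAD = 0.003  # seconds of launch/transfer overhead (assumed)

small, large = 1_000, 1_000_000
print("small input, GPU faster?",
      run_time(small, GPU_OVERHEAD, GPU_RATE) < run_time(small, 0.0, CPU_RATE))
print("large input, GPU faster?",
      run_time(large, GPU_OVERHEAD, GPU_RATE) < run_time(large, 0.0, CPU_RATE))
print("break-even items:", break_even_items(CPU_RATE, GPU_RATE, GPU_OVERHEAD))
```

In this model the GPU only wins once the input is large enough to amortize its fixed overhead, which is one plausible reason the default sample sizes show no speedup.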
Thanks for your reply.
I think I mistook compute units for stream processing units. (RV710 does have 80 SPs.)
Originally posted by: lyqqing
> Thanks for your reply.
> I think I mistook compute units for stream processing units. (RV710 does have 80 SPs.)
Each compute unit has 80 SPs, so RV710 has 160 SPs in total.
That is not true. RV710 has 2 compute units and 80 scalar processors in total across both. Each compute unit has 8 stream cores and each stream core has 5 processing elements, so 2 x 8 x 5 = 80 SPs.
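As a quick check of that arithmetic, here is a minimal sketch. The function name is made up for illustration, and the defaults of 8 stream cores per compute unit and 5 processing elements per stream core are the RV710 figures quoted in this thread. Other GPUs may use different values.

```python
def total_stream_processors(compute_units,
                            stream_cores_per_unit=8,
                            elements_per_core=5):
    """Total SPs = compute units x stream cores per unit x elements per core."""
    return compute_units * stream_cores_per_unit * elements_per_core


# CLInfo reports "Max compute units: 2" for the RV710.
rv710_sps = total_stream_processors(2)
print(rv710_sps)  # prints 80
```

So the "Max compute units" value from CLInfo counts the SIMD engines, not the individual stream processors.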
Regards