

omion
Journeyman III

OpenCL not returning correct cache size for CPU

I just ran into a problem with the Stream SDK 2.1. Both CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE and CL_DEVICE_GLOBAL_MEM_CACHE_SIZE return 0 for my CPU, even though I was sure that 2.0 returned the correct values (64 bytes and 6 MiB, respectively).
I have attached the full output from CLInfo.
Can anyone else verify this?

Thanks,
Reed

Number of platforms: 1
  Platform Profile: FULL_PROFILE
  Platform Version: OpenCL 1.0 ATI-Stream-v2.1 (145)
  Platform Name: ATI Stream
  Platform Vendor: Advanced Micro Devices, Inc.
  Platform Extensions: cl_khr_icd

  Platform Name: ATI Stream
Number of devices: 2

  Device Type: CL_DEVICE_TYPE_CPU
  Device ID: 4098
  Max compute units: 4
  Max work items dimensions: 3
    Max work items[0]: 1024
    Max work items[1]: 1024
    Max work items[2]: 1024
  Max work group size: 1024
  Preferred vector width char: 16
  Preferred vector width short: 8
  Preferred vector width int: 4
  Preferred vector width long: 2
  Preferred vector width float: 4
  Preferred vector width double: 0
  Max clock frequency: 3000Mhz
  Address bits: 32
  Max memory allocation: 536870912
  Image support: No
  Max size of kernel argument: 4096
  Alignment (bits) of base address: 1024
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: Yes
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: No
    Round to +ve and infinity: No
    IEEE754-2008 fused multiply-add: No
  Cache type: Read/Write
  Cache line size: 0
  Cache size: 0
  Global memory size: 1073741824
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Global
  Local memory size: 32768
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: No
  Queue properties:
    Out-of-Order: No
    Profiling : Yes
  Platform ID: 0126946C
  Name: Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz
  Vendor: GenuineIntel
  Driver version: 1.1
  Profile: FULL_PROFILE
  Version: OpenCL 1.0 ATI-Stream-v2.1 (145)
  Extensions: cl_khr_icd cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_printf

  Device Type: CL_DEVICE_TYPE_GPU
  Device ID: 4098
  Max compute units: 18
  Max work items dimensions: 3
    Max work items[0]: 256
    Max work items[1]: 256
    Max work items[2]: 256
  Max work group size: 256
  Preferred vector width char: 16
  Preferred vector width short: 8
  Preferred vector width int: 4
  Preferred vector width long: 2
  Preferred vector width float: 4
  Preferred vector width double: 0
  Max clock frequency: 725Mhz
  Address bits: 32
  Max memory allocation: 268435456
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 8
  Max image 2D width: 8192
  Max image 2D height: 8192
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 1024
  Alignment (bits) of base address: 32768
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: No
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: No
    Round to +ve and infinity: No
    IEEE754-2008 fused multiply-add: No
  Cache type: None
  Cache line size: 0
  Cache size: 0
  Global memory size: 268435456
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Scratchpad
  Local memory size: 32768
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: No
  Queue properties:
    Out-of-Order: No
    Profiling : Yes
  Platform ID: 0126946C
  Name: Cypress
  Vendor: Advanced Micro Devices, Inc.
  Driver version: CAL 1.4.636
  Profile: FULL_PROFILE
  Version: OpenCL 1.0 ATI-Stream-v2.1 (145)
  Extensions: cl_khr_icd cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_d3d9_sharing

Passed!

2 Replies
luocf
Journeyman III

This also happens on my Intel Q6600 CPU: both the cache size and the cache-line size are reported as 0. It seems to affect only Intel CPUs, though. On my other CPU, an AMD Athlon 64 X2, the values are correct. I also compared performance between the two, and the Intel CPU reaches only 1/16 of its theoretical performance. This is exactly the cache problem [the line size should be 64 bytes, so you get only 1/8 of the performance when the wrong cache-line size is used].
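Until this is fixed, host code that sizes or pads buffers by the cache-line size could guard against the 0 value. Below is a minimal sketch in plain C, not anything from the SDK: `effective_cacheline`, `pad_to_cacheline` and `FALLBACK_CACHELINE_BYTES` are made-up names, and the 64-byte fallback is just the value expected for the CPUs in this thread. The `reported` argument stands for the `cl_uint` you get back from `clGetDeviceInfo` with `CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE`.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed fallback used when the runtime reports 0. 64 bytes is the
   value expected for the CPUs discussed in this thread; adjust it for
   other hardware. */
#define FALLBACK_CACHELINE_BYTES 64u

/* Hypothetical helper: return the line size reported by the runtime,
   or the fallback if the runtime reported 0. */
size_t effective_cacheline(unsigned int reported)
{
    return reported != 0 ? (size_t)reported
                         : (size_t)FALLBACK_CACHELINE_BYTES;
}

/* Hypothetical helper: round nbytes up to the next multiple of the
   effective cache-line size, e.g. to pad per-work-item data so that
   neighbouring items do not share a line. */
size_t pad_to_cacheline(size_t nbytes, unsigned int reported)
{
    size_t line = effective_cacheline(reported);
    assert(line != 0); /* guaranteed by the fallback above */
    return (nbytes + line - 1) / line * line;
}
```

With the broken query result (0), `pad_to_cacheline(100, 0)` rounds up to 128 bytes instead of dividing by zero or skipping the padding altogether.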


The developers are looking into the issue, and it will be fixed in an upcoming release. Thanks for reporting.
