clGetDeviceInfo

Discussion created by diepchess on Apr 17, 2011
Latest reply on Apr 25, 2011 by diepchess
Bug reports and/or questions/requests

Good Morning!

 

I printed some device information from the system with OpenCL, and it reports data I have questions about.

 

It would be great to get some answers back, and feel free to split them over multiple postings or subjects. Apologies for posting it all at once; if you'd prefer more than one posting, let me know...

See the text attached to this post. Please cut and paste it into a fixed-width font to see it better formatted.

 

Question 1: It reports that the GPU has 1GB of RAM and the CPU has 10GB of RAM. The 10GB for the quad-socket box is correct. Yet I bought an XFX 6970 with 2GB of DDR5. The label on the box I bought says: HD 6970 880M 2GB ddr5 dual dp hdmi dual dvi pci-e.

Let's start with the most likely possibility: OpenCL reports the GPU device RAM incorrectly.
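
To put numbers on that, here is a minimal Python sketch that converts the CL_DEVICE_GLOBAL_MEM_SIZE values from the attached dump into GiB. The helper name to_gib is made up for illustration; the byte counts are copied from the dump.

```python
# Byte counts copied from the attached dump (CL_DEVICE_GLOBAL_MEM_SIZE).
GPU_GLOBAL_MEM = 1073741824
CPU_GLOBAL_MEM = 10504130560

def to_gib(n_bytes):
    """Convert a byte count to GiB (1 GiB = 2**30 bytes)."""
    return n_bytes / 2**30

print(f"GPU: {to_gib(GPU_GLOBAL_MEM):.2f} GiB")
print(f"CPU: {to_gib(CPU_GLOBAL_MEM):.2f} GiB")
```

The GPU value is exactly 1 GiB, which is half of the 2GB printed on the box.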

 

Question 2: I bought a 2GB device, and nowadays 2GB is really little to work with, especially with 1536 stream cores. To my amazement, of the amount of RAM it finds, it allows a single object to use just 25%.

  a) Can that be raised to the amount of RAM it has?

  b) Why this strange 25% limit? Suppose you buy a Formula 1 car with 950 horsepower and you can use just 240 horsepower. Good deal? Or is your next car then an Nvidia? It makes no sense to limit this.
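
For context, the 25% figure matches a floor in the OpenCL 1.1 specification: the smallest value a device may report for CL_DEVICE_MAX_MEM_ALLOC_SIZE is max(1/4 of CL_DEVICE_GLOBAL_MEM_SIZE, 128MB). Both devices in the attached dump report exactly that floor. The Python sketch below only reproduces the arithmetic; the function name spec_min_max_alloc is made up, and it says nothing about whether the runtime can be configured to report more.

```python
MIB = 1024 * 1024

def spec_min_max_alloc(global_mem_size):
    """Smallest CL_DEVICE_MAX_MEM_ALLOC_SIZE the OpenCL 1.1 spec permits:
    max(1/4 of CL_DEVICE_GLOBAL_MEM_SIZE, 128 MiB)."""
    return max(global_mem_size // 4, 128 * MIB)

# (CL_DEVICE_GLOBAL_MEM_SIZE, CL_DEVICE_MAX_MEM_ALLOC_SIZE) from the dump.
devices = {
    "GPU (Cayman)": (1073741824, 268435456),
    "CPU (Opteron 8356)": (10504130560, 2626032640),
}

for name, (global_mem, max_alloc) in devices.items():
    at_floor = max_alloc == spec_min_max_alloc(global_mem)
    print(f"{name}: max alloc is {max_alloc / global_mem:.0%} of global memory, "
          f"equal to spec minimum: {at_floor}")
```

So the limit applies to a single allocation, not to the total; several buffers can together cover more of the reported memory.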

 

Question 3: It correctly reports that the GPU has 24 compute units. Which other information can I use to calculate that I have 1536 PEs available on the GPU?

How do I accomplish that: with which function call, setting, or mathematical formula? I scanned the entire document opencl-1.1-rev33.pdf but couldn't figure it out.

Please enlighten me.

Note that it does correctly report the number of cores of the 16-core Opteron 8356 box.
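
The core OpenCL 1.1 clGetDeviceInfo queries do not appear to include a processing-element count; CL_DEVICE_MAX_COMPUTE_UNITS is the only count in the dump. The 1536 figure can be reproduced by multiplying in architecture constants that come from AMD's hardware documentation rather than from OpenCL: on Cayman, each compute unit is generally described as having 16 stream cores, each a 4-wide VLIW unit. A minimal sketch, with those two constants as labelled assumptions and a made-up function name:

```python
# Reported by OpenCL in the attached dump.
MAX_COMPUTE_UNITS = 24  # CL_DEVICE_MAX_COMPUTE_UNITS

# Assumptions: Cayman (VLIW4) architecture constants, NOT reported by OpenCL.
STREAM_CORES_PER_COMPUTE_UNIT = 16
ALUS_PER_STREAM_CORE = 4

def processing_elements(compute_units, cores_per_unit, alus_per_core):
    """Total processing elements = compute units x stream cores x ALUs per core."""
    return compute_units * cores_per_unit * alus_per_core

total = processing_elements(MAX_COMPUTE_UNITS,
                            STREAM_CORES_PER_COMPUTE_UNIT,
                            ALUS_PER_STREAM_CORE)
print(total)  # 1536
```

The per-unit constants differ between GPU generations, so a lookup keyed on CL_DEVICE_NAME would be needed to generalize this.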

 

Question 4: For the GPU it returns CL_DEVICE_MAX_CLOCK_FREQUENCY = 0, whereas for the CPU it correctly reports 2310 (MHz). How do I figure out the clock frequency of the GPU?

 

Question 5: It reports CL_DEVICE_GLOBAL_MEM_CACHE_SIZE = 0, yet I thought the 6000 series has a Global Data Share (GDS). Can you enlighten me there?

 

There are lots of things wrong in the reporting of the CPU settings.

Question 6: It correctly doesn't show the GPU as supporting out-of-order execution. Yet it doesn't show it for the CPU either. I hope I interpreted this correctly, as the command queue is listed under 'execution model' in chapter 5. For the CPU, the field is set to CL_DEVICE_QUEUE_PROPERTIES = CL_QUEUE_PROFILING_ENABLE, with no bit indicating that CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE is set.
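
CL_DEVICE_QUEUE_PROPERTIES is a bitfield, so the check described above comes down to testing individual bits. The sketch below uses the bit values defined in the Khronos cl.h header; the function name decode_queue_properties is made up, and the raw value 2 is an assumption inferred from the dump printing only CL_QUEUE_PROFILING_ENABLE.

```python
# Bit values as defined in the Khronos cl.h header.
CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE = 1 << 0
CL_QUEUE_PROFILING_ENABLE = 1 << 1

def decode_queue_properties(bits):
    """Return the names of the queue-property bits set in `bits`."""
    names = []
    if bits & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE:
        names.append("CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE")
    if bits & CL_QUEUE_PROFILING_ENABLE:
        names.append("CL_QUEUE_PROFILING_ENABLE")
    return names

# Assumption: raw value behind the dump's "CL_QUEUE_PROFILING_ENABLE" line.
reported = 2
print(decode_queue_properties(reported))
```

A device that supported out-of-order queues as well would report 3 here, with both bits set.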

 

Question 7: It doesn't report the Opteron box as having ECC capabilities in the CL_DEVICE_ERROR_CORRECTION_SUPPORT setting.

 

Question 8: I really don't understand the numbers it prints for the CPU for all sorts of things, such as CL_DEVICE_GLOBAL_MEM_CACHE_SIZE. It reports 64KB. Really, the SRAM between the Opteron CPUs and the RAM is a lot larger. Something like 4MB for all 4 CPUs together or so?

*Please note: by SRAM at the CPUs I mean the L3 cache.

 

Question 9: It reports the local mem size as 32KB, yet the CPU has a 64KB L1 data cache for each core, so it could easily report 64KB there as well.

 

Question 10: This one is about the GPU again. How do I figure out that it's an XFX? The Linux command 'lspci -v' somehow knows what sort of video card it is, but in OpenCL I don't see that text anywhere. All I get back is that it is a Cayman.

Please enlighten me. Note that lspci -v shows the video card as having 256MB of RAM, which is wrong as well. Would this impact video performance during tests at some websites? That would be very bad news for AMD, of course, as the system guessing it can use 256MB when the card has 2GB is quite a bad idea. Anyone?
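
One value in the attached dump does relate to vendor identification: CL_DEVICE_VENDOR_ID = 4098 is 0x1002 in hexadecimal, the PCI vendor ID registered to AMD/ATI. That identifies the chip maker rather than the board maker; lspci presumably takes the board maker from the separate PCI subsystem vendor field, which is not among the values printed in the dump. A quick check of the conversion (the constant name AMD_ATI_PCI_VENDOR_ID is made up for illustration):

```python
CL_DEVICE_VENDOR_ID = 4098  # value from the attached dump

AMD_ATI_PCI_VENDOR_ID = 0x1002  # PCI vendor ID registered to AMD/ATI

print(hex(CL_DEVICE_VENDOR_ID))
print(CL_DEVICE_VENDOR_ID == AMD_ATI_PCI_VENDOR_ID)
```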

Thanks for bearing with me this far. I'd argue that's enough for now.

Vincent

diep@xs4all.nl

skype: diepchess

Number of Platforms found : 1
PROFILE = FULL_PROFILE
VERSION = OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
NAME = AMD Accelerated Parallel Processing
VENDOR = Advanced Micro Devices, Inc.
EXTENSIONS = cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Number of devices found (and added) 2 at platform 0

Querying device = 0
DEVICETYPE = GPU
CL_DEVICE_NAME = Cayman
CL_DEVICE_VENDOR = Advanced Micro Devices, Inc.
CL_DRIVER_VERSION = CAL 1.4.900
CL_DEVICE_PROFILE = FULL_PROFILE
CL_DEVICE_VERSION = OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
CL_DEVICE_OPENCL_C_VERSION = OpenCL C 1.1
CL_DEVICE_EXTENSIONS = cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_amd_popcnt
CL_DEVICE_GLOBAL_MEM_CACHE_TYPE = NONE
CL_DEVICE_LOCAL_MEM_TYPE = LOCAL MEMORY (SRAM OR DEDICATED)
CL_DEVICE_EXECUTION_CAPABILITIES = CL_EXEC_KERNEL
CL_DEVICE_EXECUTION_CAPABILITIES = CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_MAX_MEM_ALLOC_SIZE = 268435456
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE = 0
CL_DEVICE_GLOBAL_MEM_SIZE = 1073741824
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE = 65536
CL_DEVICE_LOCAL_MEM_SIZE = 32768
CL_DEVICE_VENDOR_ID = 4098
CL_DEVICE_MAX_COMPUTE_UNITS = 24
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS = 3
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR = 16
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT = 8
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT = 4
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG = 2
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT = 4
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE = 0
CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF = 0
CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR = 16
CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT = 8
CL_DEVICE_NATIVE_VECTOR_WIDTH_INT = 4
CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG = 2
CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT = 4
CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE = 0
CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF = 0
CL_DEVICE_MAX_CLOCK_FREQUENCY = 0
CL_DEVICE_ADDRESS_BITS = 32
CL_DEVICE_MAX_SAMPLERS = 16
CL_DEVICE_MEM_BASE_ADDR_ALIGN = 32768
CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE = 128
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE = 0
CL_DEVICE_MAX_CONSTANT_ARGS = 8
CL_DEVICE_MAX_WORK_ITEM_SIZES = (256,256,256)
CL_DEVICE_MAX_WORK_GROUP_SIZE = 256
CL_DEVICE_MAX_PARAMETER_SIZE = 1024
CL_DEVICE_PROFILING_TIMER_RESOLUTION = 1
CL_DEVICE_ERROR_CORRECTION_SUPPORT = FALSE
CL_DEVICE_HOST_UNIFIED_MEMORY = FALSE
CL_DEVICE_ENDIAN_LITTLE = TRUE
CL_DEVICE_AVAILABLE = TRUE
CL_DEVICE_COMPILER_AVAILABLE = TRUE

Querying device = 1
DEVICETYPE = CPU
CL_DEVICE_NAME = Quad-Core AMD Opteron(tm) Processor 8356
CL_DEVICE_VENDOR = AuthenticAMD
CL_DRIVER_VERSION = 2.0
CL_DEVICE_PROFILE = FULL_PROFILE
CL_DEVICE_VERSION = OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
CL_DEVICE_OPENCL_C_VERSION = OpenCL C 1.1
CL_DEVICE_EXTENSIONS = cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_media_ops cl_amd_popcnt cl_amd_printf
CL_DEVICE_GLOBAL_MEM_CACHE_TYPE = READ AND WRITE
CL_DEVICE_LOCAL_MEM_TYPE = GLOBAL MEMORY
CL_DEVICE_EXECUTION_CAPABILITIES = CL_EXEC_KERNEL | CL_EXEC_NATIVE_KERNEL
CL_DEVICE_EXECUTION_CAPABILITIES = CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_MAX_MEM_ALLOC_SIZE = 2626032640
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE = 65536
CL_DEVICE_GLOBAL_MEM_SIZE = 10504130560
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE = 65536
CL_DEVICE_LOCAL_MEM_SIZE = 32768
CL_DEVICE_VENDOR_ID = 4098
CL_DEVICE_MAX_COMPUTE_UNITS = 16
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS = 3
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR = 16
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT = 8
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT = 4
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG = 2
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT = 4
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE = 0
CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF = 0
CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR = 16
CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT = 8
CL_DEVICE_NATIVE_VECTOR_WIDTH_INT = 4
CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG = 2
CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT = 4
CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE = 0
CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF = 0
CL_DEVICE_MAX_CLOCK_FREQUENCY = 2310
CL_DEVICE_ADDRESS_BITS = 64
CL_DEVICE_MAX_SAMPLERS = 16
CL_DEVICE_MEM_BASE_ADDR_ALIGN = 1024
CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE = 128
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE = 64
CL_DEVICE_MAX_CONSTANT_ARGS = 8
CL_DEVICE_MAX_WORK_ITEM_SIZES = (1024,1024,1024)
CL_DEVICE_MAX_WORK_GROUP_SIZE = 1024
CL_DEVICE_MAX_PARAMETER_SIZE = 4096
CL_DEVICE_PROFILING_TIMER_RESOLUTION = 1
CL_DEVICE_ERROR_CORRECTION_SUPPORT = FALSE
CL_DEVICE_HOST_UNIFIED_MEMORY = TRUE
CL_DEVICE_ENDIAN_LITTLE = TRUE
CL_DEVICE_AVAILABLE = TRUE
CL_DEVICE_COMPILER_AVAILABLE = TRUE
