7 Replies Latest reply on Sep 8, 2011 4:34 AM by settle

    AMD Platform with Intel CPU (and AMD GPU), vs. Intel Platform

    settle

      This output is from clinfo in ati-stream-sdk-v2.1-lnx64; however for some reason clinfo fails during the clGetDeviceInfo with the Intel platform in versions after and including ati-stream-sdk-v2.2-lnx64.

       

      If you look closely at the CL_DEVICE_TYPE_CPU sections you'll notice several strange differences between AMD platform and Intel platform for the exact same device, just to name a few:

      • Preferred vector width double
      • Cache size
      • Single precision floating point capability

      Anyways, I pefer using AMD's platform because I can put all my devices in a single context, but I'd like to use the full capabilities of my CPU as well.

       

      Number of platforms: 2 Platform Profile: FULL_PROFILE Platform Version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213) Platform Name: AMD Accelerated Parallel Processing Platform Vendor: Advanced Micro Devices, Inc. Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices Platform Profile: FULL_PROFILE Platform Version: OpenCL 1.1 LINUX Platform Name: Intel(R) OpenCL Platform Vendor: Intel(R) Corporation Platform Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_gl_sharing cl_khr_byte_addressable_store cl_intel_printf cl_ext_device_fission cl_khr_icd Platform Name: AMD Accelerated Parallel Processing Number of devices: 3 Device Type: CL_DEVICE_TYPE_GPU Device ID: 4098 Max compute units: 20 Max work items dimensions: 3 Max work items[0]: 256 Max work items[1]: 256 Max work items[2]: 256 Max work group size: 256 Preferred vector width char: 16 Preferred vector width short: 8 Preferred vector width int: 4 Preferred vector width long: 2 Preferred vector width float: 4 Preferred vector width double: 2 Max clock frequency: 725Mhz Address bits: 32 Max memory allocation: 134217728 Image support: Yes Max number of images read arguments: 128 Max number of images write arguments: 8 Max image 2D width: 8192 Max image 2D height: 8192 Max image 3D width: 2048 Max image 3D height: 2048 Max image 3D depth: 2048 Max samplers within kernel: 16 Max size of kernel argument: 1024 Alignment (bits) of base address: 32768 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: No Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: Yes Cache type: None Cache line size: 0 Cache size: 0 Global memory size: 536870912 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Scratchpad Local memory size: 32768 Profiling timer resolution: 1 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: No Queue properties: Out-of-Order: No Profiling : Yes Platform ID: 0x7f58c5bcf060 Name: Cypress Vendor: Advanced Micro Devices, Inc. Driver version: CAL 1.4.1523 Profile: FULL_PROFILE Version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213) Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt Device Type: CL_DEVICE_TYPE_GPU Device ID: 4098 Max compute units: 20 Max work items dimensions: 3 Max work items[0]: 256 Max work items[1]: 256 Max work items[2]: 256 Max work group size: 256 Preferred vector width char: 16 Preferred vector width short: 8 Preferred vector width int: 4 Preferred vector width long: 2 Preferred vector width float: 4 Preferred vector width double: 2 Max clock frequency: 725Mhz Address bits: 32 Max memory allocation: 134217728 Image support: Yes Max number of images read arguments: 128 Max number of images write arguments: 8 Max image 2D width: 8192 Max image 2D height: 8192 Max image 3D width: 2048 Max image 3D height: 2048 Max image 3D depth: 2048 Max samplers within kernel: 16 Max size of kernel argument: 1024 Alignment (bits) of base address: 32768 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: No Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: Yes Cache type: None Cache line size: 0 Cache size: 0 Global memory size: 536870912 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Scratchpad Local memory size: 32768 Profiling timer resolution: 1 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: No Queue properties: Out-of-Order: No Profiling : Yes Platform ID: 0x7f58c5bcf060 Name: Cypress Vendor: Advanced Micro Devices, Inc. Driver version: CAL 1.4.1523 Profile: FULL_PROFILE Version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213) Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt Device Type: CL_DEVICE_TYPE_CPU Device ID: 4098 Max compute units: 4 Max work items dimensions: 3 Max work items[0]: 1024 Max work items[1]: 1024 Max work items[2]: 1024 Max work group size: 1024 Preferred vector width char: 16 Preferred vector width short: 8 Preferred vector width int: 4 Preferred vector width long: 2 Preferred vector width float: 4 Preferred vector width double: 0 Max clock frequency: 2668Mhz Address bits: 64 Max memory allocation: 2101755904 Image support: Yes Max number of images read arguments: 128 Max number of images write arguments: 8 Max image 2D width: 8192 Max image 2D height: 8192 Max image 3D width: 2048 Max image 3D height: 2048 Max image 3D depth: 2048 Max samplers within kernel: 16 Max size of kernel argument: 4096 Alignment (bits) of base address: 1024 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: Yes Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: No Cache type: Read/Write Cache line size: 64 Cache size: 32768 Global memory size: 2101755904 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Global Local memory size: 32768 Profiling timer resolution: 1 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: Yes Queue properties: Out-of-Order: No Profiling : Yes Platform ID: 0x7f58c5bcf060 Name: Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz Vendor: GenuineIntel Driver version: 2.0 Profile: FULL_PROFILE Version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213) Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_media_ops cl_amd_popcnt cl_amd_printf Passed! Platform Name: Intel(R) OpenCL Number of devices: 1 Device Type: CL_DEVICE_TYPE_CPU Device ID: 32902 Max compute units: 4 Max work items dimensions: 3 Max work items[0]: 1024 Max work items[1]: 1024 Max work items[2]: 1024 Max work group size: 1024 Preferred vector width char: 16 Preferred vector width short: 8 Preferred vector width int: 4 Preferred vector width long: 2 Preferred vector width float: 4 Preferred vector width double: 2 Max clock frequency: 2670Mhz Address bits: 64 Max memory allocation: 525438976 Image support: Yes Max number of images read arguments: 128 Max number of images write arguments: 128 Max image 2D width: 8192 Max image 2D height: 8192 Max image 3D width: 2048 Max image 3D height: 2048 Max image 3D depth: 2048 Max samplers within kernel: 128 Max size of kernel argument: 1024 Alignment (bits) of base address: 1024 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: Yes Quiet NaNs: Yes Round to nearest even: Yes Round to zero: No Round to +ve and infinity: No IEEE754-2008 fused multiply-add: No Cache type: Read/Write Cache line size: 64 Cache size: 262144 Global memory size: 2101755904 Constant buffer size: 131072 Max number of constant args: 128 Local memory type: Global Local memory size: 32768 Profiling timer resolution: 374812 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: Yes Queue properties: Out-of-Order: Yes Profiling : Yes Platform ID: 0x974300 Name: Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz Vendor: Intel(R) Corporation Driver version: 1.1 Profile: FULL_PROFILE Version: OpenCL 1.1 (Build 13785.5219) Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_gl_sharing cl_khr_byte_addressable_store cl_intel_printf cl_ext_device_fission Passed!

        • AMD Platform with Intel CPU (and AMD GPU), vs. Intel Platform
          settle

          bump

           

          My concern is that the AMD APP SDK works on x86 devices but in some cases does not seem to accurately query the correct hardware info.  Some of these differences are purely runtime support (e.g., Out-of-order command queues) and I understand these may vary widely from platform to platform, but others are inherent to the underlying hardware (e.g., Preferred vector width double) and should be common between any platform supporting the same device.

           

          Any comments from AMD on this?  If any of these differences is a simple fix I'd appreciate it when I use the AMD platform to build my contexts of CPUs and GPUs.

            • AMD Platform with Intel CPU (and AMD GPU), vs. Intel Platform
              himanshu.gautam

              Can you tell the AMD APP SDK version you are using?

              I get preferred vector double as zero for AMD APP SDK 2.5. Which is in sync with OpenCL 1.1 spec.

                • AMD Platform with Intel CPU (and AMD GPU), vs. Intel Platform
                  settle

                  The above was using clinfo in SDK 2.1, but clinfo in SDK 2.2 fails with an error just like the from clinfo in SDK 2.5:

                  Number of platforms: 2 Platform Profile: FULL_PROFILE Platform Version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213) Platform Name: AMD Accelerated Parallel Processing Platform Vendor: Advanced Micro Devices, Inc. Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices Platform Profile: FULL_PROFILE Platform Version: OpenCL 1.1 LINUX Platform Name: Intel(R) OpenCL Platform Vendor: Intel(R) Corporation Platform Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_gl_sharing cl_khr_byte_addressable_store cl_intel_printf cl_ext_device_fission cl_khr_icd Platform Name: AMD Accelerated Parallel Processing Number of devices: 3 Device Type: CL_DEVICE_TYPE_GPU Device ID: 4098 Device Topology: PCI[ B#3, D#0, F#0 ] Max compute units: 20 Max work items dimensions: 3 Max work items[0]: 256 Max work items[1]: 256 Max work items[2]: 256 Max work group size: 256 Preferred vector width char: 16 Preferred vector width short: 8 Preferred vector width int: 4 Preferred vector width long: 2 Preferred vector width float: 4 Preferred vector width double: 2 Native vector width char: 16 Native vector width short: 8 Native vector width int: 4 Native vector width long: 2 Native vector width float: 4 Native vector width double: 2 Max clock frequency: 725Mhz Address bits: 32 Max memory allocation: 134217728 Image support: Yes Max number of images read arguments: 128 Max number of images write arguments: 8 Max image 2D width: 8192 Max image 2D height: 8192 Max image 3D width: 2048 Max image 3D height: 2048 Max image 3D depth: 2048 Max samplers within kernel: 16 Max size of kernel argument: 1024 Alignment (bits) of base address: 32768 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: No Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: Yes Cache type: None Cache line size: 0 Cache size: 0 Global memory size: 536870912 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Scratchpad Local memory size: 32768 Kernel Preferred work group size multiple: 64 Error correction support: 0 Unified memory for Host and Device: 0 Profiling timer resolution: 1 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: No Queue properties: Out-of-Order: No Profiling : Yes Platform ID: 0x7f42fe65b060 Name: Cypress Vendor: Advanced Micro Devices, Inc. Device OpenCL C version: OpenCL C 1.1 Driver version: CAL 1.4.1523 Profile: FULL_PROFILE Version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213) Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt Device Type: CL_DEVICE_TYPE_GPU Device ID: 4098 Device Topology: PCI[ B#4, D#0, F#0 ] Max compute units: 20 Max work items dimensions: 3 Max work items[0]: 256 Max work items[1]: 256 Max work items[2]: 256 Max work group size: 256 Preferred vector width char: 16 Preferred vector width short: 8 Preferred vector width int: 4 Preferred vector width long: 2 Preferred vector width float: 4 Preferred vector width double: 2 Native vector width char: 16 Native vector width short: 8 Native vector width int: 4 Native vector width long: 2 Native vector width float: 4 Native vector width double: 2 Max clock frequency: 725Mhz Address bits: 32 Max memory allocation: 134217728 Image support: Yes Max number of images read arguments: 128 Max number of images write arguments: 8 Max image 2D width: 8192 Max image 2D height: 8192 Max image 3D width: 2048 Max image 3D height: 2048 Max image 3D depth: 2048 Max samplers within kernel: 16 Max size of kernel argument: 1024 Alignment (bits) of base address: 32768 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: No Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: Yes Cache type: None Cache line size: 0 Cache size: 0 Global memory size: 536870912 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Scratchpad Local memory size: 32768 Kernel Preferred work group size multiple: 64 Error correction support: 0 Unified memory for Host and Device: 0 Profiling timer resolution: 1 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: No Queue properties: Out-of-Order: No Profiling : Yes Platform ID: 0x7f42fe65b060 Name: Cypress Vendor: Advanced Micro Devices, Inc. Device OpenCL C version: OpenCL C 1.1 Driver version: CAL 1.4.1523 Profile: FULL_PROFILE Version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213) Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt Device Type: CL_DEVICE_TYPE_CPU Device ID: 4098 Max compute units: 4 Max work items dimensions: 3 Max work items[0]: 1024 Max work items[1]: 1024 Max work items[2]: 1024 Max work group size: 1024 Preferred vector width char: 16 Preferred vector width short: 8 Preferred vector width int: 4 Preferred vector width long: 2 Preferred vector width float: 4 Preferred vector width double: 0 Native vector width char: 16 Native vector width short: 8 Native vector width int: 4 Native vector width long: 2 Native vector width float: 4 Native vector width double: 0 Max clock frequency: 2668Mhz Address bits: 64 Max memory allocation: 2101755904 Image support: Yes Max number of images read arguments: 128 Max number of images write arguments: 8 Max image 2D width: 8192 Max image 2D height: 8192 Max image 3D width: 2048 Max image 3D height: 2048 Max image 3D depth: 2048 Max samplers within kernel: 16 Max size of kernel argument: 4096 Alignment (bits) of base address: 1024 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: Yes Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: No Cache type: Read/Write Cache line size: 64 Cache size: 32768 Global memory size: 2101755904 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Global Local memory size: 32768 Kernel Preferred work group size multiple: 1 Error correction support: 0 Unified memory for Host and Device: 1 Profiling timer resolution: 1 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: Yes Queue properties: Out-of-Order: No Profiling : Yes Platform ID: 0x7f42fe65b060 Name: Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz Vendor: GenuineIntel Device OpenCL C version: OpenCL C 1.1 Driver version: 2.0 Profile: FULL_PROFILE Version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213) Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_media_ops cl_amd_popcnt cl_amd_printf Platform Name: Intel(R) OpenCL Number of devices: 1 Device Type: CL_DEVICE_TYPE_CPU Device ID: 32902 ERROR: clGetDeviceInfo(-30)

                  • AMD Platform with Intel CPU (and AMD GPU), vs. Intel Platform
                    settle

                     

                    Originally posted by: himanshu.gautamI get preferred vector double as zero for AMD APP SDK 2.5. Which is in sync with OpenCL 1.1 spec.

                     

                     

                    If preferred vector double is 0 that means either double is not supported or scalar double (i.e., double1) is the prefered type.  But the issue is in the Intel platform it returns 2 as the preferred vector double, not 0 as in the AMD platform.  Remember this is for the same exact CPU device, Intel Core i5 750.  The GPU device info all looks accurate, as it definately should being an AMD device.

                      • AMD Platform with Intel CPU (and AMD GPU), vs. Intel Platform
                        himanshu.gautam

                        settle,

                        I don't get something like that on my intel i7. Anyhow intel's CPU is not a tested/supported platform.

                        Can you tell about device topology parameter, I don't have it in my clinfo version.

                        Number of platforms: 1 Platform Profile: FULL_PROFILE Platform Version: OpenCL 1.1 AMD-APP-SDK-v2.5-internal (720) Platform Name: AMD Accelerated Parallel Processing Platform Vendor: Advanced Micro Devices, Inc. Platform Extensions: cl_khr_icd cl_ext_object_metadata cl_amd_event_callback cl_amd_offline_devices cl_kh Platform Name: AMD Accelerated Parallel Processing Number of devices: 2 Device Type: CL_DEVICE_TYPE_GPU Device ID: 4098 Max compute units: 10 Max work items dimensions: 3 Max work items[0]: 256 Max work items[1]: 256 Max work items[2]: 256 Max work group size: 256 Preferred vector width char: 16 Preferred vector width short: 8 Preferred vector width int: 4 Preferred vector width long: 2 Preferred vector width float: 4 Preferred vector width double: 0 Native vector width char: 16 Native vector width short: 8 Native vector width int: 4 Native vector width long: 2 Native vector width float: 4 Native vector width double: 0 Max clock frequency: 700Mhz Address bits: 32 Max memory allocation: 536870912 Image support: Yes Max number of images read arguments: 128 Max number of images write arguments: 8 Max image 2D width: 8192 Max image 2D height: 8192 Max image 3D width: 2048 Max image 3D height: 2048 Max image 3D depth: 2048 Max samplers within kernel: 16 Max size of kernel argument: 1024 Alignment (bits) of base address: 32768 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: No Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: Yes Cache type: None Cache line size: 0 Cache size: 0 Global memory size: 1073741824 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Scratchpad Local memory size: 32768 Kernel Preferred work group size multiple: 64 Error correction support: 0 Unified memory for Host and Device: 0 Profiling timer resolution: 1 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: No Queue properties: Out-of-Order: No Profiling : Yes Platform ID: 000007FEE6C47348 Name: Juniper Vendor: Advanced Micro Devices, Inc. Device OpenCL C version: OpenCL C 1.1 Driver version: CAL 1.4.1457 (VM) Profile: FULL_PROFILE Version: OpenCL 1.1 AMD-APP-SDK-v2.5-internal (720) Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_i ded_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_ ia_ops cl_amd_popcnt cl_khr_d3d10_sharing Device Type: CL_DEVICE_TYPE_CPU Device ID: 4098 Max compute units: 8 Max work items dimensions: 3 Max work items[0]: 1024 Max work items[1]: 1024 Max work items[2]: 1024 Max work group size: 1024 Preferred vector width char: 16 Preferred vector width short: 8 Preferred vector width int: 4 Preferred vector width long: 2 Preferred vector width float: 4 Preferred vector width double: 0 Native vector width char: 16 Native vector width short: 8 Native vector width int: 4 Native vector width long: 2 Native vector width float: 4 Native vector width double: 0 Max clock frequency: 1729Mhz Address bits: 64 Max memory allocation: 2147483648 Image support: Yes Max number of images read arguments: 128 Max number of images write arguments: 8 Max image 2D width: 8192 Max image 2D height: 8192 Max image 3D width: 2048 Max image 3D height: 2048 Max image 3D depth: 2048 Max samplers within kernel: 16 Max size of kernel argument: 4096 Alignment (bits) of base address: 1024 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: Yes Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: No Cache type: Read/Write Cache line size: 64 Cache size: 32768 Global memory size: 6363336704 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Global Local memory size: 32768 Kernel Preferred work group size multiple: 1 Error correction support: 0 Unified memory for Host and Device: 1 Profiling timer resolution: 592 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: Yes Queue properties: Out-of-Order: No Profiling : Yes Platform ID: 000007FEE6C47348 Name: Intel(R) Core(TM) i7 CPU Q 740 @ 1.73GHz Vendor: GenuineIntel Device OpenCL C version: OpenCL C 1.1 Driver version: 2.0 Profile: FULL_PROFILE Version: OpenCL 1.1 AMD-APP-SDK-v2.5-internal (720) Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extende cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_s _attribute_query cl_amd_vec3 cl_amd_media_ops cl_amd_popcnt cl_amd_printf cl_khr_d3d10_sharing

                          • AMD Platform with Intel CPU (and AMD GPU), vs. Intel Platform
                            settle

                             

                            Originally posted by: himanshu.gautamI don't get something like that on my intel i7. Anyhow intel's CPU is not a tested/supported platform.

                            Can you tell about device topology parameter, I don't have it in my clinfo version.



                            The device topology parameter is unique to AMD APP SDK and returns the PCI bus, device, and function for the queried GPU device.  In Linux you these numbers (in the same order) can be found using the command "/sbin/lspci | grep VGA."  I guess this helps to uniquely identify devices between runtimes and maybe help determine some theoretical bandwidth limits if you know more about the motherboard.

                             

                            BTW, this post says that my configuration is supported and tested.

                             

                            Originally posted by: MicahVillmowThere are four different generic setups that you can have, AMD CPU + AMD GPU, AMD CPU + non AMD GPU, non AMD CPU + AMD GPU, non AMD CPU + non AMD GPU. The first three configurations are supported and tested, the last one, which your configuration is, is not tested.


                             

                             

                            Originally posted by: MicahVillmowWhat the Intel SDK reports and what the AMD APP SDK reports can diverge as the implementations are different, even if the device is the same.


                            For some of these divergent values, such as "Preferred vector width double," I agree that they are completely implementation dependent.  However, things like "Native vector width double" and "Cache line size" should not diverge; these are physical constants characteristic of the hardware.

                            I guess it really boils down to not supporting your competitor’s hardware (what's the financial incentive, right?), but then why doesn't the AMD OpenCL platform just return an error when trying to use non-AMD CPUs?  One could argue that some trivial support for non-AMD CPUs will lead more people to buy AMD GPUs, maybe also buying an AMD CPU instead of a non-AMD CPU, and lead more developers to use the AMD APP SDK (these are some good financial incentives).  I say this because the most annoying part about OpenCL is having to use multiple contexts in multi-vendor configurations since a context must be constructed from a single platform--unless using AMD APP SDK or IBM alphaWorks OCR SDK.  I'm not suggesting AMD should optimize for non-AMD hardware, just return accurate values when reasonable.

                            Anyways, this isn't really a big issue because the current support is amazingly good.  I'm just suggesting a couple low-priority fixes for things that may not have been known.

                             

                    • AMD Platform with Intel CPU (and AMD GPU), vs. Intel Platform
                      MicahVillmow
                      settle,
                      What the Intel SDK reports and what the AMD APP SDK reports can diverge as the implementations are different, even if the device is the same.