15 Replies Latest reply on Feb 22, 2011 5:35 AM by Raistmer

    With NV and ATi GPU installed CLInfo refuses to list ATi GPU

    Raistmer
      -30 error reported...

      Log:

      C:\Program Files\ATI Stream\bin\x86>CLInfo.exe
      Number of platforms: 2
      Platform Profile: FULL_PROFILE
      Platform Version: OpenCL 1.0 CUDA 3.2.1
      Platform Name: NVIDIA CUDA
      Platform Vendor: NVIDIA Corporation
      Platform Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
      Platform Profile: FULL_PROFILE
      Platform Version: OpenCL 1.1 ATI-Stream-v2.3 (451)
      Platform Name: ATI Stream
      Platform Vendor: Advanced Micro Devices, Inc.
      Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices


      Platform Name: NVIDIA CUDA
      Number of devices: 1
      Device Type: CL_DEVICE_TYPE_GPU
      Device ID: 4318
      Max compute units: 4
      Max work items dimensions: 3
      Max work items[0]: 512
      Max work items[1]: 512
      Max work items[2]: 64
      Max work group size: 512
      Preferred vector width char: 1
      Preferred vector width short: 1
      Preferred vector width int: 1
      Preferred vector width long: 1
      Preferred vector width float: 1
      Preferred vector width double: 0
      ERROR: clgetDeviceInfo(-30)

        • With NV and ATi GPU installed CLInfo refuses to list ATi GPU
          nou

          Compile CLInfo without OpenCL 1.1 support.

           

          • With NV and ATi GPU installed CLInfo refuses to list ATi GPU
            Raistmer
            How do I disable OpenCL 1.1? Sometimes code taken from the CLInfo sample gives the same error in my own app, and I didn't enable OpenCL 1.1 anywhere...
            • With NV and ATi GPU installed CLInfo refuses to list ATi GPU
              Raistmer
              Yes, there is an #ifdef CL_VERSION_1_1 block in the original CLInfo sample, but I checked again: that part is not in the GPU detection code I borrowed from CLInfo for my app.
              Still, it reports the -30 error on some NV+ATi configs (not all hosts are affected).
              I'll try to rebuild the CLInfo sample and report whether that fixes the issue on at least some affected hosts.
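
For reference, -30 is the value the Khronos cl.h header assigns to CL_INVALID_VALUE. A minimal sketch of tolerant handling follows, assuming the failing call is an optional query that a device may legitimately reject. The helper name keep_enumerating and its optional flag are hypothetical and do not come from CLInfo or from the app discussed here.

```c
#include <stdio.h>

/* Values as defined in the Khronos cl.h header, repeated here so the
   sketch builds without the OpenCL headers installed. */
#ifndef CL_SUCCESS
#define CL_SUCCESS 0
#endif
#ifndef CL_INVALID_VALUE
#define CL_INVALID_VALUE (-30)
#endif

/* Hypothetical helper: returns 1 if device enumeration should continue
   after a clGetDeviceInfo-style call returned `status`, 0 if it should
   stop. `optional` marks queries a device is allowed to reject. */
static int keep_enumerating(int status, int optional, const char *what)
{
    if (status == CL_SUCCESS)
        return 1;
    if (status == CL_INVALID_VALUE && optional) {
        /* The device does not know this parameter: skip it, keep going. */
        printf("%s: not supported by this device, skipping\n", what);
        return 1;
    }
    fprintf(stderr, "ERROR: %s failed (%d)\n", what, status);
    return 0;
}
```

With something like this, a rejected optional query costs one line of output instead of ending the device listing. Whether that is appropriate depends on which query actually fails on the affected hosts, which the reports so far do not show.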
                • With NV and ATi GPU installed CLInfo refuses to list ATi GPU
                  ramandeep

                  Hello Raistmer,

                  Actually, you are using a device that doesn't support OpenCL 1.1; that's why it is giving this error. Please upgrade your device for proper results.

                  Thanks

                  Ramandeep

                    • With NV and ATi GPU installed CLInfo refuses to list ATi GPU
                      Raistmer
                      Originally posted by: ramandeep

                      Hello Raistmer,

                      Actually, you are using a device that doesn't support OpenCL 1.1; that's why it is giving this error. Please upgrade your device for proper results.

                      Thanks

                      Ramandeep



                      Hm, nothing says that the CLInfo binary provided by AMD should be used ONLY on OpenCL 1.1 devices. Moreover, that's wrong: CLInfo works OK on OpenCL 1.0 devices too.

                      And to answer all other suggestions of the "just get a new GPU" type:
                      please take into account that I develop an open source application for the BOINC framework. That is, it should run successfully on ALL OPENCL-SUPPORTED HARDWARE. I can't just "get a new GPU", because this app is used not only by me but potentially by ~500K people around the world. Quite a large audience to take into account, no?

                      @nou
                      Thanks for the suggestion, it does work. I received this report:
                      "
                      The CLInfo.exe from APP SDK 2.3 gives me -30 error, but the OpenCL AP/MB application and this modified CLInfo.exe gives me a proper result.
                      Radeon HD 4850 + GeForce GTX 260, both with current (as at February 2011) drivers installed
                      "
                      Now I have to reach those who experienced the -30 error with the app itself so they can provide more info on the topic...
                        • With NV and ATi GPU installed CLInfo refuses to list ATi GPU
                          nou

                          You just can't use OpenCL 1.1 functionality. CLInfo is compiled with 1.1 functionality by default, and just after the preferred vector widths it queries CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR, which is from OpenCL 1.1. The proper way would be to determine what info to query at runtime, not at compile time. And as NVIDIA doesn't support CL 1.1, we are constrained to 1.0.
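
A minimal sketch of such a runtime decision, assuming the version string has the "OpenCL <major>.<minor> ..." shape seen in the logs above (for example "OpenCL 1.0 CUDA 3.2.1"). The helper names parse_cl_version and supports_cl_1_1 are hypothetical and are not part of the CLInfo sample.

```c
#include <stdio.h>

/* Hypothetical helper: parse the "OpenCL <major>.<minor> ..." string that
   a platform or device reports as its version.
   Returns 1 on success, 0 if the string does not have that shape. */
static int parse_cl_version(const char *version, int *major, int *minor)
{
    return sscanf(version, "OpenCL %d.%d", major, minor) == 2;
}

/* Decide at run time whether OpenCL 1.1-only queries, such as
   CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR, may be issued for this device. */
static int supports_cl_1_1(const char *version)
{
    int major = 0, minor = 0;
    if (!parse_cl_version(version, &major, &minor))
        return 0;   /* unknown format: stay on the safe 1.0 subset */
    return major > 1 || (major == 1 && minor >= 1);
}
```

The caller would fetch the version string for each device first and only issue the 1.1 queries when supports_cl_1_1 returns nonzero. On the systems logged in this thread that would mean the ATI Stream devices get the extra queries and the NVIDIA CUDA device does not.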

                            • With NV and ATi GPU installed CLInfo refuses to list ATi GPU
                              genaganna

                               

                              Originally posted by: nou

                              You just can't use OpenCL 1.1 functionality. CLInfo is compiled with 1.1 functionality by default, and just after the preferred vector widths it queries CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR, which is from OpenCL 1.1. The proper way would be to determine what info to query at runtime, not at compile time. And as NVIDIA doesn't support CL 1.1, we are constrained to 1.0.

                               

                              The code will not compile if a runtime query is combined with OpenCL 1.0 headers.

                              We will make sure that the sample behaves properly in both cases. Thanks for the idea.
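
One possible way around the compile problem, sketched under the assumption that the only thing missing from OpenCL 1.0 headers is the token itself: define the 1.1 token locally when the header does not provide it, then choose the queries at run time. The numeric values are the ones used in the Khronos cl.h header. The helper vector_width_queries is hypothetical, and this is not necessarily how AMD changed the sample.

```c
#include <stddef.h>

/* Present in OpenCL 1.0 headers; repeated so the sketch builds without them. */
#ifndef CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR
#define CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR 0x1006
#endif
/* Missing from OpenCL 1.0 headers; a local fallback lets the runtime
   path compile against either header version. */
#ifndef CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR
#define CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR 0x1036
#endif

/* Hypothetical helper: fill `out` with the vector-width queries that are
   valid for a device reporting OpenCL <major>.<minor>.
   Returns the number of entries written (at most `cap`). */
static size_t vector_width_queries(int major, int minor,
                                   unsigned out[], size_t cap)
{
    size_t n = 0;
    if (n < cap)
        out[n++] = CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR;  /* 1.0 and later */
    if ((major > 1 || (major == 1 && minor >= 1)) && n < cap)
        out[n++] = CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR;     /* 1.1 and later */
    return n;
}
```

The #ifndef guards keep the fallback inert when real 1.1 headers are present, so the same source builds in both configurations and the version test happens per device rather than per build.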

                      • With NV and ATi GPU installed CLInfo refuses to list ATi GPU
                        Raistmer
                        Thanks again. But it looks like things are not so "easy" with the NV/ATi combo.
                        I have now received a few reports from different testers.
                        I'll list some of them to illustrate that, even with OpenCL 1.1 disabled, CLInfo sometimes fails to list the NV GPU.

                        1)
                        Modded CLinfo works fine:

                        Code:
                        Number of platforms: 2
                        Platform Profile: FULL_PROFILE
                        Platform Version: OpenCL 1.0 CUDA 3.2.1
                        Platform Name: NVIDIA CUDA
                        Platform Vendor: NVIDIA Corporation
                        Platform Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
                        Platform Profile: FULL_PROFILE
                        Platform Version: OpenCL 1.1 ATI-Stream-v2.3 (451)
                        Platform Name: ATI Stream
                        Platform Vendor: Advanced Micro Devices, Inc.
                        Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_khr_d3d10_sharing


                        Platform Name: NVIDIA CUDA
                        Number of devices: 1
                        Device Type: CL_DEVICE_TYPE_GPU
                        Device ID: 4318
                        Max compute units: 7
                        Max work items dimensions: 3
                        Max work items[0]: 1024
                        Max work items[1]: 1024
                        Max work items[2]: 64
                        Max work group size: 1024
                        Preferred vector width char: 1
                        Preferred vector width short: 1
                        Preferred vector width int: 1
                        Preferred vector width long: 1
                        Preferred vector width float: 1
                        Preferred vector width double: 1
                        Max clock frequency: 1600Mhz
                        Address bits: 14757395255531667488
                        Max memory allocation: 260423680
                        Image support: Yes
                        Max number of images read arguments: 128
                        Max number of images write arguments: 8
                        Max image 2D width: 4096
                        Max image 2D height: 32768
                        Max image 3D width: 2048
                        Max image 3D height: 2048
                        Max image 3D depth: 2048
                        Max samplers within kernel: 16
                        Max size of kernel argument: 4352
                        Alignment (bits) of base address: 4096
                        Minimum alignment (bytes) for any datatype: 128
                        Single precision floating point capability
                        Denorms: Yes
                        Quiet NaNs: Yes
                        Round to nearest even: Yes
                        Round to zero: Yes
                        Round to +ve and infinity: Yes
                        IEEE754-2008 fused multiply-add: Yes
                        Cache type: Read/Write
                        Cache line size: 128
                        Cache size: 114688
                        Global memory size: 1041694720
                        Constant buffer size: 65536
                        Max number of constant args: 9
                        Local memory type: Scratchpad
                        Local memory size: 49152
                        Error correction support: 0
                        Profiling timer resolution: 1000
                        Device endianess: Little
                        Available: Yes
                        Compiler available: Yes
                        Execution capabilities:
                        Execute OpenCL kernels: Yes
                        Execute native function: No
                        Queue properties:
                        Out-of-Order: Yes
                        Profiling : Yes
                        Platform ID: 003E0D78
                        Name: GeForce GTX 460
                        Vendor: NVIDIA Corporation
                        Driver version: 266.58
                        Profile: FULL_PROFILE
                        Version: OpenCL 1.0 CUDA
                        Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64


                        Platform Name: ATI Stream
                        Number of devices: 2
                        Device Type: CL_DEVICE_TYPE_GPU
                        Device ID: 4098
                        Max compute units: 10
                        Max work items dimensions: 3
                        Max work items[0]: 256
                        Max work items[1]: 256
                        Max work items[2]: 256
                        Max work group size: 256
                        Preferred vector width char: 16
                        Preferred vector width short: 8
                        Preferred vector width int: 4
                        Preferred vector width long: 2
                        Preferred vector width float: 4
                        Preferred vector width double: 0
                        Max clock frequency: 850Mhz
                        Address bits: 32
                        Max memory allocation: 134217728
                        Image support: Yes
                        Max number of images read arguments: 128
                        Max number of images write arguments: 8
                        Max image 2D width: 8192
                        Max image 2D height: 8192
                        Max image 3D width: 2048
                        Max image 3D height: 2048
                        Max image 3D depth: 2048
                        Max samplers within kernel: 16
                        Max size of kernel argument: 1024
                        Alignment (bits) of base address: 32768
                        Minimum alignment (bytes) for any datatype: 128
                        Single precision floating point capability
                        Denorms: No
                        Quiet NaNs: Yes
                        Round to nearest even: Yes
                        Round to zero: Yes
                        Round to +ve and infinity: Yes
                        IEEE754-2008 fused multiply-add: Yes
                        Cache type: None
                        Cache line size: 0
                        Cache size: 0
                        Global memory size: 536870912
                        Constant buffer size: 65536
                        Max number of constant args: 8
                        Local memory type: Scratchpad
                        Local memory size: 32768
                        Error correction support: 0
                        Profiling timer resolution: 1
                        Device endianess: Little
                        Available: Yes
                        Compiler available: Yes
                        Execution capabilities:
                        Execute OpenCL kernels: Yes
                        Execute native function: No
                        Queue properties:
                        Out-of-Order: No
                        Profiling : Yes
                        Platform ID: 0344A40C
                        Name: Juniper
                        Vendor: Advanced Micro Devices, Inc.
                        Driver version: CAL 1.4.900
                        Profile: FULL_PROFILE
                        Version: OpenCL 1.1 ATI-Stream-v2.3 (451)
                        Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing


                        Device Type: CL_DEVICE_TYPE_CPU
                        Device ID: 4098
                        Max compute units: 2
                        Max work items dimensions: 3
                        Max work items[0]: 1024
                        Max work items[1]: 1024
                        Max work items[2]: 1024
                        Max work group size: 1024
                        Preferred vector width char: 16
                        Preferred vector width short: 8
                        Preferred vector width int: 4
                        Preferred vector width long: 2
                        Preferred vector width float: 4
                        Preferred vector width double: 0
                        Max clock frequency: 4143Mhz
                        Address bits: 32
                        Max memory allocation: 536870912
                        Image support: No
                        Max size of kernel argument: 4096
                        Alignment (bits) of base address: 1024
                        Minimum alignment (bytes) for any datatype: 128
                        Single precision floating point capability
                        Denorms: Yes
                        Quiet NaNs: Yes
                        Round to nearest even: Yes
                        Round to zero: Yes
                        Round to +ve and infinity: Yes
                        IEEE754-2008 fused multiply-add: No
                        Cache type: Read/Write
                        Cache line size: 64
                        Cache size: 32768
                        Global memory size: 1073741824
                        Constant buffer size: 65536
                        Max number of constant args: 8
                        Local memory type: Global
                        Local memory size: 32768
                        Error correction support: 0
                        Profiling timer resolution: 247
                        Device endianess: Little
                        Available: Yes
                        Compiler available: Yes
                        Execution capabilities:
                        Execute OpenCL kernels: Yes
                        Execute native function: Yes
                        Queue properties:
                        Out-of-Order: No
                        Profiling : Yes
                        Platform ID: 0344A40C
                        Name: Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
                        Vendor: GenuineIntel
                        Driver version: 2.0
                        Profile: FULL_PROFILE
                        Version: OpenCL 1.1 ATI-Stream-v2.3 (451)
                        Extensions: cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_media_ops cl_amd_popcnt cl_amd_printf cl_khr_d3d10_sharing

                        2)
                        And just to be different, this new CLInfo only picks up my 5670 and the CPU, no NV card:

                        Code:
                        E:\Downloads>CLInfo_no_OCL1_1.exe
                        Number of platforms: 1
                        Platform Profile: FULL_PROFILE
                        Platform Version: OpenCL 1.1 ATI-Stream-v2.3 (451)
                        Platform Name: ATI Stream
                        Platform Vendor: Advanced Micro Devices, Inc.
                        Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_khr_d3d10_sharing


                        Platform Name: ATI Stream
                        Number of devices: 2
                        Device Type: CL_DEVICE_TYPE_GPU
                        Device ID: 4098
                        Max compute units: 5
                        Max work items dimensions: 3
                        Max work items[0]: 256
                        Max work items[1]: 256
                        Max work items[2]: 256
                        Max work group size: 256
                        Preferred vector width char: 16
                        Preferred vector width short: 8
                        Preferred vector width int: 4
                        Preferred vector width long: 2
                        Preferred vector width float: 4
                        Preferred vector width double: 0
                        Max clock frequency: 850Mhz
                        Address bits: 32
                        Max memory allocation: 134217728
                        Image support: Yes
                        Max number of images read arguments: 128
                        Max number of images write arguments: 8
                        Max image 2D width: 8192
                        Max image 2D height: 8192
                        Max image 3D width: 2048
                        Max image 3D height: 2048
                        Max image 3D depth: 2048
                        Max samplers within kernel: 16
                        Max size of kernel argument: 1024
                        Alignment (bits) of base address: 32768
                        Minimum alignment (bytes) for any datatype: 128
                        Single precision floating point capability
                        Denorms: No
                        Quiet NaNs: Yes
                        Round to nearest even: Yes
                        Round to zero: Yes
                        Round to +ve and infinity: Yes
                        IEEE754-2008 fused multiply-add: Yes
                        Cache type: None
                        Cache line size: 0
                        Cache size: 0
                        Global memory size: 536870912
                        Constant buffer size: 65536
                        Max number of constant args: 8
                        Local memory type: Scratchpad
                        Local memory size: 32768
                        Error correction support: 0
                        Profiling timer resolution: 1
                        Device endianess: Little
                        Available: Yes
                        Compiler available: Yes
                        Execution capabilities:
                        Execute OpenCL kernels: Yes
                        Execute native function: No
                        Queue properties:
                        Out-of-Order: No
                        Profiling : Yes
                        Platform ID: 02D1A40C
                        Name: Redwood
                        Vendor: Advanced Micro Devices, Inc.
                        Driver version: CAL 1.4.1016
                        Profile: FULL_PROFILE
                        Version: OpenCL 1.1 ATI-Stream-v2.3 (451)
                        Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing


                        Device Type: CL_DEVICE_TYPE_CPU
                        Device ID: 4098
                        Max compute units: 6
                        Max work items dimensions: 3
                        Max work items[0]: 1024
                        Max work items[1]: 1024
                        Max work items[2]: 1024
                        Max work group size: 1024
                        Preferred vector width char: 16
                        Preferred vector width short: 8
                        Preferred vector width int: 4
                        Preferred vector width long: 2
                        Preferred vector width float: 4
                        Preferred vector width double: 0
                        Max clock frequency: 3200Mhz
                        Address bits: 32
                        Max memory allocation: 536870912
                        Image support: No
                        Max size of kernel argument: 4096
                        Alignment (bits) of base address: 1024
                        Minimum alignment (bytes) for any datatype: 128
                        Single precision floating point capability
                        Denorms: Yes
                        Quiet NaNs: Yes
                        Round to nearest even: Yes
                        Round to zero: Yes
                        Round to +ve and infinity: Yes
                        IEEE754-2008 fused multiply-add: No
                        Cache type: Read/Write
                        Cache line size: 64
                        Cache size: 65536
                        Global memory size: 1073741824
                        Constant buffer size: 65536
                        Max number of constant args: 8
                        Local memory type: Global
                        Local memory size: 32768
                        Error correction support: 0
                        Profiling timer resolution: 319
                        Device endianess: Little
                        Available: Yes
                        Compiler available: Yes
                        Execution capabilities:
                        Execute OpenCL kernels: Yes
                        Execute native function: Yes
                        Queue properties:
                        Out-of-Order: No
                        Profiling : Yes
                        Platform ID: 02D1A40C
                        Name: AMD Phenom(tm) II X6 1090T Processor
                        Vendor: AuthenticAMD
                        Driver version: 2.0
                        Profile: FULL_PROFILE
                        Version: OpenCL 1.1 ATI-Stream-v2.3 (451)
                        Extensions: cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_media_ops cl_amd_popcnt cl_amd_printf cl_khr_d3d10_sharing

                        and the original CLInfo lists the 465, the 5670 and the CPU:

                        Code:
                        E:\Documents\ATI Stream\samples\opencl\bin\x86>CLInfo.exe
                        Number of platforms: 2
                        Platform Profile: FULL_PROFILE
                        Platform Version: OpenCL 1.1 ATI-Stream-v2.3 (451)
                        Platform Name: ATI Stream
                        Platform Vendor: Advanced Micro Devices, Inc.
                        Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_khr_d3d10_sharing
                        Platform Profile: FULL_PROFILE
                        Platform Version: OpenCL 1.0 CUDA 3.2.1
                        Platform Name: NVIDIA CUDA
                        Platform Vendor: NVIDIA Corporation
                        Platform Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll


                        Platform Name: ATI Stream
                        Number of devices: 2
                        Device Type: CL_DEVICE_TYPE_GPU
                        Device ID: 4098
                        Max compute units: 5
                        Max work items dimensions: 3
                        Max work items[0]: 256
                        Max work items[1]: 256
                        Max work items[2]: 256
                        Max work group size: 256
                        Preferred vector width char: 16
                        Preferred vector width short: 8
                        Preferred vector width int: 4
                        Preferred vector width long: 2
                        Preferred vector width float: 4
                        Preferred vector width double: 0
                        Max clock frequency: 850Mhz
                        Address bits: 32
                        Max memory allocation: 134217728
                        Image support: Yes
                        Max number of images read arguments: 128
                        Max number of images write arguments: 8
                        Max image 2D width: 8192
                        Max image 2D height: 8192
                        Max image 3D width: 2048
                        Max image 3D height: 2048
                        Max image 3D depth: 2048
                        Max samplers within kernel: 16
                        Max size of kernel argument: 1024
                        Alignment (bits) of base address: 32768
                        Minimum alignment (bytes) for any datatype: 128
                        Single precision floating point capability
                        Denorms: No
                        Quiet NaNs: Yes
                        Round to nearest even: Yes
                        Round to zero: Yes
                        Round to +ve and infinity: Yes
                        IEEE754-2008 fused multiply-add: Yes
                        Cache type: None
                        Cache line size: 0
                        Cache size: 0
                        Global memory size: 536870912
                        Constant buffer size: 65536
                        Max number of constant args: 8
                        Local memory type: Scratchpad
                        Local memory size: 32768
                        Profiling timer resolution: 1
                        Device endianess: Little
                        Available: Yes
                        Compiler available: Yes
                        Execution capabilities:
                        Execute OpenCL kernels: Yes
                        Execute native function: No
                        Queue properties:
                        Out-of-Order: No
                        Profiling : Yes
                        Platform ID: 0283A40C
                        Name: Redwood
                        Vendor: Advanced Micro Devices, Inc.
                        Driver version: CAL 1.4.1016
                        Profile: FULL_PROFILE
                        Version: OpenCL 1.1 ATI-Stream-v2.3 (451)
                        Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing
                        Device Type: CL_DEVICE_TYPE_CPU
                        Device ID: 4098
                        Max compute units: 6
                        Max work items dimensions: 3
                        Max work items[0]: 1024
                        Max work items[1]: 1024
                        Max work items[2]: 1024
                        Max work group size: 1024
                        Preferred vector width char: 16
                        Preferred vector width short: 8
                        Preferred vector width int: 4
                        Preferred vector width long: 2
                        Preferred vector width float: 4
                        Preferred vector width double: 0
                        Max clock frequency: 3200Mhz
                        Address bits: 32
                        Max memory allocation: 536870912
                        Image support: No
                        Max size of kernel argument: 4096
                        Alignment (bits) of base address: 1024
                        Minimum alignment (bytes) for any datatype: 128
                        Single precision floating point capability
                        Denorms: Yes
                        Quiet NaNs: Yes
                        Round to nearest even: Yes
                        Round to zero: Yes
                        Round to +ve and infinity: Yes
                        IEEE754-2008 fused multiply-add: No
                        Cache type: Read/Write
                        Cache line size: 64
                        Cache size: 65536
                        Global memory size: 1073741824
                        Constant buffer size: 65536
                        Max number of constant args: 8
                        Local memory type: Global
                        Local memory size: 32768
                        Profiling timer resolution: 319
                        Device endianess: Little
                        Available: Yes
                        Compiler available: Yes
                        Execution capabilities:
                        Execute OpenCL kernels: Yes
                        Execute native function: Yes
                        Queue properties:
                        Out-of-Order: No
                        Profiling : Yes
                        Platform ID: 0283A40C
                        Name: AMD Phenom(tm) II X6 1090T Processor
                        Vendor: AuthenticAMD
                        Driver version: 2.0
                        Profile: FULL_PROFILE
                        Version: OpenCL 1.1 ATI-Stream-v2.3 (451)
                        Extensions: cl_amd_fp64 cl_khr_global_int32_base_atomics
                        cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics
                        cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store
                        cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query
                        cl_amd_media_ops cl_amd_popcnt cl_amd_printf cl_khr_d3d10_sharing


                        Passed!
                        Platform Name: NVIDIA CUDA
                        Number of devices: 1
                        Device Type: CL_DEVICE_TYPE_GPU
                        Device ID: 4318
                        Max compute units: 11
                        Max work items dimensions: 3
                        Max work items[0]: 1024
                        Max work items[1]: 1024
                        Max work items[2]: 64
                        Max work group size: 1024
                        Preferred vector width char: 1
                        Preferred vector width short: 1
                        Preferred vector width int: 1
                        Preferred vector width long: 1
                        Preferred vector width float: 1
                        Preferred vector width double: 1
                        Max clock frequency: 1215Mhz
                        Address bits: 32
                        Max memory allocation: 260456448
                        Image support: Yes
                        Max number of images read arguments: 128
                        Max number of images write arguments: 8
                        Max image 2D width: 4096
                        Max image 2D height: 32768
                        Max image 3D width: 2048
                        Max image 3D height: 2048
                        Max image 3D depth: 2048
                        Max samplers within kernel: 16
                        Max size of kernel argument: 4352
                        Alignment (bits) of base address: 4096
                        Minimum alignment (bytes) for any datatype: 128
                        Single precision floating point capability
                        Denorms: Yes
                        Quiet NaNs: Yes
                        Round to nearest even: Yes
                        Round to zero: Yes
                        Round to +ve and infinity: Yes
                        IEEE754-2008 fused multiply-add: Yes
                        Cache type: Read/Write
                        Cache line size: 128
                        Cache size: 180224
                        Global memory size: 1041825792
                        Constant buffer size: 65536
                        Max number of constant args: 9
                        Local memory type: Scratchpad
                        Local memory size: 49152
                        Profiling timer resolution: 1000
                        Device endianess: Little
                        Available: Yes
                        Compiler available: Yes
                        Execution capabilities:
                        Execute OpenCL kernels: Yes
                        Execute native function: No
                        Queue properties:
                        Out-of-Order: Yes
                        Profiling : Yes
                        Platform ID: 003A0D88
                        Name: GeForce GTX 465
                        Vendor: NVIDIA Corporation
                        Driver version: 266.58
                        Profile: FULL_PROFILE
                        Version: OpenCL 1.0 CUDA
                        Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
                        cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing
                        cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
                        cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
                        cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64


                        Passed!

                        Any ideas why the NV GPU disappeared after disabling OpenCL 1.1?

                        • With NV and ATi GPU installed CLInfo refuses to list ATi GPU
                          Raistmer
                          The previous post cut off the NV part; the log continues here:

                          Platform Name: NVIDIA CUDA
                          Number of devices: 1
                          Device Type: CL_DEVICE_TYPE_GPU
                          Device ID: 4318
                          Max compute units: 11
                          Max work items dimensions: 3
                          Max work items[0]: 1024
                          Max work items[1]: 1024
                          Max work items[2]: 64
                          Max work group size: 1024
                          Preferred vector width char: 1
                          Preferred vector width short: 1
                          Preferred vector width int: 1
                          Preferred vector width long: 1
                          Preferred vector width float: 1
                          Preferred vector width double: 1
                          Max clock frequency: 1215Mhz
                          Address bits: 32
                          Max memory allocation: 260456448
                          Image support: Yes
                          Max number of images read arguments: 128
                          Max number of images write arguments: 8
                          Max image 2D width: 4096
                          Max image 2D height: 32768
                          Max image 3D width: 2048
                          Max image 3D height: 2048
                          Max image 3D depth: 2048
                          Max samplers within kernel: 16
                          Max size of kernel argument: 4352
                          Alignment (bits) of base address: 4096
                          Minimum alignment (bytes) for any datatype: 128
                          Single precision floating point capability
                          Denorms: Yes
                          Quiet NaNs: Yes
                          Round to nearest even: Yes
                          Round to zero: Yes
                          Round to +ve and infinity: Yes
                          IEEE754-2008 fused multiply-add: Yes
                          Cache type: Read/Write
                          Cache line size: 128
                          Cache size: 180224
                          Global memory size: 1041825792
                          Constant buffer size: 65536
                          Max number of constant args: 9
                          Local memory type: Scratchpad
                          Local memory size: 49152
                          Profiling timer resolution: 1000
                          Device endianess: Little
                          Available: Yes
                          Compiler available: Yes
                          Execution capabilities:
                          Execute OpenCL kernels: Yes
                          Execute native function: No
                          Queue properties:
                          Out-of-Order: Yes
                          Profiling : Yes
                          Platform ID: 003A0D88
                          Name: GeForce GTX 465
                          Vendor: NVIDIA Corporation
                          Driver version: 266.58
                          Profile: FULL_PROFILE
                          Version: OpenCL 1.0 CUDA
                          Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
                          cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing
                          cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
                          cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
                          cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64


                          Passed!
                          • With NV and ATi GPU installed CLInfo refuses to list ATi GPU
                            Raistmer
                            Any ideas why CLInfo lost the NV GPU once OpenCL 1.1 was disabled?
                            [So the situation is now simply reversed: the ATi GPU is listed, but the NV GPU is missing.]
                            • With NV and ATi GPU installed CLInfo refuses to list ATi GPU
                              Raistmer
                              I simply disabled every CL_VERSION_1_1 block in CLInfo.cpp (each #ifdef CL_VERSION_1_1 became #if 0), so it should no longer matter whether CL_VERSION_1_1 is defined or not...

                              Code attached

                              int main(int argc, char** argv)
                              {
                                  /* Error flag */
                                  cl_int status = 0;

                                  /* Extensions verification flags */
                                  bool isGpu = true;
                                  bool isVistaOrWin7 = false;

                              #ifdef _WIN32
                                  // Find the version of Windows
                                  OSVERSIONINFO vInfo;
                                  memset(&vInfo, 0, sizeof(vInfo));
                                  vInfo.dwOSVersionInfoSize = sizeof(vInfo);
                                  if(!GetVersionEx(&vInfo))
                                  {
                                      DWORD dwErr = GetLastError();
                                      std::cout << "\nERROR : Unable to get Windows version information.\n" << std::endl;
                                      return 1;
                                  }

                                  if(vInfo.dwMajorVersion >= 6)
                                  {
                                      isVistaOrWin7 = true;
                                  }
                              #endif

                                  /* Check if sample is run for cpu */
                                  for(int i = 1; i < argc; i++)
                                  {
                                      if(!strcmp("cpu", argv[i]))
                                          isGpu = false;
                                  }

                                  cl_int err;

                                  // Platform info
                                  std::vector<cl::Platform> platforms;
                                  err = cl::Platform::get(&platforms);
                                  checkErr(
                                      err && (platforms.size() == 0 ? -1 : CL_SUCCESS),
                                      "cl::Platform::get()");

                                  try
                                  {
                                      // Iterate over platforms
                                      std::cout << "Number of platforms:\t\t\t\t "
                                          << platforms.size() << std::endl;
                                      for (std::vector<cl::Platform>::iterator i = platforms.begin();
                                           i != platforms.end(); ++i)
                                      {
                                          std::cout << " Platform Profile:\t\t\t\t "
                                              << (*i).getInfo<CL_PLATFORM_PROFILE>().c_str() << std::endl;
                                          std::cout << " Platform Version:\t\t\t\t "
                                              << (*i).getInfo<CL_PLATFORM_VERSION>().c_str() << std::endl;
                                          std::cout << " Platform Name:\t\t\t\t "
                                              << (*i).getInfo<CL_PLATFORM_NAME>().c_str() << std::endl;
                                          std::cout << " Platform Vendor:\t\t\t\t "
                                              << (*i).getInfo<CL_PLATFORM_VENDOR>().c_str() << std::endl;
                                          if ((*i).getInfo<CL_PLATFORM_EXTENSIONS>().size() > 0)
                                          {
                                              std::cout << " Platform Extensions:\t\t\t\t "
                                                  << (*i).getInfo<CL_PLATFORM_EXTENSIONS>().c_str() << std::endl;
                                          }
                                      }

                                      std::cout << std::endl << std::endl;

                                      // Now iterate over each platform and its devices
                                      for (std::vector<cl::Platform>::iterator p = platforms.begin();
                                           p != platforms.end(); ++p)
                                      {
                                          std::cout << " Platform Name:\t\t\t\t "
                                              << (*p).getInfo<CL_PLATFORM_NAME>().c_str() << std::endl;

                                          std::vector<cl::Device> devices;
                                          (*p).getDevices(CL_DEVICE_TYPE_ALL, &devices);

                                          std::cout << "Number of devices:\t\t\t\t "
                                              << devices.size() << std::endl;
                                          for (std::vector<cl::Device>::iterator i = devices.begin();
                                               i != devices.end(); ++i)
                                          {
                                              /* Get device name */
                                              cl::string deviceName = (*i).getInfo<CL_DEVICE_NAME>();
                                              cl_device_type dtype = (*i).getInfo<CL_DEVICE_TYPE>();

                                              /* Get CAL driver version in int */
                                              cl::string driverVersion = (*i).getInfo<CL_DRIVER_VERSION>();
                                              std::string calVersion(driverVersion.c_str());
                                              calVersion = calVersion.substr(calVersion.find_last_of(".") + 1);
                                              int version = atoi(calVersion.c_str());

                                              std::cout << " Device Type:\t\t\t\t\t " ;
                                              switch (dtype)
                                              {
                                              case CL_DEVICE_TYPE_ACCELERATOR:
                                                  std::cout << "CL_DEVICE_TYPE_ACCELERATOR" << std::endl;
                                                  break;
                                              case CL_DEVICE_TYPE_CPU:
                                                  std::cout << "CL_DEVICE_TYPE_CPU" << std::endl;
                                                  break;
                                              case CL_DEVICE_TYPE_DEFAULT:
                                                  std::cout << "CL_DEVICE_TYPE_DEFAULT" << std::endl;
                                                  break;
                                              case CL_DEVICE_TYPE_GPU:
                                                  std::cout << "CL_DEVICE_TYPE_GPU" << std::endl;
                                                  break;
                                              }

                                              std::cout << " Device ID:\t\t\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_VENDOR_ID>() << std::endl;
                                              std::cout << " Max compute units:\t\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_MAX_COMPUTE_UNITS>() << std::endl;
                                              std::cout << " Max work items dimensions:\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS>() << std::endl;

                                              std::vector< ::size_t> witems = (*i).getInfo<CL_DEVICE_MAX_WORK_ITEM_SIZES>();
                                              for (unsigned int x = 0; x < (*i).getInfo<CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS>(); x++)
                                              {
                                                  std::cout << " Max work items[" << x << "]:\t\t\t\t "
                                                      << witems[x] << std::endl;
                                              }

                                              std::cout << " Max work group size:\t\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_MAX_WORK_GROUP_SIZE>() << std::endl;
                                              std::cout << " Preferred vector width char:\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR>() << std::endl;
                                              std::cout << " Preferred vector width short:\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT>() << std::endl;
                                              std::cout << " Preferred vector width int:\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT>() << std::endl;
                                              std::cout << " Preferred vector width long:\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG>() << std::endl;
                                              std::cout << " Preferred vector width float:\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT>() << std::endl;
                                              std::cout << " Preferred vector width double:\t\t "
                                                  << (*i).getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE>() << std::endl;

                              #if 0 //CL_VERSION_1_1
                                              std::cout << " Native vector width char:\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR>() << std::endl;
                                              std::cout << " Native vector width short:\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT>() << std::endl;
                                              std::cout << " Native vector width int:\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_NATIVE_VECTOR_WIDTH_INT>() << std::endl;
                                              std::cout << " Native vector width long:\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG>() << std::endl;
                                              std::cout << " Native vector width float:\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT>() << std::endl;
                                              std::cout << " Native vector width double:\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE>() << std::endl;
                              #endif // CL_VERSION_1_1

                                              std::cout << " Max clock frequency:\t\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_MAX_CLOCK_FREQUENCY>() << "Mhz" << std::endl;
                                              std::cout << " Address bits:\t\t\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_ADDRESS_BITS>() << std::endl;
                                              std::cout << " Max memory allocation:\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_MAX_MEM_ALLOC_SIZE>() << std::endl;
                                              std::cout << " Image support:\t\t\t\t "
                                                  << ((*i).getInfo<CL_DEVICE_IMAGE_SUPPORT>() ? "Yes" : "No") << std::endl;

                                              if ((*i).getInfo<CL_DEVICE_IMAGE_SUPPORT>())
                                              {
                                                  std::cout << " Max number of images read arguments:\t\t "
                                                      << (*i).getInfo<CL_DEVICE_MAX_READ_IMAGE_ARGS>() << std::endl;
                                                  std::cout << " Max number of images write arguments:\t\t "
                                                      << (*i).getInfo<CL_DEVICE_MAX_WRITE_IMAGE_ARGS>() << std::endl;
                                                  std::cout << " Max image 2D width:\t\t\t\t "
                                                      << (*i).getInfo<CL_DEVICE_IMAGE2D_MAX_WIDTH>() << std::endl;
                                                  std::cout << " Max image 2D height:\t\t\t\t "
                                                      << (*i).getInfo<CL_DEVICE_IMAGE2D_MAX_HEIGHT>() << std::endl;
                                                  std::cout << " Max image 3D width:\t\t\t\t "
                                                      << (*i).getInfo<CL_DEVICE_IMAGE3D_MAX_WIDTH>() << std::endl;
                                                  std::cout << " Max image 3D height:\t\t\t\t "
                                                      << (*i).getInfo<CL_DEVICE_IMAGE3D_MAX_HEIGHT>() << std::endl;
                                                  std::cout << " Max image 3D depth:\t\t\t\t "
                                                      << (*i).getInfo<CL_DEVICE_IMAGE3D_MAX_DEPTH>() << std::endl;
                                                  std::cout << " Max samplers within kernel:\t\t\t "
                                                      << (*i).getInfo<CL_DEVICE_MAX_SAMPLERS>() << std::endl;
                                              }

                                              std::cout << " Max size of kernel argument:\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_MAX_PARAMETER_SIZE>() << std::endl;
                                              std::cout << " Alignment (bits) of base address:\t\t "
                                                  << (*i).getInfo<CL_DEVICE_MEM_BASE_ADDR_ALIGN>() << std::endl;
                                              std::cout << " Minimum alignment (bytes) for any datatype:\t "
                                                  << (*i).getInfo<CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE>() << std::endl;

                                              std::cout << " Single precision floating point capability" << std::endl;
                                              std::cout << " Denorms:\t\t\t\t\t "
                                                  << ((*i).getInfo<CL_DEVICE_SINGLE_FP_CONFIG>() & CL_FP_DENORM ? "Yes" : "No") << std::endl;
                                              std::cout << " Quiet NaNs:\t\t\t\t\t "
                                                  << ((*i).getInfo<CL_DEVICE_SINGLE_FP_CONFIG>() & CL_FP_INF_NAN ? "Yes" : "No") << std::endl;
                                              std::cout << " Round to nearest even:\t\t\t "
                                                  << ((*i).getInfo<CL_DEVICE_SINGLE_FP_CONFIG>() & CL_FP_ROUND_TO_NEAREST ? "Yes" : "No") << std::endl;
                                              std::cout << " Round to zero:\t\t\t\t "
                                                  << ((*i).getInfo<CL_DEVICE_SINGLE_FP_CONFIG>() & CL_FP_ROUND_TO_ZERO ? "Yes" : "No") << std::endl;
                                              std::cout << " Round to +ve and infinity:\t\t\t "
                                                  << ((*i).getInfo<CL_DEVICE_SINGLE_FP_CONFIG>() & CL_FP_ROUND_TO_INF ? "Yes" : "No") << std::endl;
                                              std::cout << " IEEE754-2008 fused multiply-add:\t\t "
                                                  << ((*i).getInfo<CL_DEVICE_SINGLE_FP_CONFIG>() & CL_FP_FMA ? "Yes" : "No") << std::endl;

                                              std::cout << " Cache type:\t\t\t\t\t " ;
                                              switch ((*i).getInfo<CL_DEVICE_GLOBAL_MEM_CACHE_TYPE>())
                                              {
                                              case CL_NONE:
                                                  std::cout << "None" << std::endl;
                                                  break;
                                              case CL_READ_ONLY_CACHE:
                                                  std::cout << "Read only" << std::endl;
                                                  break;
                                              case CL_READ_WRITE_CACHE:
                                                  std::cout << "Read/Write" << std::endl;
                                                  break;
                                              }

                                              std::cout << " Cache line size:\t\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE>() << std::endl;
                                              std::cout << " Cache size:\t\t\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_GLOBAL_MEM_CACHE_SIZE>() << std::endl;
                                              std::cout << " Global memory size:\t\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_GLOBAL_MEM_SIZE>() << std::endl;
                                              std::cout << " Constant buffer size:\t\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE>() << std::endl;
                                              std::cout << " Max number of constant args:\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_MAX_CONSTANT_ARGS>() << std::endl;

                                              std::cout << " Local memory type:\t\t\t\t " ;
                                              switch ((*i).getInfo<CL_DEVICE_LOCAL_MEM_TYPE>())
                                              {
                                              case CL_LOCAL:
                                                  std::cout << "Scratchpad" << std::endl;
                                                  break;
                                              case CL_GLOBAL:
                                                  std::cout << "Global" << std::endl;
                                                  break;
                                              }

                                              std::cout << " Local memory size:\t\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_LOCAL_MEM_SIZE>() << std::endl;

                              #if 0 //CL_VERSION_1_1
                                              cl_context_properties cps[3] = { CL_CONTEXT_PLATFORM, (cl_context_properties)(*p)(), 0 };

                                              std::vector<cl::Device> device;
                                              device.push_back(*i);

                                              cl::Context context(device, cps, NULL, NULL, &err);
                                              if (err != CL_SUCCESS)
                                              {
                                                  std::cerr << "Context::Context() failed (" << err << ")\n";
                                                  return 1;
                                              }

                                              std::string kernelStr("__kernel void hello(){ size_t i = get_global_id(0); size_t j = get_global_id(1);}");
                                              cl::Program::Sources sources(1, std::make_pair(kernelStr.data(), kernelStr.size()));

                                              cl::Program program = cl::Program(context, sources, &err);
                                              if (err != CL_SUCCESS)
                                              {
                                                  std::cerr << "Program::Program() failed (" << err << ")\n";
                                                  return 1;
                                              }

                                              err = program.build(device);
                                              if (err != CL_SUCCESS)
                                              {
                                                  if(err == CL_BUILD_PROGRAM_FAILURE)
                                                  {
                                                      cl::string str = program.getBuildInfo<CL_PROGRAM_BUILD_LOG>((*i));

                                                      std::cout << " \n\t\t\tBUILD LOG\n";
                                                      std::cout << " ************************************************\n";
                                                      std::cout << str.c_str() << std::endl;
                                                      std::cout << " ************************************************\n";
                                                  }

                                                  std::cerr << "Program::build() failed (" << err << ")\n";
                                                  return 1;
                                              }

                                              cl::Kernel kernel(program, "hello", &err);
                                              if (err != CL_SUCCESS)
                                              {
                                                  std::cerr << "Kernel::Kernel() failed (" << err << ")\n";
                                                  return 1;
                                              }

                                              std::cout << " Kernel Preferred work group size multiple:\t "
                                                  << kernel.getWorkGroupInfo<CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE>((*i), &err) << std::endl;
                              #endif // CL_VERSION_1_1

                                              std::cout << " Error correction support:\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_ERROR_CORRECTION_SUPPORT>() << std::endl;

                              #if 0 //CL_VERSION_1_1
                                              std::cout << " Unified memory for Host and Device:\t\t "
                                                  << (*i).getInfo<CL_DEVICE_HOST_UNIFIED_MEMORY>() << std::endl;
                              #endif // CL_VERSION_1_1

                                              std::cout << " Profiling timer resolution:\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_PROFILING_TIMER_RESOLUTION>() << std::endl;
                                              std::cout << " Device endianess:\t\t\t\t "
                                                  << ((*i).getInfo<CL_DEVICE_ENDIAN_LITTLE>() ? "Little" : "Big") << std::endl;
                                              std::cout << " Available:\t\t\t\t\t "
                                                  << ((*i).getInfo<CL_DEVICE_AVAILABLE>() ? "Yes" : "No") << std::endl;
                                              std::cout << " Compiler available:\t\t\t\t "
                                                  << ((*i).getInfo<CL_DEVICE_COMPILER_AVAILABLE>() ? "Yes" : "No") << std::endl;

                                              std::cout << " Execution capabilities:\t\t\t\t " << std::endl;
                                              std::cout << " Execute OpenCL kernels:\t\t\t "
                                                  << ((*i).getInfo<CL_DEVICE_EXECUTION_CAPABILITIES>() & CL_EXEC_KERNEL ? "Yes" : "No") << std::endl;
                                              std::cout << " Execute native function:\t\t\t "
                                                  << ((*i).getInfo<CL_DEVICE_EXECUTION_CAPABILITIES>() & CL_EXEC_NATIVE_KERNEL ? "Yes" : "No") << std::endl;

                                              std::cout << " Queue properties:\t\t\t\t " << std::endl;
                                              std::cout << " Out-of-Order:\t\t\t\t "
                                                  << ((*i).getInfo<CL_DEVICE_QUEUE_PROPERTIES>() & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE ? "Yes" : "No") << std::endl;
                                              std::cout << " Profiling :\t\t\t\t\t "
                                                  << ((*i).getInfo<CL_DEVICE_QUEUE_PROPERTIES>() & CL_QUEUE_PROFILING_ENABLE ? "Yes" : "No") << std::endl;

                                              std::cout << " Platform ID:\t\t\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_PLATFORM>() << std::endl;
                                              std::cout << " Name:\t\t\t\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_NAME>().c_str() << std::endl;
                                              std::cout << " Vendor:\t\t\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_VENDOR>().c_str() << std::endl;

                              #if 0 //CL_VERSION_1_1
                                              //std::cout << " Device OpenCL C version:\t\t\t "
                                              //    << (*i).getInfo<CL_DEVICE_OPENCL_C_VERSION>().c_str()
                                              //    << std::endl;
                              #endif // CL_VERSION_1_1

                                              std::cout << " Driver version:\t\t\t\t "
                                                  << (*i).getInfo<CL_DRIVER_VERSION>().c_str() << std::endl;
                                              std::cout << " Profile:\t\t\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_PROFILE>().c_str() << std::endl;
                                              std::cout << " Version:\t\t\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_VERSION>().c_str() << std::endl;
                                              std::cout << " Extensions:\t\t\t\t\t "
                                                  << (*i).getInfo<CL_DEVICE_EXTENSIONS>().c_str() << std::endl;
                                              std::cout << std::endl << std::endl;
                                          }
                                      }
                                  }
                                  catch (cl::Error err)
                                  {
                                      std::cerr << "ERROR: " << err.what() << "(" << err.err() << ")" << std::endl;
                                  }

                                  return status;
                              }

                              • With NV and ATi GPU installed CLInfo refuses to list ATi GPU
                                Raistmer
                                No, it works on some hosts and doesn't work on others. On each host the results are reproducible, I think.