Raistmer
Adept II

With NV and ATi GPU installed CLInfo refuses to list ATi GPU

-30 error reported...

Log:

C:\Program Files\ATI Stream\bin\x86>CLInfo.exe
Number of platforms: 2
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.0 CUDA 3.2.1
Platform Name: NVIDIA CUDA
Platform Vendor: NVIDIA Corporation
Platform Extensions: cl_khr_byte_addressable_store c
l_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_compiler_options cl_nv_devi
ce_attribute_query cl_nv_pragma_unroll
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.1 ATI-Stream-v2.3 (451
)
Platform Name: ATI Stream
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callbac
k cl_amd_offline_devices


Platform Name: NVIDIA CUDA
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4318
Max compute units: 4
Max work items dimensions: 3
Max work items[0]: 512
Max work items[1]: 512
Max work items[2]: 64
Max work group size: 512
Preferred vector width char: 1
Preferred vector width short: 1
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 0
ERROR: clgetDeviceInfo(-30)

0 Likes
15 Replies
nou
Exemplar

compile CLInfo without OpenCL 1.1 support

 

0 Likes
Raistmer
Adept II

How do I disable OpenCL 1.1? Sometimes code from the CLInfo sample gives the same error in my own app, and I didn't enable OpenCL 1.1 anywhere...
0 Likes

In the code there is an #ifdef CL_1_1 or something like that, so just #undef it in the CLInfo file.

0 Likes
Raistmer
Adept II

Yes, there is an #ifdef CL_VERSION_1_1 in the original CLInfo sample, but I checked again: that part is not in the GPU detection code I borrowed from CLInfo for my app.
It still reports the -30 error on some NV+ATi configs (not all hosts are affected).
I'll try to rebuild the CLInfo sample and report whether that fixes the issue on at least some of the affected hosts.
0 Likes

Hello Raistmer,

Actually, you are using a device that doesn't support OpenCL 1.1; that's why it is giving this error. Please upgrade your device for proper results.

Thanks

Ramandeep

0 Likes

Originally posted by: ramandeep

Hello Raistmer,

Actually, you are using a device that doesn't support OpenCL 1.1; that's why it is giving this error. Please upgrade your device for proper results.

Thanks

Ramandeep

Hm, nothing said that the CLInfo binary provided by AMD should be used ONLY on OpenCL 1.1 devices. Moreover, that's wrong: CLInfo works OK on OpenCL 1.0 devices too.

And to answer all other suggestions of the "just get a new GPU" type:
please take into account that I develop an open source application for the BOINC framework. That is, it should run successfully on ALL OPENCL-SUPPORTED HARDWARE. I can't "get a new GPU", because it's not only me who uses this app but potentially ~500K people around the world. Quite a large audience to take into account, no?

@nou
Thanks for the suggestion, it does work. I received this report:
"
The CLInfo.exe from APP SDK 2.3 gives me -30 error, but the OpenCL AP/MB application and this modified CLInfo.exe gives me a proper result.
Radeon HD 4850 + GeForce GTX 260, both with current (as at February 2011) drivers installed
"
Now I have to reach those who experienced the -30 error with the app itself to get more info on the topic...
0 Likes

You just can't use OpenCL 1.1 functionality. CLInfo is compiled with 1.1 functionality by default, and right after the preferred vector widths it queries CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR, which is from OpenCL 1.1. The proper way would be to determine what info to query at runtime, not at compile time. And as NVIDIA doesn't support CL 1.1, we are constrained to 1.0.
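
For reference, -30 is the value of CL_INVALID_VALUE in the Khronos cl.h header, which clGetDeviceInfo returns when param_name is not one the implementation supports. That fits a 1.0 driver being asked for a 1.1-only parameter. A minimal sketch of a lookup helper for a few status codes follows; clErrorName is a hypothetical name used only for illustration and is not part of the OpenCL API.

```cpp
#include <string>

// Maps a few OpenCL status codes to their names in cl.h.
// The numeric values are the ones defined in the Khronos cl.h header;
// clErrorName itself is a hypothetical helper, not an OpenCL function.
std::string clErrorName(int status) {
    switch (status) {
        case 0:   return "CL_SUCCESS";
        case -1:  return "CL_DEVICE_NOT_FOUND";
        case -2:  return "CL_DEVICE_NOT_AVAILABLE";
        case -30: return "CL_INVALID_VALUE";
        case -31: return "CL_INVALID_DEVICE_TYPE";
        case -32: return "CL_INVALID_PLATFORM";
        case -33: return "CL_INVALID_DEVICE";
        case -34: return "CL_INVALID_CONTEXT";
        default:  return "unknown status " + std::to_string(status);
    }
}
```

Printing clErrorName(err) next to the raw number in an error handler makes reports from testers easier to read.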

0 Likes

Originally posted by: nou

You just can't use OpenCL 1.1 functionality. CLInfo is compiled with 1.1 functionality by default, and right after the preferred vector widths it queries CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR, which is from OpenCL 1.1. The proper way would be to determine what info to query at runtime, not at compile time. And as NVIDIA doesn't support CL 1.1, we are constrained to 1.0.

 

The code will not compile if a runtime query is combined with OpenCL 1.0 headers, because the 1.1-only parameter names are not defined there.

We will make sure that the sample behaves properly in both cases. Thanks for the idea.
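
A minimal sketch of the runtime half of that check, assuming the version string has the form the OpenCL specification prescribes for CL_DEVICE_VERSION ("OpenCL <major>.<minor> <vendor-specific information>"). The names ClVersion, parseClVersion and supportsAtLeast are hypothetical and are not taken from the CLInfo sample.

```cpp
#include <cstdio>
#include <string>

// Parsed "major.minor" pair from an OpenCL version string.
struct ClVersion {
    int majorVer;
    int minorVer;
};

// Parses strings of the form "OpenCL <major>.<minor> <vendor text>".
// Returns false if the string does not match that layout.
bool parseClVersion(const std::string& text, ClVersion& out) {
    int maj = 0, min = 0;
    if (std::sscanf(text.c_str(), "OpenCL %d.%d", &maj, &min) != 2)
        return false;
    out.majorVer = maj;
    out.minorVer = min;
    return true;
}

// True if the reported version is at least wantMajor.wantMinor.
// An unparsable string is treated as "not supported".
bool supportsAtLeast(const std::string& text, int wantMajor, int wantMinor) {
    ClVersion v;
    if (!parseClVersion(text, v))
        return false;
    return v.majorVer > wantMajor ||
           (v.majorVer == wantMajor && v.minorVer >= wantMinor);
}
```

In a tool like CLInfo, the native vector width queries would then sit inside both guards: #ifdef CL_VERSION_1_1 so the file still compiles against 1.0 headers, and a test such as supportsAtLeast(versionString, 1, 1) on the string obtained from getInfo<CL_DEVICE_VERSION>() so that a 1.0 device is never asked for them.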

0 Likes
Raistmer
Adept II

Thanks again. But it looks like things are not so "easy" with the NV/ATi combo.
I have now received a few reports from different testers.
I'll list some of them to illustrate that, even with OpenCL 1.1 disabled, CLInfo sometimes fails to list the NV GPU.

1)
Modded CLinfo works fine:

Code:
Number of platforms: 2
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.0 CUDA 3.2.1
Platform Name: NVIDIA CUDA
Platform Vendor: NVIDIA Corporation
Platform Extensions: cl_khr_byte_addressable_store c
l_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_
sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query
cl_nv_pragma_unroll
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.1 ATI-Stream-v2.3 (451
)
Platform Name: ATI Stream
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callbac
k cl_amd_offline_devices cl_khr_d3d10_sharing


Platform Name: NVIDIA CUDA
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4318
Max compute units: 7
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 64
Max work group size: 1024
Preferred vector width char: 1
Preferred vector width short: 1
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Max clock frequency: 1600Mhz
Address bits: 14757395255531667488
Max memory allocation: 260423680
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 4096
Max image 2D height: 32768
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4352
Alignment (bits) of base address: 4096
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 128
Cache size: 114688
Global memory size: 1041694720
Constant buffer size: 65536
Max number of constant args: 9
Local memory type: Scratchpad
Local memory size: 49152
Error correction support: 0
Profiling timer resolution: 1000
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 003E0D78
Name: GeForce GTX 460
Vendor: NVIDIA Corporation
Driver version: 266.58
Profile: FULL_PROFILE
Version: OpenCL 1.0 CUDA
Extensions: cl_khr_byte_addressable_store c
l_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_
sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query
cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extend
ed_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics c
l_khr_fp64


Platform Name: ATI Stream
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4098
Max compute units: 10
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Max clock frequency: 850Mhz
Address bits: 32
Max memory allocation: 134217728
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 32768
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 536870912
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Error correction support: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0344A40C
Name: Juniper
Vendor: Advanced Micro Devices, Inc.
Driver version: CAL 1.4.900
Profile: FULL_PROFILE
Version: OpenCL 1.1 ATI-Stream-v2.3 (451
)
Extensions: cl_khr_global_int32_base_atomic
s cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_lo
cal_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store
cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops c
l_amd_popcnt cl_khr_d3d10_sharing


Device Type: CL_DEVICE_TYPE_CPU
Device ID: 4098
Max compute units: 2
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Max clock frequency: 4143Mhz
Address bits: 32
Max memory allocation: 536870912
Image support: No
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: No
Cache type: Read/Write
Cache line size: 64
Cache size: 32768
Global memory size: 1073741824
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Error correction support: 0
Profiling timer resolution: 247
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0344A40C
Name: Intel(R) Core(TM)2 Duo CPU
E8500 @ 3.16GHz
Vendor: GenuineIntel
Driver version: 2.0
Profile: FULL_PROFILE
Version: OpenCL 1.1 ATI-Stream-v2.3 (451
)
Extensions: cl_amd_fp64 cl_khr_global_int32
_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomi
cs cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_s
haring cl_ext_device_fission cl_amd_device_attribute_query cl_amd_media_ops cl_a
md_popcnt cl_amd_printf cl_khr_d3d10_sharing

2)
And just to be different, this new CLInfo only picks up my 5670 and the CPU, no NV card:

Code:
E:\Downloads>CLInfo_no_OCL1_1.exe
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.1 ATI-Stream-v2.3 (451
)
Platform Name: ATI Stream
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callbac
k cl_amd_offline_devices cl_khr_d3d10_sharing


Platform Name: ATI Stream
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4098
Max compute units: 5
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Max clock frequency: 850Mhz
Address bits: 32
Max memory allocation: 134217728
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 32768
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 536870912
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Error correction support: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 02D1A40C
Name: Redwood
Vendor: Advanced Micro Devices, Inc.
Driver version: CAL 1.4.1016
Profile: FULL_PROFILE
Version: OpenCL 1.1 ATI-Stream-v2.3 (451
)
Extensions: cl_khr_global_int32_base_atomic
s cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_lo
cal_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store
cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops c
l_amd_popcnt cl_khr_d3d10_sharing


Device Type: CL_DEVICE_TYPE_CPU
Device ID: 4098
Max compute units: 6
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Max clock frequency: 3200Mhz
Address bits: 32
Max memory allocation: 536870912
Image support: No
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: No
Cache type: Read/Write
Cache line size: 64
Cache size: 65536
Global memory size: 1073741824
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Error correction support: 0
Profiling timer resolution: 319
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 02D1A40C
Name: AMD Phenom(tm) II X6 1090T Proc
essor
Vendor: AuthenticAMD
Driver version: 2.0
Profile: FULL_PROFILE
Version: OpenCL 1.1 ATI-Stream-v2.3 (451
)
Extensions: cl_amd_fp64 cl_khr_global_int32
_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomi
cs cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_s
haring cl_ext_device_fission cl_amd_device_attribute_query cl_amd_media_ops cl_a
md_popcnt cl_amd_printf cl_khr_d3d10_sharing

and the original CLInfo lists the 465, the 5670 and the CPU:

Code:
E:\Documents\ATI Stream\samples\opencl\bin\x86>CLInfo.exe
Number of platforms: 2
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.1 ATI-Stream-v2.3 (451
)
Platform Name: ATI Stream
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd
_offline_devices cl_khr_d3d10_sharing
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.0 CUDA 3.2.1
Platform Name: NVIDIA CUDA
Platform Vendor: NVIDIA Corporation
Platform Extensions: cl_khr_byte_addressable_store cl_khr_ic
d cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing
cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pr
agma_unroll


Platform Name: ATI Stream
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4098
Max compute units: 5
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Max clock frequency: 850Mhz
Address bits: 32
Max memory allocation: 134217728
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 32768
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 536870912
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0283A40C
Name: Redwood
Vendor: Advanced Micro Devices, Inc.
Driver version: CAL 1.4.1016
Profile: FULL_PROFILE
Version: OpenCL 1.1 ATI-Stream-v2.3 (451
)
Extensions: cl_khr_global_int32_base_atomic
s cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_lo
cal_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store
cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops c
l_amd_popcnt cl_khr_d3d10_sharing
Device Type: CL_DEVICE_TYPE_CPU
Device ID: 4098
Max compute units: 6
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Max clock frequency: 3200Mhz
Address bits: 32
Max memory allocation: 536870912
Image support: No
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: No
Cache type: Read/Write
Cache line size: 64
Cache size: 65536
Global memory size: 1073741824
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Profiling timer resolution: 319
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0283A40C
Name: AMD Phenom(tm) II X6 1090T Proc
essor
Vendor: AuthenticAMD
Driver version: 2.0
Profile: FULL_PROFILE
Version: OpenCL 1.1 ATI-Stream-v2.3 (451
)
Extensions: cl_amd_fp64 cl_khr_global_int32
_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomi
cs cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_s
haring cl_ext_device_fission cl_amd_device_attribute_query cl_amd_media_ops cl_a
md_popcnt cl_amd_printf cl_khr_d3d10_sharing


Passed!
Platform Name: NVIDIA CUDA
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4318
Max compute units: 11
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 64
Max work group size: 1024
Preferred vector width char: 1
Preferred vector width short: 1
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Max clock frequency: 1215Mhz
Address bits: 32
Max memory allocation: 260456448
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 4096
Max image 2D height: 32768
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4352
Alignment (bits) of base address: 4096
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 128
Cache size: 180224
Global memory size: 1041825792
Constant buffer size: 65536
Max number of constant args: 9
Local memory type: Scratchpad
Local memory size: 49152
Profiling timer resolution: 1000
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 003A0D88
Name: GeForce GTX 465
Vendor: NVIDIA Corporation
Driver version: 266.58
Profile: FULL_PROFILE
Version: OpenCL 1.0 CUDA
Extensions: cl_khr_byte_addressable_store c
l_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_
sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query
cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extend
ed_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics c
l_khr_fp64


Passed!

Any ideas why the NV GPU disappeared after disabling OpenCL 1.1?

0 Likes
Raistmer
Adept II

The previous post cut off the NV part; here the log continues:

Platform Name: NVIDIA CUDA
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4318
Max compute units: 11
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 64
Max work group size: 1024
Preferred vector width char: 1
Preferred vector width short: 1
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Max clock frequency: 1215Mhz
Address bits: 32
Max memory allocation: 260456448
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 4096
Max image 2D height: 32768
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4352
Alignment (bits) of base address: 4096
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 128
Cache size: 180224
Global memory size: 1041825792
Constant buffer size: 65536
Max number of constant args: 9
Local memory type: Scratchpad
Local memory size: 49152
Profiling timer resolution: 1000
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 003A0D88
Name: GeForce GTX 465
Vendor: NVIDIA Corporation
Driver version: 266.58
Profile: FULL_PROFILE
Version: OpenCL 1.0 CUDA
Extensions: cl_khr_byte_addressable_store c
l_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_
sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query
cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extend
ed_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics c
l_khr_fp64


Passed!
0 Likes
Raistmer
Adept II

Any ideas why CLInfo lost the NV GPU with OpenCL 1.1 disabled?
[So the situation is now just reversed: the ATi GPU is listed, but the NV one is missing]
0 Likes

Originally posted by: Raistmer

Any ideas why CLInfo lost the NV GPU with OpenCL 1.1 disabled? [So the situation is now just reversed: the ATi GPU is listed, but the NV one is missing]

I don't understand why the NVIDIA device details are not listed. Could you please post your code here? Make sure that #undef CL_VERSION_1_1 comes after all header files.
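
A minimal sketch of why the ordering matters. Here the macro is defined by hand as a stand-in for the definition an OpenCL 1.1 cl.h would provide, and nativeWidthQueriesCompiledIn is a hypothetical function used only to make the effect visible.

```cpp
// Stand-in for including an OpenCL 1.1 <CL/cl.h>, which defines this macro.
#define CL_VERSION_1_1 1

// The #undef has to come after every OpenCL header; a header included
// later would simply define the macro again.
#undef CL_VERSION_1_1

// Reports whether the 1.1-only section of this file was compiled in.
bool nativeWidthQueriesCompiledIn() {
#ifdef CL_VERSION_1_1
    return true;   // 1.1-only queries would be compiled here
#else
    return false;  // 1.1-only queries are left out
#endif
}
```

The #undef only affects #ifdef checks that appear later in the same source file; anything already expanded inside the headers is unchanged.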

0 Likes
Raistmer
Adept II

I just disabled all CL_VERSION_1_1 sections in CLInfo.cpp, so it should not matter whether CL_VERSION_1_1 is defined or not...

Code attached

int main(int argc, char** argv)
{
    /* Error flag */
    cl_int status = 0;

    /* Extensions verification flags */
    bool isGpu = true;
    bool isVistaOrWin7 = false;

#ifdef _WIN32
    // Find the version of Windows
    OSVERSIONINFO vInfo;
    memset(&vInfo, 0, sizeof(vInfo));
    vInfo.dwOSVersionInfoSize = sizeof(vInfo);
    if(!GetVersionEx(&vInfo))
    {
        DWORD dwErr = GetLastError();
        std::cout << "\nERROR : Unable to get Windows version information.\n" << std::endl;
        return 1;
    }

    if(vInfo.dwMajorVersion >= 6)
    {
        isVistaOrWin7 = true;
    }
#endif

    /* Check if sample is run for cpu */
    for(int i = 1; i < argc; i++)
    {
        // The forum stripped the index here; argv[i] is the intended argument
        if(!strcmp("cpu", argv[i]))
            isGpu = false;
    }

    cl_int err;

    // Platform info
    std::vector<cl::Platform> platforms;
    err = cl::Platform::get(&platforms);

    checkErr(
        err && (platforms.size() == 0 ? -1 : CL_SUCCESS),
        "cl::Platform::get()");

    try
    {
        // Iterate over platforms
        std::cout << "Number of platforms:\t\t\t\t "
                  << platforms.size() << std::endl;
        for (std::vector<cl::Platform>::iterator i = platforms.begin();
             i != platforms.end();
             ++i)
        {
            std::cout << " Platform Profile:\t\t\t\t "
                      << (*i).getInfo<CL_PLATFORM_PROFILE>().c_str() << std::endl;
            std::cout << " Platform Version:\t\t\t\t "
                      << (*i).getInfo<CL_PLATFORM_VERSION>().c_str() << std::endl;
            std::cout << " Platform Name:\t\t\t\t "
                      << (*i).getInfo<CL_PLATFORM_NAME>().c_str() << std::endl;
            std::cout << " Platform Vendor:\t\t\t\t "
                      << (*i).getInfo<CL_PLATFORM_VENDOR>().c_str() << std::endl;
            if ((*i).getInfo<CL_PLATFORM_EXTENSIONS>().size() > 0)
            {
                std::cout << " Platform Extensions:\t\t\t\t "
                          << (*i).getInfo<CL_PLATFORM_EXTENSIONS>().c_str() << std::endl;
            }
        }

        std::cout << std::endl << std::endl;

        // Now iterate over each platform and its devices
        for (std::vector<cl::Platform>::iterator p = platforms.begin();
             p != platforms.end();
             ++p)
        {
            std::cout << " Platform Name:\t\t\t\t "
                      << (*p).getInfo<CL_PLATFORM_NAME>().c_str() << std::endl;

            std::vector<cl::Device> devices;
            (*p).getDevices(CL_DEVICE_TYPE_ALL, &devices);

            std::cout << "Number of devices:\t\t\t\t "
                      << devices.size() << std::endl;
            for (std::vector<cl::Device>::iterator i = devices.begin();
                 i != devices.end();
                 ++i)
            {
                /* Get device name */
                cl::string deviceName = (*i).getInfo<CL_DEVICE_NAME>();
                cl_device_type dtype = (*i).getInfo<CL_DEVICE_TYPE>();

                /* Get CAL driver version in int */
                cl::string driverVersion = (*i).getInfo<CL_DRIVER_VERSION>();
                std::string calVersion(driverVersion.c_str());
                calVersion = calVersion.substr(calVersion.find_last_of(".") + 1);
                int version = atoi(calVersion.c_str());

                std::cout << " Device Type:\t\t\t\t\t " ;
                switch (dtype) {
                case CL_DEVICE_TYPE_ACCELERATOR:
                    std::cout << "CL_DEVICE_TYPE_ACCELERATOR" << std::endl;
                    break;
                case CL_DEVICE_TYPE_CPU:
                    std::cout << "CL_DEVICE_TYPE_CPU" << std::endl;
                    break;
                case CL_DEVICE_TYPE_DEFAULT:
                    std::cout << "CL_DEVICE_TYPE_DEFAULT" << std::endl;
                    break;
                case CL_DEVICE_TYPE_GPU:
                    std::cout << "CL_DEVICE_TYPE_GPU" << std::endl;
                    break;
                }

                std::cout << " Device ID:\t\t\t\t\t "
                          << (*i).getInfo<CL_DEVICE_VENDOR_ID>() << std::endl;
                std::cout << " Max compute units:\t\t\t\t "
                          << (*i).getInfo<CL_DEVICE_MAX_COMPUTE_UNITS>() << std::endl;
                std::cout << " Max work items dimensions:\t\t\t "
                          << (*i).getInfo<CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS>() << std::endl;

                std::vector< ::size_t> witems = (*i).getInfo<CL_DEVICE_MAX_WORK_ITEM_SIZES>();
                for (unsigned int x = 0;
                     x < (*i).getInfo<CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS>();
                     x++)
                {
                    // The forum stripped the index here; witems[x] is the intended element
                    std::cout << " Max work items[" << x << "]:\t\t\t\t "
                              << witems[x] << std::endl;
                }

                std::cout << " Max work group size:\t\t\t\t "
                          << (*i).getInfo<CL_DEVICE_MAX_WORK_GROUP_SIZE>() << std::endl;
                std::cout << " Preferred vector width char:\t\t\t "
                          << (*i).getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR>() << std::endl;
                std::cout << " Preferred vector width short:\t\t\t "
                          << (*i).getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT>() << std::endl;
                std::cout << " Preferred vector width int:\t\t\t "
                          << (*i).getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT>() << std::endl;
                std::cout << " Preferred vector width long:\t\t\t "
                          << (*i).getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG>() << std::endl;
                std::cout << " Preferred vector width float:\t\t\t "
                          << (*i).getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT>() << std::endl;
                std::cout << " Preferred vector width double:\t\t "
                          << (*i).getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE>() << std::endl;

#if 0 //CL_VERSION_1_1
                std::cout << " Native vector width char:\t\t\t "
                          << (*i).getInfo<CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR>() << std::endl;
                std::cout << " Native vector width short:\t\t\t "
                          << (*i).getInfo<CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT>() << std::endl;
                std::cout << " Native vector width int:\t\t\t "
                          << (*i).getInfo<CL_DEVICE_NATIVE_VECTOR_WIDTH_INT>() << std::endl;
                std::cout << " Native vector width long:\t\t\t "
                          << (*i).getInfo<CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG>() << std::endl;
                std::cout << " Native vector width float:\t\t\t "
                          << (*i).getInfo<CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT>() << std::endl;
                std::cout << " Native vector width double:\t\t\t "
                          << (*i).getInfo<CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE>() << std::endl;
#endif // CL_VERSION_1_1

                std::cout << " Max clock frequency:\t\t\t\t "
                          << (*i).getInfo<CL_DEVICE_MAX_CLOCK_FREQUENCY>() << "Mhz" << std::endl;
                std::cout << " Address bits:\t\t\t\t\t "
                          << (*i).getInfo<CL_DEVICE_ADDRESS_BITS>() << std::endl;
                std::cout << " Max memory allocation:\t\t\t "
                          << (*i).getInfo<CL_DEVICE_MAX_MEM_ALLOC_SIZE>() << std::endl;
                std::cout << " Image support:\t\t\t\t "
                          << ((*i).getInfo<CL_DEVICE_IMAGE_SUPPORT>() ? "Yes" : "No") << std::endl;

                if ((*i).getInfo<CL_DEVICE_IMAGE_SUPPORT>())
                {
                    std::cout << " Max number of images read arguments:\t\t "
                              << (*i).getInfo<CL_DEVICE_MAX_READ_IMAGE_ARGS>() << std::endl;
                    std::cout << " Max number of images write arguments:\t\t "
                              << (*i).getInfo<CL_DEVICE_MAX_WRITE_IMAGE_ARGS>() << std::endl;
                    std::cout << " Max image 2D width:\t\t\t\t "
                              << (*i).getInfo<CL_DEVICE_IMAGE2D_MAX_WIDTH>() << std::endl;
                    std::cout << " Max image 2D height:\t\t\t\t "
                              << (*i).getInfo<CL_DEVICE_IMAGE2D_MAX_HEIGHT>() << std::endl;
                    std::cout << " Max image 3D width:\t\t\t\t "
                              << (*i).getInfo<CL_DEVICE_IMAGE3D_MAX_WIDTH>() << std::endl;
                    std::cout << " Max image 3D height:\t\t\t\t "
                              << (*i).getInfo<CL_DEVICE_IMAGE3D_MAX_HEIGHT>() << std::endl;
                    std::cout << " Max image 3D depth:\t\t\t\t "
                              << (*i).getInfo<CL_DEVICE_IMAGE3D_MAX_DEPTH>() << std::endl;
                    std::cout << " Max samplers within kernel:\t\t\t "
                              << (*i).getInfo<CL_DEVICE_MAX_SAMPLERS>() << std::endl;
                }

                std::cout << " Max size of kernel argument:\t\t\t "
                          << (*i).getInfo<CL_DEVICE_MAX_PARAMETER_SIZE>() << std::endl;
                std::cout << " Alignment (bits) of base address:\t\t "
                          << (*i).getInfo<CL_DEVICE_MEM_BASE_ADDR_ALIGN>() << std::endl;
                std::cout << " Minimum alignment (bytes) for any datatype:\t "
                          << (*i).getInfo<CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE>() << std::endl;

                std::cout << " Single precision floating point capability" << std::endl;
                std::cout << " Denorms:\t\t\t\t\t "
                          << ((*i).getInfo<CL_DEVICE_SINGLE_FP_CONFIG>() & CL_FP_DENORM ? "Yes" : "No") << std::endl;
                std::cout << " Quiet NaNs:\t\t\t\t\t "
                          << ((*i).getInfo<CL_DEVICE_SINGLE_FP_CONFIG>() & CL_FP_INF_NAN ? "Yes" : "No") << std::endl;
                std::cout << " Round to nearest even:\t\t\t "
                          << ((*i).getInfo<CL_DEVICE_SINGLE_FP_CONFIG>() & CL_FP_ROUND_TO_NEAREST ? "Yes" : "No") << std::endl;
                std::cout << " Round to zero:\t\t\t\t "
                          << ((*i).getInfo<CL_DEVICE_SINGLE_FP_CONFIG>() & CL_FP_ROUND_TO_ZERO ? "Yes" : "No") << std::endl;
                std::cout << " Round to +ve and infinity:\t\t\t "
                          << ((*i).getInfo<CL_DEVICE_SINGLE_FP_CONFIG>() & CL_FP_ROUND_TO_INF ? "Yes" : "No") << std::endl;
                std::cout << " IEEE754-2008 fused multiply-add:\t\t "
                          << ((*i).getInfo<CL_DEVICE_SINGLE_FP_CONFIG>() & CL_FP_FMA ? "Yes" : "No") << std::endl;

                std::cout << " Cache type:\t\t\t\t\t " ;
                switch ((*i).getInfo<CL_DEVICE_GLOBAL_MEM_CACHE_TYPE>()) {
                case CL_NONE:
                    std::cout << "None" << std::endl;
                    break;
                case CL_READ_ONLY_CACHE:
                    std::cout << "Read only" << std::endl;
                    break;
                case CL_READ_WRITE_CACHE:
                    std::cout << "Read/Write" << std::endl;
                    break;
                }

                std::cout << " Cache line size:\t\t\t\t "
                          << (*i).getInfo<CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE>() << std::endl;
                std::cout << " Cache size:\t\t\t\t\t "
                          << (*i).getInfo<CL_DEVICE_GLOBAL_MEM_CACHE_SIZE>() << std::endl;
                std::cout << " Global memory size:\t\t\t\t "
                          << (*i).getInfo<CL_DEVICE_GLOBAL_MEM_SIZE>() << std::endl;
                std::cout << " Constant buffer size:\t\t\t\t "
                          << (*i).getInfo<CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE>() << std::endl;
                std::cout << " Max number of constant args:\t\t\t "
                          << (*i).getInfo<CL_DEVICE_MAX_CONSTANT_ARGS>() << std::endl;

                std::cout << " Local memory type:\t\t\t\t " ;
                switch ((*i).getInfo<CL_DEVICE_LOCAL_MEM_TYPE>()) {
                case CL_LOCAL:
                    std::cout << "Scratchpad" << std::endl;
                    break;
                case CL_GLOBAL:
                    std::cout << "Global" << std::endl;
                    break;
                }

                std::cout << " Local memory size:\t\t\t\t "
                          << (*i).getInfo<CL_DEVICE_LOCAL_MEM_SIZE>() << std::endl;

#if 0 //CL_VERSION_1_1
                cl_context_properties cps[3] = {
                    CL_CONTEXT_PLATFORM,
                    (cl_context_properties)(*p)(),
                    0
                };

                std::vector<cl::Device> device;
                device.push_back(*i);

                cl::Context context(device, cps, NULL, NULL, &err);
                if (err != CL_SUCCESS) {
                    std::cerr << "Context::Context() failed (" << err << ")\n";
                    return 1;
                }

                std::string kernelStr("__kernel void hello(){ size_t i = get_global_id(0); size_t j = get_global_id(1);}");
                cl::Program::Sources sources(1, std::make_pair(kernelStr.data(), kernelStr.size()));
cl::Program program = cl::Program(context, sources, &err); if (err != CL_SUCCESS) { std::cerr << "Program::Program() failed (" << err << ")\n"; return 1; } err = program.build(device); if (err != CL_SUCCESS) { if(err == CL_BUILD_PROGRAM_FAILURE) { cl::string str = program.getBuildInfo<CL_PROGRAM_BUILD_LOG>((*i)); std::cout << " \n\t\t\tBUILD LOG\n"; std::cout << " ************************************************\n"; std::cout << str.c_str() << std::endl; std::cout << " ************************************************\n"; } std::cerr << "Program::build() failed (" << err << ")\n"; return 1; } cl::Kernel kernel(program, "hello", &err); if (err != CL_SUCCESS) { std::cerr << "Kernel::Kernel() failed (" << err << ")\n"; return 1; } std::cout << " Kernel Preferred work group size multiple:\t " << kernel.getWorkGroupInfo<CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE>((*i), &err) << std::endl; #endif // CL_VERSION_1_1 std::cout << " Error correction support:\t\t\t " << (*i).getInfo<CL_DEVICE_ERROR_CORRECTION_SUPPORT>() << std::endl; #if 0 //CL_VERSION_1_1 std::cout << " Unified memory for Host and Device:\t\t " << (*i).getInfo<CL_DEVICE_HOST_UNIFIED_MEMORY>() << std::endl; #endif // CL_VERSION_1_1 std::cout << " Profiling timer resolution:\t\t\t " << (*i).getInfo<CL_DEVICE_PROFILING_TIMER_RESOLUTION>() << std::endl; std::cout << " Device endianess:\t\t\t\t " << ((*i).getInfo<CL_DEVICE_ENDIAN_LITTLE>() ? "Little" : "Big") << std::endl; std::cout << " Available:\t\t\t\t\t " << ((*i).getInfo<CL_DEVICE_AVAILABLE>() ? "Yes" : "No") << std::endl; std::cout << " Compiler available:\t\t\t\t " << ((*i).getInfo<CL_DEVICE_COMPILER_AVAILABLE>() ? "Yes" : "No") << std::endl; std::cout << " Execution capabilities:\t\t\t\t " << std::endl; std::cout << " Execute OpenCL kernels:\t\t\t " << ((*i).getInfo<CL_DEVICE_EXECUTION_CAPABILITIES>() & CL_EXEC_KERNEL ? 
"Yes" : "No") << std::endl; std::cout << " Execute native function:\t\t\t " << ((*i).getInfo<CL_DEVICE_EXECUTION_CAPABILITIES>() & CL_EXEC_NATIVE_KERNEL ? "Yes" : "No") << std::endl; std::cout << " Queue properties:\t\t\t\t " << std::endl; std::cout << " Out-of-Order:\t\t\t\t " << ((*i).getInfo<CL_DEVICE_QUEUE_PROPERTIES>() & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE ? "Yes" : "No") << std::endl; std::cout << " Profiling :\t\t\t\t\t " << ((*i).getInfo<CL_DEVICE_QUEUE_PROPERTIES>() & CL_QUEUE_PROFILING_ENABLE ? "Yes" : "No") << std::endl; std::cout << " Platform ID:\t\t\t\t\t " << (*i).getInfo<CL_DEVICE_PLATFORM>() << std::endl; std::cout << " Name:\t\t\t\t\t\t " << (*i).getInfo<CL_DEVICE_NAME>().c_str() << std::endl; std::cout << " Vendor:\t\t\t\t\t " << (*i).getInfo<CL_DEVICE_VENDOR>().c_str() << std::endl; #if 0 //CL_VERSION_1_1 //std::cout << " Device OpenCL C version:\t\t\t " // << (*i).getInfo<CL_DEVICE_OPENCL_C_VERSION>().c_str() // << std::endl; #endif // CL_VERSION_1_1 std::cout << " Driver version:\t\t\t\t " << (*i).getInfo<CL_DRIVER_VERSION>().c_str() << std::endl; std::cout << " Profile:\t\t\t\t\t " << (*i).getInfo<CL_DEVICE_PROFILE>().c_str() << std::endl; std::cout << " Version:\t\t\t\t\t " << (*i).getInfo<CL_DEVICE_VERSION>().c_str() << std::endl; std::cout << " Extensions:\t\t\t\t\t " << (*i).getInfo<CL_DEVICE_EXTENSIONS>().c_str() << std::endl; std::cout << std::endl << std::endl; } } } catch (cl::Error err) { std::cerr << "ERROR: " << err.what() << "(" << err.err() << ")" << std::endl; } return status; }

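The `#if 0 //CL_VERSION_1_1` guards in the listing switch the OpenCL 1.1 queries off at compile time for every device, including the ones that do report 1.1. Error -30 is `CL_INVALID_VALUE`, which `clGetDeviceInfo` returns for a parameter name it does not recognize, so a possible alternative is to decide per device at run time. The sketch below assumes the version string follows the layout the OpenCL specification prescribes, "OpenCL <major>.<minor> <vendor-specific information>", as in the log at the top of the thread. `parseOpenCLVersion` and `supportsOpenCL11` are hypothetical helper names, not part of CLInfo or cl.hpp.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of CLInfo or cl.hpp): extracts the major and
// minor numbers from a string such as "OpenCL 1.0 CUDA 3.2.1".
// Returns false if the string does not start with "OpenCL <major>.<minor>".
bool parseOpenCLVersion(const std::string& version, int& major, int& minor)
{
    return std::sscanf(version.c_str(), "OpenCL %d.%d", &major, &minor) == 2;
}

// Hypothetical helper: true only when the reported version is 1.1 or newer.
// An unparsable string is treated as 1.0, so only the 1.0 queries are issued.
bool supportsOpenCL11(const std::string& version)
{
    int major = 0;
    int minor = 0;
    if (!parseOpenCLVersion(version, major, minor))
        return false;
    return major > 1 || (major == 1 && minor >= 1);
}
```

In the device loop, a call such as `supportsOpenCL11((*i).getInfo<CL_DEVICE_VERSION>().c_str())` could then gate the `CL_DEVICE_NATIVE_VECTOR_WIDTH_*`, `CL_DEVICE_HOST_UNIFIED_MEMORY` and `CL_DEVICE_OPENCL_C_VERSION` queries in place of `#if 0`. The program would still have to be built against 1.1 headers so that those constants exist.
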
0 Likes

Sometimes it works and sometimes not? Then the ICD is most likely failing to load the NVIDIA platform.

0 Likes
Raistmer
Adept II

No, it works on some hosts and doesn't work on others. On each host the results are reproducible, I think.
0 Likes