Hello AMD OpenCL gurus. I am facing a problem when building and running an OpenCL example. Here are the details of my setup:
a.) I installed the driver with: amdgpu-pro-install --opencl=legacy --headless
b.) clinfo gives the following output:
Number of platforms: 1
  Platform Profile: FULL_PROFILE
  Platform Version: OpenCL 2.1 AMD-APP (2639.3)
  Platform Name: AMD Accelerated Parallel Processing
  Platform Vendor: Advanced Micro Devices, Inc.
  Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
  Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
  Device Type: CL_DEVICE_TYPE_GPU
  Vendor ID: 1002h
  Board name: AMD Radeon (TM) R5 M340
  Device Topology: PCI[ B#1, D#0, F#0 ]
  Max compute units: 5
  Max work items dimensions: 3
    Max work items[0]: 1024
    Max work items[1]: 1024
    Max work items[2]: 1024
  Max work group size: 256
  Preferred vector width char: 4
  Preferred vector width short: 2
  Preferred vector width int: 1
  Preferred vector width long: 1
  Preferred vector width float: 1
  Preferred vector width double: 1
  Native vector width char: 4
  Native vector width short: 2
  Native vector width int: 1
  Native vector width long: 1
  Native vector width float: 1
  Native vector width double: 1
  Max clock frequency: 750Mhz
  Address bits: 64
  Max memory allocation: 1596905472
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 8
  Max image 2D width: 16384
  Max image 2D height: 16384
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 1024
  Alignment (bits) of base address: 2048
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: No
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: Read/Write
  Cache line size: 64
  Cache size: 16384
  Global memory size: 2146349056
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Scratchpad
  Local memory size: 32768
  Max pipe arguments: 0
  Max pipe active reservations: 0
  Max pipe packet size: 0
  Max global variable size: 0
  Max global variable preferred total size: 0
  Max read/write image args: 0
  Max on device events: 0
  Queue on device max size: 0
  Max on device queues: 0
  Queue on device preferred size: 0
  SVM capabilities:
    Coarse grain buffer: No
    Fine grain buffer: No
    Fine grain system: No
    Atomics: No
  Preferred platform atomic alignment: 0
  Preferred global atomic alignment: 0
  Preferred local atomic alignment: 0
  Kernel Preferred work group size multiple: 64
  Error correction support: 0
  Unified memory for Host and Device: 0
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: No
  Queue on Host properties:
    Out-of-Order: No
    Profiling: Yes
  Queue on Device properties:
    Out-of-Order: No
    Profiling: No
  Platform ID: 0x7fdcfda149f0
  Name: Hainan
  Vendor: Advanced Micro Devices, Inc.
  Device OpenCL C version: OpenCL C 1.2
  Driver version: 2639.3
  Profile: FULL_PROFILE
  Version: OpenCL 1.2 AMD-APP (2639.3)
  Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
c.) As per the AMD website, this card is supported.
d.) I built the HelloWorld example from
https://raw.githubusercontent.com/bgaster/opencl-book-samples/master/src/Chapter_2/HelloWorld/HelloW...
e.) So far so good. However, when I try to run the example I hit the following problems:
1.) "Failed to create commandQueue for device". Other users have reported this as well, and I applied the fix listed here:
Bug #11702: Cannot use opencl on Cape Verde: [opencl_init] could not create command queue for device...
2.) With those exports set, the program behaves erratically. The first time I run it, I get garbage output:
-1.70674e+38 -1.70674e+38 2.34731e-38 2.34731e-38 3.63613e+23 3.63613e+23 -1.18942e-23 -3.04413e-21 -1.17842e+08 -2.31435e-32 7.57767e-16 ...... (omitted the rest of the output)
3.) If I run it again immediately afterwards, I get the correct output:
0 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81 84 87 90 93 96 99 102 105 108....
4.) If I wait a while and then run it again, I get garbage again.
So my question is: what is happening here? Is this a problem with the OpenCL driver, and if so, can it be fixed? Can anyone from AMD comment on this?
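For anyone trying to reproduce this, a small host-side check along these lines could make the good and bad runs easier to tell apart than reading the printed numbers. This is only a sketch: it assumes the sample fills its inputs as a[i] = i and b[i] = 2*i (which is consistent with the correct output 0 3 6 9 ... above), and count_mismatches is a hypothetical helper of mine, not something in the sample.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the HelloWorld sample.
 * Counts the elements of result[] that differ from the expected sum,
 * ASSUMING the inputs were a[i] = i and b[i] = 2*i, so that
 * result[i] should be exactly 3*i. */
size_t count_mismatches(const float *result, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i) {
        float expected = 3.0f * (float)i;
        /* Written as !(x == y) so that NaN values also count as mismatches. */
        if (!(result[i] == expected)) {
            ++bad;
        }
    }
    return bad;
}
```

Called on the result array right after it is read back from the device, this should return 0 for a run like 3.) and a non-zero count for runs like 2.) and 4.).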