
AMDGPU OpenCL Weird Results

Question asked by skn1975 on Jul 9, 2018
Latest reply on Jul 26, 2018 by skn1975

Hello AMD OpenCL gurus. I am facing a problem when building and running an OpenCL example. Here are the details of my setup:

a.) I installed the driver with amdgpu-pro-install --opencl=legacy --headless

b.) The output from clinfo is:


Number of platforms:     1
  Platform Profile:     FULL_PROFILE
  Platform Version:     OpenCL 2.1 AMD-APP (2639.3)
  Platform Name:     AMD Accelerated Parallel Processing
  Platform Vendor:     Advanced Micro Devices, Inc.
  Platform Extensions:     cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

  Platform Name:     AMD Accelerated Parallel Processing
Number of devices:     1
  Device Type:     CL_DEVICE_TYPE_GPU
  Vendor ID:     1002h
  Board name:     AMD Radeon (TM) R5 M340
  Device Topology:     PCI[ B#1, D#0, F#0 ]
  Max compute units:     5
  Max work items dimensions:     3
Max work items[0]:     1024
Max work items[1]:     1024
Max work items[2]:     1024
  Max work group size:     256
  Preferred vector width char:     4
  Preferred vector width short:     2
  Preferred vector width int:     1
  Preferred vector width long:     1
  Preferred vector width float:     1
  Preferred vector width double:     1
  Native vector width char:     4
  Native vector width short:     2
  Native vector width int:     1
  Native vector width long:     1
  Native vector width float:     1
  Native vector width double:     1
  Max clock frequency:     750Mhz
  Address bits:     64
  Max memory allocation:     1596905472
  Image support:     Yes
  Max number of images read arguments:     128
  Max number of images write arguments:     8
  Max image 2D width:     16384
  Max image 2D height:     16384
  Max image 3D width:     2048
  Max image 3D height:     2048
  Max image 3D depth:     2048
  Max samplers within kernel:     16
  Max size of kernel argument:     1024
  Alignment (bits) of base address:     2048

  Minimum alignment (bytes) for any datatype:     128

  Single precision floating point capability

Denorms:     No
Quiet NaNs:     Yes
Round to nearest even:     Yes
Round to zero:     Yes
Round to +ve and infinity:     Yes
IEEE754-2008 fused multiply-add:     Yes
  Cache type:     Read/Write
  Cache line size:     64
  Cache size:     16384
  Global memory size:     2146349056
  Constant buffer size:     65536
  Max number of constant args:     8
  Local memory type:     Scratchpad
  Local memory size:     32768
  Max pipe arguments:     0
  Max pipe active reservations:     0
  Max pipe packet size:     0
  Max global variable size:     0

  Max global variable preferred total size:     0

  Max read/write image args:     0
  Max on device events:     0
  Queue on device max size:     0
  Max on device queues:     0
  Queue on device preferred size:     0
  SVM capabilities:    
Coarse grain buffer:     No
Fine grain buffer:     No
Fine grain system:     No
Atomics:     No
  Preferred platform atomic alignment:     0
  Preferred global atomic alignment:     0
  Preferred local atomic alignment:     0

  Kernel Preferred work group size multiple:     64

  Error correction support:     0
  Unified memory for Host and Device:     0
  Profiling timer resolution:     1
  Device endianess:     Little
  Available:     Yes
  Compiler available:     Yes
  Execution capabilities:    
Execute OpenCL kernels:     Yes
Execute native function:     No
  Queue on Host properties:    
Out-of-Order:     No
Profiling :     Yes
  Queue on Device properties:    
Out-of-Order:     No
Profiling :     No
  Platform ID:     0x7fdcfda149f0
  Name:     Hainan
  Vendor:     Advanced Micro Devices, Inc.
  Device OpenCL C version:     OpenCL C 1.2
  Driver version:     2639.3
  Profile:     FULL_PROFILE
  Version:     OpenCL 1.2 AMD-APP (2639.3)
  Extensions:     cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event

 

c.) As per the AMD website, this card is supported.

d.) I built the HelloWorld example from

     https://raw.githubusercontent.com/bgaster/opencl-book-samples/master/src/Chapter_2/HelloWorld/HelloWorld.cpp

e.) So far so good. However, when I try to run the example I hit the following problems:

     1.) "Failed to create commandQueue for device". Other users have reported this as well, and I applied the fix listed here:

           Bug #11702: Cannot use opencl on Cape Verde: [opencl_init] could not create command queue for device 0: -6 - darktable -…

      2.) Once I set those exports, the code behaves erratically. The first time I run it, I get the following output (garbage):

-1.70674e+38 -1.70674e+38 2.34731e-38 2.34731e-38 3.63613e+23 3.63613e+23 -1.18942e-23 -3.04413e-21 -1.17842e+08 -2.31435e-32 7.57767e-16 ...... (omitted the rest of the output)

      3.) If I run the code again immediately, I get the correct output:

0 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81 84 87 90 93 96 99 102 105 108....

      4.) If I pause and then run the code again, I get garbage again.
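
To tell a good run from a bad one without reading the numbers by eye, a small host-side check can help. The following is only a sketch: it assumes the sample fills its inputs as a[i] = i and b[i] = 2*i, which matches the correct output above (0 3 6 9 ...), and the function name firstMismatch is hypothetical, not part of the sample.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the index of the first element that differs from the expected
// value 3*i, or -1 if every element matches.
// Assumption: the inputs were a[i] = i and b[i] = 2*i, so a[i] + b[i] = 3*i.
long firstMismatch(const std::vector<float>& result, float tol = 1e-5f)
{
    for (std::size_t i = 0; i < result.size(); ++i) {
        const float expected = 3.0f * static_cast<float>(i);
        // The negated comparison also flags NaN, which fails every comparison.
        if (!(std::fabs(result[i] - expected) <= tol)) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

Calling this on the buffer read back from the device and printing the returned index on each run would show whether the whole buffer is bad or only part of it.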

So my question is: what is happening here? Is there a problem with the OpenCL driver, and if so, can we get a fix? Can anyone from AMD comment on this problem?
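
Regarding the command queue failure in item 1: the linked darktable bug reports error code -6, which is CL_OUT_OF_HOST_MEMORY in CL/cl.h. Whether the HelloWorld sample hits the same code is an assumption, since it may not print the value. A small helper like the one below (clErrorName is a hypothetical name, and only a few codes are covered) could be used to log the code returned by the OpenCL call:

```cpp
#include <string>

// Maps a few OpenCL runtime error codes to their names from CL/cl.h.
// Only a handful of codes are listed; anything else is reported numerically.
std::string clErrorName(int err)
{
    switch (err) {
        case 0:   return "CL_SUCCESS";
        case -1:  return "CL_DEVICE_NOT_FOUND";
        case -2:  return "CL_DEVICE_NOT_AVAILABLE";
        case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -6:  return "CL_OUT_OF_HOST_MEMORY";
        case -30: return "CL_INVALID_VALUE";
        case -33: return "CL_INVALID_DEVICE";
        case -34: return "CL_INVALID_CONTEXT";
        case -35: return "CL_INVALID_QUEUE_PROPERTIES";
        default:  return "unknown OpenCL error " + std::to_string(err);
    }
}
```

Logging the name next to each failing call would make it easier to compare this setup with the darktable report.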
