8 Replies Latest reply on Jul 14, 2018 5:31 AM by rdemaria

    AMDGPU OpenCL Weird Results

    skn1975

      Hello AMD OpenCL Gurus. I am facing a problem when building and running an OpenCL example. Here are the details of my setup:

      a.) I installed the driver with amdgpu-pro-install --opencl=legacy --headless

      b.) clinfo gives the following output:

      Number of platforms:     1
        Platform Profile:     FULL_PROFILE
        Platform Version:     OpenCL 2.1 AMD-APP (2639.3)
        Platform Name:     AMD Accelerated Parallel Processing
        Platform Vendor:     Advanced Micro Devices, Inc.
        Platform Extensions:     cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

       

       

        Platform Name:     AMD Accelerated Parallel Processing
      Number of devices:     1
        Device Type:     CL_DEVICE_TYPE_GPU
        Vendor ID:     1002h
        Board name:     AMD Radeon (TM) R5 M340
        Device Topology:     PCI[ B#1, D#0, F#0 ]
        Max compute units:     5
        Max work items dimensions:     3
      Max work items[0]:     1024
      Max work items[1]:     1024
      Max work items[2]:     1024
        Max work group size:     256
        Preferred vector width char:     4
        Preferred vector width short:     2
        Preferred vector width int:     1
        Preferred vector width long:     1
        Preferred vector width float:     1
        Preferred vector width double:     1
        Native vector width char:     4
        Native vector width short:     2
        Native vector width int:     1
        Native vector width long:     1
        Native vector width float:     1
        Native vector width double:     1
        Max clock frequency:     750Mhz
        Address bits:     64
        Max memory allocation:     1596905472
        Image support:     Yes
        Max number of images read arguments:     128
        Max number of images write arguments:     8
        Max image 2D width:     16384
        Max image 2D height:     16384
        Max image 3D width:     2048
        Max image 3D height:     2048
        Max image 3D depth:     2048
        Max samplers within kernel:     16
        Max size of kernel argument:     1024
        Alignment (bits) of base address:     2048

        Minimum alignment (bytes) for any datatype:     128

        Single precision floating point capability

      Denorms:     No
      Quiet NaNs:     Yes
      Round to nearest even:     Yes
      Round to zero:     Yes
      Round to +ve and infinity:     Yes
      IEEE754-2008 fused multiply-add:     Yes
        Cache type:     Read/Write
        Cache line size:     64
        Cache size:     16384
        Global memory size:     2146349056
        Constant buffer size:     65536
        Max number of constant args:     8
        Local memory type:     Scratchpad
        Local memory size:     32768
        Max pipe arguments:     0
        Max pipe active reservations:     0
        Max pipe packet size:     0
        Max global variable size:     0

        Max global variable preferred total size:     0

        Max read/write image args:     0
        Max on device events:     0
        Queue on device max size:     0
        Max on device queues:     0
        Queue on device preferred size:     0
        SVM capabilities:    
      Coarse grain buffer:     No
      Fine grain buffer:     No
      Fine grain system:     No
      Atomics:     No
        Preferred platform atomic alignment:     0
        Preferred global atomic alignment:     0
        Preferred local atomic alignment:     0

        Kernel Preferred work group size multiple:     64

        Error correction support:     0
        Unified memory for Host and Device:     0
        Profiling timer resolution:     1
        Device endianess:     Little
        Available:     Yes
        Compiler available:     Yes
        Execution capabilities:    
      Execute OpenCL kernels:     Yes
      Execute native function:     No
        Queue on Host properties:    
      Out-of-Order:     No
      Profiling :     Yes
        Queue on Device properties:    
      Out-of-Order:     No
      Profiling :     No
        Platform ID:     0x7fdcfda149f0
        Name:     Hainan
        Vendor:     Advanced Micro Devices, Inc.
        Device OpenCL C version:     OpenCL C 1.2
        Driver version:     2639.3
        Profile:     FULL_PROFILE
        Version:     OpenCL 1.2 AMD-APP (2639.3)
        Extensions:     cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event

       

      c.) As per the AMD website, this card is supported.

      d.) I built the HelloWorld example from

           https://raw.githubusercontent.com/bgaster/opencl-book-samples/master/src/Chapter_2/HelloWorld/HelloWorld.cpp

      e.) So far so good. However, when I try to run the example I hit the following problems:

           1.) "Failed to create commandQueue for device". Other users have reported this as well, and I applied the fix listed here:

                 Bug #11702: Cannot use opencl on Cape Verde: [opencl_init] could not create command queue for device 0: -6 - darktable -…

            2.) Now that I have enabled those exports, the code behaves erratically. The first time I run it, I get the following output (garbage):

      -1.70674e+38 -1.70674e+38 2.34731e-38 2.34731e-38 3.63613e+23 3.63613e+23 -1.18942e-23 -3.04413e-21 -1.17842e+08 -2.31435e-32 7.57767e-16 ...... (omitted the rest of the output)

            3.) If I run the code again immediately afterwards, I get the correct output:

      0 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81 84 87 90 93 96 99 102 105 108....

            4.) If I pause for a while and then run the code again, I get garbage again.
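
      To tell good runs from bad ones without reading the printout by eye, a host-side check along these lines could be called on the result array after it is read back. This is only a minimal sketch: it assumes the sample fills a[i] = i and b[i] = 2*i, so that every result should equal 3*i (which matches the correct output above), and firstMismatch is a hypothetical helper name that is not part of the sample.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of HelloWorld.cpp.
// Assumes the expected result is a[i] + b[i] = i + 2*i = 3*i.
// Returns the index of the first element that differs from 3*i,
// or -1 if every element matches.
long firstMismatch(const std::vector<float>& result) {
    for (std::size_t i = 0; i < result.size(); ++i) {
        const float expected = 3.0f * static_cast<float>(i);
        // Written as !(x == y) so that a NaN also counts as a mismatch.
        if (!(result[i] == expected)) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

      If the garbage starts at index 0 on every bad run, that would suggest the whole buffer was never written, rather than a few work-items going wrong. That is a guess, not something I have verified.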

      So my question is: what is happening here? Is there a problem with the OpenCL driver, and if so, can we have a fix? Can anyone from AMD comment on this problem?