  • How to read GDS using PM4 packets in Vega 10?

    Hi, I have recently been doing some experiments with the global data share (GDS). I wrote a kernel that simply writes some contents into GDS starting from address 0 using ds_write instructions. The question is how to read the cont...
    harry1
    last modified by harry1
  • Wrong OpenCL calculation result on AMD 5700 XT

    Good day!   Our company uses the OpenCL framework to work with AMD GPUs. Unfortunately, the OpenCL driver for the AMD 5700 XT GPU gives wrong calculation results. This applies to all GPU drivers I have tested so fa...
    Neverhood
    last modified by Neverhood
  • Kernel crashes if loop cycles too high in too many items

    My items run a variable number of loop cycles, from 1 to X, and when X is relatively low, in the ballpark of 16k, the entire work completes successfully. When X grows to 65k, and the percentage of items that reach this...
    infovel
    last modified by infovel
  • How to enable cl_ext_atomic_counters_32 on Vega 56

    Hi, I wanted to use cl_ext_atomic_counters_32 on Vega 56. I found that it needs the legacy driver. Is There a Way to Access Global Data Share (GDS) on Ellesmere (RX 480)?  I have tried on Ubuntu 18.04 using...
    harry1
    last modified by harry1
  • Radeon HD5850 vs 7770, 7850, 6530(APU), or 6450

    A while ago, I developed a kernel that runs fine on the 5850 card. When I had an opportunity to test it on the newer 7770 and 7850 cards, it did not work. The result was wrong: it messed up the Mandelbrot set, even t...
    infovel
    last modified by infovel
  • Can scalar and vector operations run in parallel in GCN?

    Well, the title already says most of what I would like to know. Since the vector and scalar ALUs are different pieces of hardware, I wondered whether the two ALUs can be active in parallel when the instructions running have n...
    lolliedieb
    last modified by lolliedieb
  • How to force placement of memory buffer in Windows?

    Hello,   2nd question today, but more or less unrelated to the first one. I have an application that needs to use almost all of the memory the GPU offers (up to 5 MByte). The driver reports (via CL_DEVIC...
    lolliedieb
    last modified by lolliedieb
  • OpenCL compiler bug

    I've been working on adding OpenCL support to our code generator (GitHub - genn-team/genn: GeNN is a GPU-enhanced Neuronal Network simulation environment based on code generation for Nvi… ) and the ge...
    neworderofjamie@gmail.com
    last modified by neworderofjamie@gmail.com
  • OpenCL runtime bug when sharing texture with D3D11

    I have a program that initializes 2 Direct3D11 textures. It then extracts 2 cl_mem image objects from each texture and downloads its content to RAM.   When using a DirectX11 device created via the new D3D11On12 API ...
    elad
    last modified by elad
  • OpenCL 2.0 compiler bug? (device side enqueue)

    A similar issue is reported here.   I compile a kernel (kernel1) that performs device-side enqueue to another kernel (kernel2). When kernel2 is empty, or contains little code, there is no problem.    ...
    elad
    last modified by elad
  • Looking for OpenCL Linux driver for Threadripper 3990X

    My Threadripper 3990X based servers (10-nodes) have arrived and I am looking for the OpenCL driver to run some benchmarks. The servers run Ubuntu 20.04 with NVIDIA GPU drivers. I googled and could not find a...
    FangQ
    last modified by FangQ
  • Newcomer - Can I Get Whitelisted for OpenCL Forum?

    Hello AMD!   I'm having a problem where my new Radeon VII is not being detected by clinfo for OpenCL/compute jobs, while my RX 580 still is.   A helpful user replied and let me know I should probably ask t...
    makeitwork
    last modified by makeitwork
  • Replacement of VOP2 versions of v_add / v_addc instructions on Navi

    Hello,   currently I am trying to improve my kernels by inserting assembly code: for Vega GPUs by using clrxasm or by testing them on ROCm with inline assembly, and for Navi I am testing my code by just inline asm...
    lolliedieb
    last modified by lolliedieb
  • AMD's OCL global work size with 2d work dimensions limit

    I have found a problem when executing a kernel: the second dimension of the work units, get_global_id(1), gets limited to around 120, while the first dimension can execute every unit until the max set on the first di...
    pontiacgtx
    last modified by pontiacgtx
  • Bug in OpenCL compiler?

    I've been working on adding OpenCL support to our code generator (GitHub - genn-team/genn: GeNN is a GPU-enhanced Neuronal Network simulation environment based on code generation for Nvi… ) and the ge...
    neworderofjamie@gmail.com
    last modified by neworderofjamie@gmail.com
  • OpenCL on E8860 Linux

    I am having issues with running an OpenCL program on an E8860 and would like to ask for advice.   I am trying to get an OpenCL program to run on an E8860 on Linux, preferably CentOS 7.  I have tried th...
    rt0218
    last modified by rt0218
  • Is there an elegant way to force recalculation (of values or addresses)

    Well, the title already sums it up. I have a rather simple kernel, which uses 20 VGPRs and the complete 32 kByte of shared memory. So it is all fine for running 2x 1024 threads per work group. So far so good. Bu...
    lolliedieb
    last modified by lolliedieb
  • Missing lock step behaviour of Navi GPUs?

    Hey there,   I have code that needs to share data among threads in blocks of 4, so thread i needs to access values from threads (i & 0xFC) + 0 ... (i & 0xFC) + 3.   When writing such code in GCN...
    lolliedieb
    last modified by lolliedieb
  • Is there a way to combine OpenCL engine from older driver pack with newer one?

    As I found out, the newer OpenCL compiler in the Adrenalin drivers for Win10 x64 has a bug in its implementation that prevents my code from working correctly on Tahiti cards. Then I determined that the old driver pack 16.4.2 was without ...
    melirius
    last modified by melirius
  • OpenCL 2.0 Device command queue keeps filling up and halting execution

    I am utilizing OpenCL’s enqueue_kernel() function to enqueue kernels dynamically from the GPU to reduce unnecessary host interactions. Here is a simplified example of what I am trying to do in the kernels: kerne...
    pmorgan4801
    last modified by pmorgan4801