  • Missing lockstep behaviour of Navi GPUs?

    Hey there, I have code that needs to share data among threads in blocks of 4, so thread i needs to access values from threads (i & 0xFC) + 0 ... (i & 0xFC) + 3. When writing such code in GCN...
    lolliedieb
    last modified by lolliedieb
  • Bug in OpenCL compiler

    I have finally made a minimal reproducing example of a bug in the OpenCL compilers for Tahiti in the Adrenalin Win10 x64 drivers (tested on two workstations with the 19.12.2, 20.1.1 and 20.5.1 drivers, with -O0 and -O5). Kernel is atta...
    melirius
    last modified by melirius
  • Pull Request I made for the clBLAS

    Hello, I have a question about the Pull Request I made for clBLAS... I have been waiting quite a long time for an "accept"... and I wonder if someone can check it? It is at... I would really appreciate that. ...
    sowson
    created by sowson
  • clBuildProgram prints warnings when compiling for RDNA

    I am using a Radeon Pro W5700 to run kernels produced by the clFFT library. When clFFT compiles its kernels, calling clBuildProgram seems to print unspecified warnings to the console output: "1 warning...
    elad
    last modified by elad
  • Performance of zero-ing OpenCL buffers on device

    Hello! In my project, I'm running a chain of several kernels in a loop with millions of iterations, and I need to zero out a buffer of up to 5000 floats at the start of every iteration of this loop. I tried using clE...
    jadr
    last modified by jadr
  • Parameter passing for pipes in nested loop (deviceEnqueue)

    I'm trying to implement G-DBSCAN on a Qualcomm mobile platform (845/865). For the BFS part I just referenced the DeviceEnqueueBFS sample code in OpenCL SDK 3.0. At first: on the Qualcomm mobile GPU platform (84...
    youngerliu
    last modified by youngerliu
  • Poor performance of copying data between the CPU memory and GPU memory

    Hello, I'm a researcher developing Particle-in-Cell simulations in plasma physics using OpenCL with AMD's GPUs. Particle-in-Cell is an iterative method (iterating through time), which means we've got a "for" loop in ...
    jadr
    last modified by jadr
  • OpenCL development documentation on AMD GPUs

    Is there a publicly available list of all AMD GPUs supporting OpenCL which includes: product name ('AMD Radeon RX Vega 64'), internal name ('gfx900', can be obtained as CL_DEVICE_NAME), architecture ('GCN gen 5'), ar...
    timchist
    last modified by timchist
  • Heterogeneous toolchain for Windows?

    Good day, I am currently running Windows with OpenCL kernels across a CPU (2990WX) and an AMD GPU with C++17. Since AMD stopped supporting OpenCL on CPU, how can I adapt my toolchain to still leverage the CPU ...
    genestoltz
    last modified by genestoltz
  • AMD GPU OpenCL gets wrong results while Nvidia is correct

    Recently, I translated a CPU code into OpenCL, and it has been debugged and tested (using a GTX1060). The calculation in this code is iterative. The results are presented in the form of re...
    huzhiyuan1994
    last modified by huzhiyuan1994
  • clGetDeviceIDs suddenly very slow

    We are currently developing an OpenCL application on Windows 10 (Visual Studio 2017) but have noticed that the OpenCL performance has recently degraded, with the call to clGetDeviceIDs now taking around 10 second...
    andyste1
    last modified by andyste1
  • How to abort clEnqueueWaitSignalAmd?

    We're developing software that uses a PCI data acquisition card to read blocks of data (records) from an external instrument. These records are transferred to a Radeon Pro WX7100 using "DirectGma", where a kernel proc...
    andyste1
    last modified by andyste1
  • OpenCL occupancy-performance nightmare

    These days I tried to squeeze some performance from a memory-intensive OCL kernel and went for GCN assembly. Saved a few registers here, few instructions there, got a nice occupancy and thought to have a perfect kerne...
    kbala
    last modified by kbala
  • Optimize LC0 - Leela Chess Zero - for AMD GPUs

    Heyho AMD community, we are all aware of the neural network hype on GPUs, and most have noticed that Nvidia simply has the upper hand with their cuDNN framework. Personally I am convinced that AMD mak...
    smato2018
    created by smato2018
  • Radeon vii and fft

    Hello, is there by any chance a recommended OpenCL FFT package for the Radeon VII? clFFT was coded for previous generations of cards.
    dns.on.gpu
    last modified by dns.on.gpu
  • GPUs: pick-n-mix

    Hello. Is it possible to use OpenCL with 2 or more different GPUs under Linux? I am interested in mixing two Radeon VII, with two 280x and even one or two 7950.
    dns.on.gpu
    last modified by dns.on.gpu
  • What's the best or the recommended way to copy the data from scalar registers to GDS?

    Perhaps, there's something that I'm not seeing in the docs, so I apologize in advance.   I've got 16 dwords in scalar registers s16-s31. I need to copy that data from the scalar registers to GDS at the GDS base ...
    sp314
    last modified by sp314
  • Getting stuck in a loop: is a local variable not visible to other work-items in a work-group?

    This is my kernel code: __kernel void test(__global int *input_vector, __global atomic_int *mem_flag) { local int d[32]; if (get_local_id(0) == 0) { ...
    avinashkrc
    last modified by avinashkrc
  • clEnqueueAcquireD3D11ObjectsKHR blocks for a long time

    In my application, I have a processing thread that enqueues an OpenCL kernel that writes to an ID3D11Texture2D object. Everything works fine in terms of correctness. I can successfully acquire the shared O...
    elad
    last modified by elad
  • I am trying to test out how well atomicity performs on an APU, but my sample program hangs the system

    I am trying to test out how well atomicity performs on an APU, but my sample program does not update the variable properly, so the whole system hangs as I check for the updated value on either side (CPU and GPU) in a while ...
    avinashkrc
    last modified by avinashkrc