• S_WAKEUP instruction

    The Vega Shader ISA doc (https://developer.amd.com/wp-content/resources/Vega_Shader_ISA_28July2017.pdf) describes the S_WAKEUP instruction as follows (I quote): Allow a wave to 'ping' all the other waves in its t...
    sp314
    last modified by sp314
  • Processing two buffers using an out of order queue

    I have a PCI data acquisition card that supports P2P. It will be capturing records one after the other at a very rapid rate, and the plan is to write each record to the GPU using DirectGMA, where a kernel will process...
    andyste1
    last modified by andyste1
  • The values returned by clGetDeviceInfo() and clGetPlatformInfo() seem to be just a little off. Why?

    I've got Ubuntu Linux 16.04 with ROCm and AMDGPU-PRO drivers, and an R290x card, which is the only GPU I have on this computer. When I query the device name with clGetDeviceInfo(...CL_DEVICE_NAME...), for some reason,...
    sp314
    last modified by sp314
  • Why does my VGPR usage increase so fast when I use this assignment statement in OpenCL?

    if (condition) { *foundFlag = 1; dst[gid] = gid * crack_cnt + num; break; } This code is used to end the kernel function when the password is found (2 AMD 7970 devices and OpenCL platform). *foundFlag is a pointer to a char v...
    yanmin950122
    last modified by yanmin950122
  • Optimizing data transfer with APU (best way to test zero-copy?)

    So finally I have got my APU test system (I paid for it!): -CPU: AMD Ryzen 5 2400G -MB: Asrock X470 Fatality Gaming mini-ITX -RAM: G.Skill 3200 C14, 16GB*2 -OS: Windows 10 Pro -IDE and compiler: Visual Studio 2017 Com...
    sandbo
    last modified by sandbo
  • OpenCL memory transfer / zero copy buffers on embedded GPUs

    Hi, I am trying to understand the mechanics of OpenCL memory access and transfers (in particular on AMD Ryzen V1000 embedded systems coming with Zen cores and an embedded Vega GPU), with the motivation of want...
    exilef
    last modified by exilef
  • OpenCL amdgpu-pro generated code performance - please convert 'select' to cndmask

    Hi, I don't know if this is the best place to report OpenCL compiler performance issues, but I didn't find a better one. SUMMARY: Please AMD devs, when an OpenCL dev takes the time to expl...
    mannerov
    last modified by mannerov
  • CL-GL Interop fastest way to synchronize?

    We are using OpenCL on Windows as part of a proprietary game-engine where we use the CL-GL interop functionality to communicate between the simulation and the rendering engine. Our core loop currently executes the fol...
    george72
    last modified by george72
  • Are uint2 operations faster than ulong in OpenCL on AMD GCN cards?

    Which of these "+" calculations is faster? 1) uint2 a, b, c; c = a + b; 2) ulong a, b, c; c = a + b; Specifically on RX 580 or Vega cards.
    fancyix
    last modified by fancyix
  • Any instruction level or line-by-line profiler?

    It would be very helpful if we could analyze the cost of each instruction or each line of OpenCL code. Either the ROCm or the AMDGPU driver is fine. Thanks in advance.
    fancyix
    last modified by fancyix
  • How can I know which GPU is used by the OS?

    If I have multiple GPUs in my system, how can I know which GPU is used by the OS? I am using OpenCL for computing, and I don't want to use that GPU for GPGPU work. Thanks in advance.
    tdchen
    last modified by tdchen
  • A problem to solve with OpenCL and DirectGMA...

    I've been tasked with solving a problem that feels like it might be a good fit for a GPU, although I could be wrong... We have a data acquisition card that generates nearly 8Gb/sec, typically in the form of a ...
    andyste1
    last modified by andyste1
  • Store array in regs?

    If I declare an array like uint[128], the driver will spill it even if there are enough registers to store the array. Is there any way to make the compiler keep a big array in registers? Maybe some compile option?
    fancyix
    last modified by fancyix
  • How to use pinned memory for reading from GPU?

    I'm struggling to find examples of using pinned memory, especially when it comes to reading data from the GPU. Assuming my kernel has an 'int*' argument (containing the "results" to be read back by the host), would th...
    andyste1
    last modified by andyste1
  • List of neural network/machine learning/GPU computing apps that support OpenCL acceleration on AMD FX HW?

    Hi, I have a few questions. I hope you can help me. I am trying to learn neural nets/ML on my older, FX-based hardware. I very much prefer the OpenCL development model. As discussed elsewhere, people ...
    devlista
    last modified by devlista
  • Why do different values of local_size affect my kernel?

    I'm new to OpenCL and am currently exploring some potential uses of it and GPU processing. So far I have written the following kernel:- __kernel void test(__global const read_only int* region_table, __global const re...
    andyste1
    last modified by andyste1
  • Understanding buffer creation

    Hi, I'm new to OpenCL and am trying to get a better understanding of how clCreateBuffer works. Does this command simply "reserve" an area of memory on the device, or does an implicit "copy" occur, e.g. to initialise t...
    andyste1
    last modified by andyste1
  • Any way to avoid using too many VGPRs?

    Is there anything like CUDA's "register" keyword that hints the compiler to store the value of a variable in one register, instead of using many registers for its temporary values? I tried "volatile" but sometim...
    fancyix
    last modified by fancyix
  • Memory bandwidth anomaly

    Hi, Recently I ran some tests of GDDR5 memory bandwidth. In a few words: the test launches numberOfCUes*4 wavefronts. Each WF reads 1024 bytes from a random aligned location in a large buffer. The whole kernel ...
    realhet
    last modified by realhet
  • Parallel execution of OpenCL kernels of two different programs on a GPU

    I have two OpenCL applications that I want to execute in parallel on a GPU. The purpose is to reduce the execution time of the two programs running in parallel compared to executing them one by one....
    yasirnoman
    last modified by yasirnoman