OpenCL newbie questions

Question asked by patrickchew1234 on Jul 20, 2016
Latest reply on Jul 20, 2016 by patrickchew1234

I am trying to use the OpenCL examples in the DirectGMA SDK 1.1. The example itself compiles and works. I modified the OpenCL kernel code a bit to understand it, but I am seeing strange behaviour: things that should technically work don't seem to. I don't know whether my understanding of OpenCL is faulty, whether there are specific limitations on OpenCL GPU kernels, or whether the problem is the SDK itself (the environment, for example). I get no error message for most of the changes below, which suggests the kernel compiles fine. It just doesn't work when modified as described below (basically no output at all).


// Modified kernel code which works

    const char* pKernelSrc = "__kernel void copyImage(__global uint* pData, int2 vDim)  \n \
                              {                                                         \n \
                                  uint posX = get_global_id(0);                         \n \
                                  uint posY = get_global_id(1);                         \n \
                                  uint value4 = 0;                                      \n \
                                                                                        \n \
                                  value4 = pData[posY];                                 \n \
                                  pData[(posY * (vDim.y)) + posX] = value4;             \n \
                              } ";




    m_clProcessedBuffer = clCreateBuffer(m_clCtx, CL_MEM_READ_WRITE, m_uiBufferSize, NULL, &nStatus);

    nStatus = clSetKernelArg(m_clKernel, 1, sizeof(cl_int2), (cl_int2*)&vDim);

    nStatus = clEnqueueNDRangeKernel(m_clCmdQueue,      // Command queue
                                     m_clKernel,        // Kernel
                                     2,                 // Work dimension: greater than 0, at most 3
                                     NULL,              // Global work offset
                                     uiGlobalWorkSize,  // Global work size
                                     NULL, //uiLoaclWorkSize,  // Local work size
                                     0,                 // Num events in wait list
                                     NULL,              // Event wait list
                                     NULL);             // Event
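
The kernel above writes to pData[(posY * vDim.y) + posX]. A host-side sketch of that index arithmetic may help check which elements the kernel actually touches. It assumes, since the post doesn't say, that vDim holds (width, height) = (1920, 1080) and that uiGlobalWorkSize is {width, height}. The function names are hypothetical and are not part of the SDK.

```c
#include <assert.h>
#include <stddef.h>

/* Index the posted kernel writes to: the row stride is vDim.y. */
size_t index_as_posted(size_t posX, size_t posY, size_t dimY)
{
    return posY * dimY + posX;
}

/* Conventional row-major index: the row stride is the image width. */
size_t index_row_major(size_t posX, size_t posY, size_t width, size_t height)
{
    assert(posX < width && posY < height);
    return posY * width + posX;
}
```

Under those assumptions, the last work-item (1919, 1079) writes index 1167239 with the posted formula, while the last element of a 1920x1080 buffer is 2073599. Consecutive rows also overlap, because the stride (1080) is smaller than the row length (1920). If vDim is laid out differently, the numbers change accordingly.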



(1) If I change the value4 assignment to one of the following:

      value4 = rgbaIn[posY] + 0xffff;           // this works
      value4 = rgbaIn[posY] & 0xffff;           // this doesn't work
      if (posX & 0x100) value4 = rgbaIn[posY];  // this works
      value4 = 0x55555555;                      // this doesn't work
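
To compare the variants in (1) against a known input, the arithmetic can be reproduced on the host. This is only a sketch of what each expression evaluates to for a 32-bit unsigned value; it does not explain why the GPU output is empty. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* value + 0xffff, wrapping modulo 2^32 as unsigned arithmetic does. */
uint32_t add_ffff(uint32_t v)
{
    return (uint32_t)(v + 0xffffu);
}

/* value & 0xffff, which keeps only the low 16 bits. */
uint32_t and_ffff(uint32_t v)
{
    return v & 0xffffu;
}

/* Byte n (0 = least significant) of a packed 32-bit value. */
uint8_t byte_of(uint32_t v, unsigned n)
{
    assert(n < 4);
    return (uint8_t)(v >> (8u * n));
}
```

For an all-ones (white) input, the + variant gives 0x0000FFFE and the & variant gives 0x0000FFFF, so both leave the two most significant bytes at zero. Whether that matters for what is displayed depends on the pixel layout, which the post doesn't state.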



(2) If I change the input parameter from __global uint* rgbIn to __global uchar* rgbIn, it doesn't work, even if I cast the result:

      value4 = (uint) rgbIn[posY];  // doesn't work
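
One thing worth checking for (2) is what the same index means through each pointer type. The host-side sketch below mimics the two views over one block of memory; the function names are hypothetical and the buffer stands in for the cl_mem contents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Element i through a 32-bit view: reads the 4 bytes at offset 4 * i. */
uint32_t read_as_uint(const void *buf, size_t i)
{
    uint32_t v;
    assert(buf != NULL);
    memcpy(&v, (const unsigned char *)buf + i * sizeof v, sizeof v);
    return v;
}

/* Element i through an 8-bit view: reads the single byte at offset i,
 * then widens it, so the result can never exceed 0xFF. */
uint32_t read_as_uchar(const void *buf, size_t i)
{
    assert(buf != NULL);
    return (uint32_t)((const unsigned char *)buf)[i];
}
```

For a buffer filled with 0xFF bytes (a white image), the 32-bit view returns 0xFFFFFFFF and the 8-bit view returns 0x000000FF. The cast widens the value but does not restore the other three bytes, and index posY now addresses byte posY instead of 32-bit word posY.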

(3) The code doesn't work either if I pass in the cl_mem I created with clCreateBuffer without first initializing the buffer from the host. For example, if I create a 1920x1080 buffer and initialize only 16 bytes with a pattern from host memory using enqueueWriteBuffer, the kernel seems able to use only those 16 bytes. On a totally "white image", if the kernel takes the input and adds yellow to it, I only see 16 bytes' worth of yellow in the kernel output. If I increase the enqueueWriteBuffer size to 256 bytes, the kernel output shows 256 bytes of yellow. The rest of the image is pure white. Is this expected behavior? It almost feels like something needs to be done to the buffer to load it into cache before the kernel can use it.
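
A buffer created with clCreateBuffer and a NULL host pointer has no defined initial contents, so one way to narrow down (3) is to write a known pattern over the whole buffer before running the kernel. The sketch below only prepares the host side. It assumes 4 bytes per pixel (one uint per pixel), which the post doesn't state, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Size in bytes of one full frame. */
size_t frame_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    return width * height * bytes_per_pixel;
}

/* Allocate a host frame and set every pixel to the same pattern.
 * Returns NULL on allocation failure; the caller frees the result. */
uint32_t *make_host_frame(size_t width, size_t height, uint32_t pattern)
{
    size_t count = width * height;
    uint32_t *frame = malloc(count * sizeof *frame);
    if (frame == NULL)
        return NULL;
    for (size_t i = 0; i < count; ++i)
        frame[i] = pattern;
    return frame;
}
```

The resulting pointer and frame_bytes(1920, 1080, 4), which is 8294400, would then be the source and size passed to clEnqueueWriteBuffer, so that every element the kernel reads has a known value.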


    Are there some very strict limitations on OpenCL kernels for AMD GPUs? I can't find any information about such limitations. Can anybody who has written OpenCL code on AMD GPUs using only buffers in GPU memory, ideally with the DirectGMA 1.1 SDK, help shed some light on my questions above? The DirectGMA 1.1 SDK is linked to the 2.91 OpenCL SDK. I would like to understand whether the above is supposed to work or not. I am testing this with a FirePro 7100 board with the 16.15.2001 drivers on Windows 10.


- Patrick