1 Reply Latest reply on Jul 20, 2016 2:29 PM by patrickchew1234

    OpenCL newbie questions

    patrickchew1234

      I am trying to use the OpenCL examples in the DirectGMA SDK 1.1. The example itself compiles and works. I tried modifying the OpenCL kernel code a little to understand it, but I am seeing weird behaviour: things that should technically work don't seem to. I don't know whether my understanding of OpenCL is faulty, whether there is a specific limitation on OpenCL GPU kernels, or whether it is the SDK itself (environment or similar). I generally get no error message for any of the changes below, which suggests the kernel compiles fine. It just doesn't work when modified as described below (basically no output at all).

       

      // Modified kernel code which works

          const char* pKernelSrc = "__kernel void copyImage(__global uint* pData, int2 vDim)  \n \
                                    {                                                         \n \
                                        uint posX = get_global_id(0);                         \n \
                                        uint posY = get_global_id(1);                         \n \
                                        uint value4 = 0;                                      \n \
                                                                                              \n \
                                        value4 = pData[posY];                                 \n \
                                        pData[(posY * (vDim.y)) + posX] = value4;             \n \
                                    } ";

       

       

      .......

             m_clProcessedBuffer = clCreateBuffer(m_clCtx, CL_MEM_READ_WRITE, m_uiBufferSize, NULL, &nStatus);
             nStatus = clSetKernelArg(m_clKernel, 1, sizeof(cl_int2), (cl_int2*)&vDim);

          nStatus = clEnqueueNDRangeKernel(m_clCmdQueue,       // command queue
                                           m_clKernel,         // kernel
                                           2,                  // work dimension: > 0 and at most 3
                                           NULL,               // global work offset
                                           uiGlobalWorkSize,   // global work size
                                           NULL,               // local work size (uiLoaclWorkSize not passed)
                                           0,                  // number of events in wait list
                                           NULL,               // event wait list
                                           NULL);              // returned event

       

      ///////////////////

      (1)    If I change the assignment to value4:
            value4 = rgbaIn[posY] + 0xffff;  // this works
            value4 = rgbaIn[posY] & 0xffff;  // this doesn't work

            if (posX & 0x100) value4 = rgbaIn[posY];  // this works

            value4 = 0x55555555;  // this doesn't work

       

       

      (2) If I change the input parameter from __global uint* rgbIn to __global uchar* rgbIn, it doesn't work, even if I typecast the value:
              value4 = (uint) rgbIn[posY];  // doesn't work

      (3) The code doesn't work either if I just pass in the cl_mem I created with clCreateBuffer without first initializing the buffer from the host. For example, if I create a 1920x1080 buffer and initialize only 16 bytes with a pattern from host memory using clEnqueueWriteBuffer, the kernel seems able to use only those 16 bytes. On a totally "white image", if the kernel takes the input and adds yellow to it, I see only 16 bytes' worth of yellow in the kernel output. If I increase the clEnqueueWriteBuffer copy to 256 bytes, the kernel output shows 256 bytes of yellow. The rest of the image is pure white. Is this expected behavior? It almost feels like something needs to be done to the buffer to load it into cache before the kernel can use it.

       

          Are there some very strict limitations on OpenCL kernels for AMD GPUs? I can't find any information on such limitations. Can anybody who has written OpenCL code on AMD GPUs using only buffers in GPU memory, ideally with the DirectGMA 1.1 SDK, help shed some light on my questions above? The DirectGMA 1.1 SDK is linked against the 2.91 OpenCL SDK. I would like to understand whether the above is supposed to work or not. I am testing this on a FirePro 7100 board with the 16.15.2001 drivers on Windows 10.

       

      - Patrick

       

       

       

       

        • Re: OpenCL newbie questions
          patrickchew1234

          I figured out the issues. They were related to the top byte (the alpha channel) not being 0xFF in my tests; as long as it is 0xFF, everything seems to work fine now. I was treating the data as XRGB when doing my initial tests, but I guess the sample code was ARGB.
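
          A minimal sketch of that point in plain C, assuming 32-bit pixels laid out as 0xAARRGGBB with alpha in the top byte (the layout is my assumption from the behaviour I saw, not something I checked in the SDK). The helper names argb_alpha, argb_opaque and argb_demo are made up for this illustration; the same expressions should carry over to a uint inside the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed pixel layout: 0xAARRGGBB, i.e. alpha lives in the top byte. */
uint32_t argb_alpha(uint32_t px)
{
    return px >> 24;
}

/* Force a pixel fully opaque while leaving the colour channels untouched. */
uint32_t argb_opaque(uint32_t px)
{
    return px | 0xFF000000u;
}

/* Usage example: a yellow pixel written as XRGB has a zero top byte,
   so an ARGB consumer treats it as fully transparent. */
void argb_demo(void)
{
    uint32_t xrgb_yellow = 0x00FFFF00u;

    assert(argb_alpha(xrgb_yellow) == 0x00);
    assert(argb_opaque(xrgb_yellow) == 0xFFFFFF00u);
    assert(argb_alpha(argb_opaque(xrgb_yellow)) == 0xFF);
}
```

          OR-ing in 0xFF000000 before the store is the kind of fix that made my XRGB test values show up.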

           

          Anyway, the constant assignments were working all along; the results were just "invisible" when rendered by OpenGL. Same thing with the & masks.
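
          To make that concrete, here is a small host-side C sketch that evaluates the expressions from my question and looks at the top byte of each result. It assumes the 0xAARRGGBB layout and an example input pixel of 0xFF102030; the function names (alpha_of, expr_add, expr_mask, expr_byte_read) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Top byte of a pixel, assumed to be alpha (0xAARRGGBB). */
uint32_t alpha_of(uint32_t px)
{
    return px >> 24;
}

/* value4 = in + 0xffff: alpha survives as long as the addition does not
   carry into (or wrap around past) the top byte. */
uint32_t expr_add(uint32_t in)
{
    return in + 0xFFFFu;
}

/* value4 = in & 0xffff: the mask clears the upper 16 bits, alpha included. */
uint32_t expr_mask(uint32_t in)
{
    return in & 0xFFFFu;
}

/* value4 = (uint) bytes[i]: a single byte widened to 32 bits can never
   exceed 0xFF, so the top byte of the result is always zero. */
uint32_t expr_byte_read(const uint8_t *bytes, size_t count, size_t i)
{
    assert(i < count);
    return (uint32_t)bytes[i];
}
```

          With the example input, expr_add keeps alpha at 0xFF, expr_mask and expr_byte_read leave it at 0x00, and the constant 0x55555555 has alpha 0x55, which matches which of my variants were visible and which were not.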

           

          I am not sure whether it's SDK related or not. I downloaded the 2.9 OpenCL SDK and recompiled the OpenCL code with it, and now I seem to be able to use cl_mem buffers as inputs without having to initialize them with clEnqueueWriteBuffer() first. Anyway, I have enough figured out now.