7 Replies Latest reply on Jun 19, 2014 1:38 AM by pinform

    Silent __private memory size limit?

    jirmik

      Hi,

      Our application launches a few OpenCL kernels in a loop, with each iteration waiting for the previous one to complete (clFinish). One of the kernels is quite complex and uses nearly 18 kB of private memory per work item. We had a very hard time making it work on the AMD platform (no significant problems with NVIDIA or Intel). The application ran OK for a few iterations of the loop, and then enqueuing the complex kernel suddenly started returning an "out of resources" error. Compilation and the first enqueue calls were all OK. Finally, we tried replacing the __private memory buffers with per-work-item pieces of a __global buffer (reducing __private usage to about 3 kB per work item), and it started working on AMD as well.
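
The workaround described above (giving each work item its own piece of a __global buffer instead of a large __private array) needs some host-side sizing. The following is only a minimal sketch of that arithmetic in plain C, not the poster's actual code: the names `SCRATCH_BYTES_PER_ITEM`, `scratch_buffer_bytes` and `scratch_slice_offset` are hypothetical, and the 15 kB figure is an assumption derived from the reported drop from roughly 18 kB to roughly 3 kB of private memory.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed amount of per-work-item scratch moved out of __private memory
   (about 18 kB before minus about 3 kB after); illustrative only. */
#define SCRATCH_BYTES_PER_ITEM ((size_t)15 * 1024)

/* Size in bytes of one __global scratch buffer that covers every work item
   of a single launch with the given global work size. */
static size_t scratch_buffer_bytes(size_t global_work_size)
{
    return global_work_size * SCRATCH_BYTES_PER_ITEM;
}

/* Byte offset of the slice owned by one work item. Inside a kernel the
   index would come from get_global_id(0); here it is a plain parameter. */
static size_t scratch_slice_offset(size_t global_id, size_t global_work_size)
{
    assert(global_id < global_work_size); /* slice must lie inside the buffer */
    return global_id * SCRATCH_BYTES_PER_ITEM;
}
```

With this layout the buffer size is known and allocated explicitly by the host before the launch, rather than being reserved implicitly by the runtime. A contiguous slice per work item is the simplest layout; an interleaved layout may give better memory access patterns, but that is a separate tuning question.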

       

      My question: Is there any private memory size limit? I'd like to know whether we have fixed the issue in our code (by reducing private memory usage) or only fixed one of the side effects of some bug which is still there.

       

      All of this was happening on Ubuntu Linux (12.04) with the following driver:

      [6.750882] <6>[fglrx] module loaded - fglrx 13.35.5 [Mar 12 2014] with 1 minors

       

      When we tried with Windows 7, the graphics driver always crashed.

       

      Thanks,

      Martin Jirman

        • Re: Silent __private memory size limit?
          amd_support

          Hi Martin,

          When the private memory size exceeds the number of registers available per thread on the device, the rest of the private memory is automatically placed in the global memory space. Given this, your code should also work with 18 KB of private memory on AMD devices.

          We are trying to reproduce the problem to find out why the code is crashing. Meanwhile, if you can share the code that is crashing, it would be a great help.

           

          Thanks,

          AMD Support.

          • Re: Silent __private memory size limit?
            amd_support

            Hi Martin,

            We wrote a kernel in which each work item has a private buffer of size 22K, and the kernel iterated 1000 times. It works fine on our side.

             

            Could you share your code with us (or, better still, a bare-minimum code sample that reproduces this error, so that it can be debugged quickly)?

             

            Thanks

            AMD Support

              • Re: Silent __private memory size limit?
                jirmik

                Hi, we have been doing some simple math here (which we probably should have done before). The kernel was tested on an R9 290X, so I assume that up to 112640 work items can be running in parallel (44 CUs x 40 wavefronts x 64 work items per wavefront). If each of our work items uses 18 kB of memory, this yields a total of ~2 GB of memory. We were also doing some allocations, so I assume that this may be the source of the problems: the first launch of the kernel uses one big buffer (triggering its allocation), then the second launch of the kernel triggers allocation of another big buffer, causing the trouble (we need several sets of buffers to be able to interleave transfers with computations).
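
The estimate above can be checked with a few lines of C. This is only a sketch of the arithmetic from the post: the function names `resident_work_items` and `private_footprint_bytes` are made up for illustration, and it assumes the worst case in which every resident work item has its full private allocation backed by device memory.

```c
#include <stdint.h>

/* Maximum number of work items resident on the device at once. */
static uint64_t resident_work_items(uint64_t compute_units,
                                    uint64_t wavefronts_per_cu,
                                    uint64_t items_per_wavefront)
{
    return compute_units * wavefronts_per_cu * items_per_wavefront;
}

/* Worst-case private-memory footprint in bytes, assuming every resident
   work item needs private_bytes_per_item bytes of backing memory. */
static uint64_t private_footprint_bytes(uint64_t compute_units,
                                        uint64_t wavefronts_per_cu,
                                        uint64_t items_per_wavefront,
                                        uint64_t private_bytes_per_item)
{
    return resident_work_items(compute_units, wavefronts_per_cu,
                               items_per_wavefront) * private_bytes_per_item;
}
```

For the numbers quoted in the post, `resident_work_items(44, 40, 64)` gives 112640, and `private_footprint_bytes(44, 40, 64, 18 * 1024)` gives 2,076,180,480 bytes, which is about 1.93 GiB and matches the ~2 GB figure.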

                 

                Does it make sense that a kernel launch may trigger a big buffer allocation, so that there is not enough memory left for the kernel's private memory and the enqueue call returns the CL_OUT_OF_RESOURCES error?

                 

                The code is closed source, so we cannot share it.

                 

                Thanks,

                Martin