0 Replies Latest reply on Jan 7, 2016 2:18 PM by Salabar

    Limited buffer allocation in OpenCL

    Salabar

      Hello. I have an AMD A6-6310 laptop, and I'm investigating ways to accelerate an image editor using GPGPU. Since it is a pre-HSA APU, I'm using a double-buffering approach that copies data between host and device (some code omitted):

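      for (int i = 0; i < chunk_count; ++i) {
          cur = i % 2;  /* ping-pong between the two buffer/event sets */

          /* Upload chunk i (CHUNK_SZ pixels, 4 bytes each) on the transfer queue.
             Waits for the previous read from this slot (fence[cur][4]), which must
             already hold a valid event on the first pass through each slot. */
          clEnqueueWriteBuffer(opencl_queue[1], d_src[cur], CL_FALSE, 0,
                               CHUNK_SZ * 4, src + i * CHUNK_SZ * 4,
                               1, &fence[cur][4], &fence[cur][0]);

          /* ... (omitted; the kernel below also waits on fence[cur][1] and fence[cur][2]) */

          /* Run the kernel on the compute queue once fence[cur][0..2] have completed. */
          clEnqueueNDRangeKernel(opencl_queue[0], composite_kernel, 1, NULL,
                                 &SIZE_PER_PASS, &l_sz, 3, &fence[cur][0], &fence[cur][3]);

          /* ... */

          /* Download the result; the next write into this slot waits on fence[cur][4]. */
          clEnqueueReadBuffer(opencl_queue[1], d_dst[cur], CL_FALSE, 0,
                              CHUNK_SZ * 4, dst + i * CHUNK_SZ * 4,
                              1, &fence[cur][3], &fence[cur][4]);
      }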

      The problem is that I cannot create d_src and d_dst buffers larger than 64 KB (created with CL_MEM_READ_ONLY and CL_MEM_READ_WRITE, respectively). With buffers this small, the kernel launch and copy setup overhead is about 10 times longer than the actual execution time. My guess is that, since I only have 4 GB of RAM, not much of it can be pinned, yet the driver still tries to allocate these buffers in device-visible host memory, which is scarce, and the allocation fails.

      Is there a non-hackish way to raise the runtime's limit on device-visible host memory, or to make the driver fall back to framebuffer memory? I assume acquiring an OpenGL buffer would do the trick, but that is about all I can think of. I'm using Windows 8.1 x64. I only need a couple of megabytes, so I don't expect this to hurt performance much.

      Thanks in advance, Salabar.