Hello. I have a laptop with an A6-6310. I'm investigating ways to improve an image editor using GPGPU. Since it is a pre-HSA APU, I'm using a double-buffering approach that copies data between host and device (some code omitted):
for (int i = 0; i < chunk_count; ++i) {
    cur = i % 2;
    clEnqueueWriteBuffer(opencl_queue[1], d_src[cur], CL_FALSE, 0, CHUNK_SZ * 4, src + i * CHUNK_SZ * 4, 1, &fence[cur][4], &fence[cur][0]);
    clEnqueueNDRangeKernel(opencl_queue[0], composite_kernel, 1, NULL, &SIZE_PER_PASS, &l_sz, 3, &fence[cur][0], &fence[cur][3]);
    clEnqueueReadBuffer(opencl_queue[1], d_dst[cur], CL_FALSE, 0, CHUNK_SZ * 4, dst + i * CHUNK_SZ * 4, 1, &fence[cur][3], &fence[cur][4]);
}
The problem is that I cannot create d_src and d_dst buffers larger than 64 KB (they are created with CL_MEM_READ_ONLY and CL_MEM_READ_WRITE respectively). With buffers that small, the kernel and copy initialisation overhead is about 10 times longer than the execution time. My guess is that this happens because I only have 4 GB of memory, so not much of it can be pinned. Yet the driver still tries to allocate the buffer in device-visible host memory, which is pretty scarce, and fails. Is there a non-hackish way to increase the runtime's device-visible host memory limit, or to make the driver fall back to framebuffer memory? I assume that acquiring an OpenGL buffer would do the trick, but that is about all I can think of. I'm using Windows 8.1 x64. I only need a couple of megabytes, so I don't think it would hurt performance that badly.
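To put a rough number on why the chunk size matters, here is a back-of-the-envelope sketch. It assumes a simple model that I have not measured: the setup cost per chunk is fixed, and execution time grows linearly with chunk size. The function name and the constants are hypothetical; only the 10:1 ratio at 64 KB comes from what I observed.

```c
#include <assert.h>

/* Hypothetical cost model, not a measurement: every chunk pays a fixed
   kernel/copy setup cost, and execution time scales linearly with size. */
#define BASE_CHUNK_KB  64.0  /* chunk size at which the 10:1 ratio was seen */
#define OVERHEAD_UNITS 10.0  /* setup cost, in units of the 64 KB exec time */

/* Fraction of the per-chunk time spent on setup for a chunk of chunk_kb KB. */
double overhead_fraction(double chunk_kb)
{
    assert(chunk_kb > 0.0);
    double exec_units = chunk_kb / BASE_CHUNK_KB;  /* 1.0 at 64 KB */
    return OVERHEAD_UNITS / (OVERHEAD_UNITS + exec_units);
}
```

Under this model, overhead_fraction(64.0) is 10/11 (about 91%), while overhead_fraction(2048.0) is 10/42 (about 24%), which is why even a couple of megabytes per buffer would help a lot.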
Thanks in advance, Salabar.