I would like to print a progress bar for my OpenCL code during kernel execution. The CUDA equivalent of this code achieves this using pinned memory. I tried to implement the same thing using CL_MEM_ALLOC_HOST_PTR and clEnqueueMapBuffer, but the result is quite strange.
Here is a snippet of the relevant code:
    void host_function(){
        cl_uint *progress=NULL;
        cl_mem *gprogress;

        gprogress=(cl_mem *)malloc(1*sizeof(cl_mem));

        // define a host_ptr buffer, alloc in the pinned memory
        OCL_ASSERT(((gprogress[0]=clCreateBuffer(mcxcontext,(CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR),
                                  sizeof(cl_uint),NULL,&status),status)));

        // initialize the pinned memory buffer
        progress = (cl_uint *)clEnqueueMapBuffer(mcxqueue[0], gprogress[0], CL_TRUE, CL_MAP_WRITE,
                                  0, sizeof(cl_uint), 0, NULL, NULL, NULL);
        *progress=0;
        clEnqueueUnmapMemObject(mcxqueue[0], gprogress[0], progress, 0, NULL, NULL);

        OCL_ASSERT((clSetKernelArg(mcxkernel[i],10, sizeof(cl_mem), (void*)(gprogress))));

        // launch kernel
        OCL_ASSERT((clEnqueueNDRangeKernel(mcxqueue[devid],mcxkernel[devid],1,NULL,
                                  &gpu[devid].autothread,&gpu[devid].autoblock, 0, NULL, NULL)));

        if((param.debuglevel & MCX_DEBUG_PROGRESS)){
            // after launching the kernel, check progress by reading gprogress[0]
            progress = (cl_uint *)clEnqueueMapBuffer(mcxqueue[0], gprogress[0], CL_FALSE, CL_MAP_READ,
                                  0, sizeof(cl_uint), 0, NULL, NULL, NULL);
            do{
                ndone = *progress;
                MCX_FPRINTF(cfg->flog,"progress=%d\n",ndone);
            }while (ndone < maxcount);
            clEnqueueUnmapMemObject(mcxqueue[0], gprogress[0], progress, 0, NULL, NULL);
        }
        OCL_ASSERT((clFinish(mcxqueue[devid])));
    }
Inside the kernel, I increment gprogress[0]. I was hoping that the do/while loop would read the updated value through progress and print it during kernel execution.
However, what I see is that it keeps printing progress=0 at the beginning. Sometimes, after about 10 seconds, it prints a big jump in the progress value, which then stays the same for another 10 seconds or more. Sometimes it just keeps printing without ever exiting the while loop (because it never reaches the expected maxcount).
Can someone tell me if this is the correct way to implement a progress bar in OpenCL? How can I make it work?
Thanks.
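CL_MEM_ALLOC_HOST_PTR can be combined with CL_MEM_COPY_HOST_PTR, which initializes the buffer from a host pointer at creation time instead of mapping it just to write the initial value (it is CL_MEM_USE_HOST_PTR that cannot be combined with CL_MEM_ALLOC_HOST_PTR). Also be aware that CL_MEM_ALLOC_HOST_PTR may cause some strange behavior.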
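Separately from the buffer flags, the do/while loop in the question prints on every iteration and can spin forever if the counter never reaches maxcount. Below is a minimal sketch of a more defensive polling loop, not a drop-in fix. The names poll_progress, read_progress_fn, kernel_done_fn and fake_kernel are hypothetical and exist only for this illustration; the code is kept free of OpenCL calls so it compiles on its own. In real host code, read_progress would presumably read the mapped counter, and kernel_done could wrap a clGetEventInfo query for CL_EVENT_COMMAND_EXECUTION_STATUS on the event returned by clEnqueueNDRangeKernel. Note that this only fixes the loop logic: as far as I know, OpenCL does not guarantee that the host sees kernel writes before the kernel completes, so the jumpy updates may remain.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical callbacks: one reads the current counter value,
   the other reports whether the kernel has finished. */
typedef unsigned int (*read_progress_fn)(void *ctx);
typedef int (*kernel_done_fn)(void *ctx);

/* Poll until the counter reaches maxcount OR the kernel reports completion.
   A line is printed only when the value changes. Returns the last value seen. */
unsigned int poll_progress(read_progress_fn read_progress,
                           kernel_done_fn kernel_done,
                           void *ctx, unsigned int maxcount, FILE *log)
{
    unsigned int last = 0;
    int first = 1;

    assert(read_progress != NULL && kernel_done != NULL && log != NULL);

    for (;;) {
        /* Query completion first, so the read below is taken after it. */
        int done = kernel_done(ctx);
        unsigned int ndone = read_progress(ctx);

        if (first || ndone != last) {
            fprintf(log, "progress=%u/%u\n", ndone, maxcount);
            last = ndone;
            first = 0;
        }
        /* Leave the loop even if the counter never reaches maxcount. */
        if (ndone >= maxcount || done)
            break;
    }
    return last;
}

/* Stand-in for a running kernel (an assumption for this sketch):
   every read advances the counter by `step`, capped at `total`. */
struct fake_kernel {
    unsigned int counter;
    unsigned int step;
    unsigned int total;
};

unsigned int fake_read(void *ctx)
{
    struct fake_kernel *k = (struct fake_kernel *)ctx;
    k->counter += k->step;
    if (k->counter > k->total)
        k->counter = k->total;
    return k->counter;
}

int fake_done(void *ctx)
{
    struct fake_kernel *k = (struct fake_kernel *)ctx;
    return k->counter >= k->total;
}
```

With the stand-in kernel, a call such as poll_progress(fake_read, fake_done, &k, 10, stderr) returns once the counter hits 10, and it also returns if the simulated kernel stops short of maxcount, which is the case where the original loop never exits.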