I would like to print a progress bar for my OpenCL code during kernel execution. The CUDA equivalent of this code achieves this using pinned memory. I tried to implement the same thing using CL_MEM_ALLOC_HOST_PTR and clEnqueueMapBuffer, but the result is quite strange.
Here is a snippet of the relevant code:
void host_function(){
    cl_uint *progress=NULL;
    cl_mem *gprogress;
    gprogress=(cl_mem *)malloc(1*sizeof(cl_mem));
    // define a host_ptr buffer, alloc in the pinned memory
    OCL_ASSERT(((gprogress[0]=clCreateBuffer(mcxcontext,(CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR), sizeof(cl_uint),NULL,&status),status)));
    // initialize the pinned memory buffer
    progress = (cl_uint *)clEnqueueMapBuffer(mcxqueue[0], gprogress[0], CL_TRUE, CL_MAP_WRITE, 0, sizeof(cl_uint), 0, NULL, NULL, NULL);
    *progress=0;
    clEnqueueUnmapMemObject(mcxqueue[0], gprogress[0], progress, 0, NULL, NULL);
    OCL_ASSERT((clSetKernelArg(mcxkernel,10, sizeof(cl_mem), (void*)(gprogress))));
    // launch kernel
    OCL_ASSERT((clEnqueueNDRangeKernel(mcxqueue[devid],mcxkernel[devid],1,NULL,&gpu[devid].autothread,&gpu[devid].autoblock, 0, NULL, NULL)));
    if((param.debuglevel & MCX_DEBUG_PROGRESS)){
        // after launching the kernel, check progress by reading gprogress[0]
        progress = (cl_uint *)clEnqueueMapBuffer(mcxqueue[0], gprogress[0], CL_FALSE, CL_MAP_READ, 0, sizeof(cl_uint), 0, NULL, NULL, NULL);
        do{
            ndone = *progress;
            MCX_FPRINTF(cfg->flog,"progress=%d\n",ndone);
        }while (ndone < maxcount);
        clEnqueueUnmapMemObject(mcxqueue[0], gprogress[0], progress, 0, NULL, NULL);
    }
    OCL_ASSERT((clFinish(mcxqueue[devid])));
}
Inside the kernel, I increment gprogress[0]. I was hoping that the do/while loop could read the updated value into progress and print it during kernel execution.
However, what I see is that it keeps printing progress=0 at the beginning. Sometimes, after about 10 seconds, it prints a big jump in the progress value, which then stays the same for another 10 seconds or more. Sometimes it just keeps printing without ever exiting the while loop (because it never reaches the expected maxcount).
Can someone tell me if this is the correct way to implement a progress bar in OpenCL? How can I make it work?
Thanks
Try using CL_MEM_ALLOC_HOST_PTR together with CL_MEM_COPY_HOST_PTR. On its own, CL_MEM_ALLOC_HOST_PTR may cause some strange problems.
Thanks for your reply. I tried using (CL_MEM_COPY_HOST_PTR | CL_MEM_ALLOC_HOST_PTR) in the clCreateBuffer call, but it failed to run with the error message below:
MCX ERROR(37):Invalid host pointer in unit mcx_host.cpp:428
Adding CL_MEM_READ_WRITE as well still gives the same error.
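For anyone hitting the same error: the message looks like CL_INVALID_HOST_PTR (-37), assuming the 37 printed above is the negated status code. As far as I can tell from the clCreateBuffer documentation, that status is returned when CL_MEM_COPY_HOST_PTR (or CL_MEM_USE_HOST_PTR) is set but host_ptr is NULL, or when host_ptr is non-NULL without either flag. Below is a minimal sketch of that rule in plain C. The function check_host_ptr and the SKETCH_ names are made up for illustration and are not part of OpenCL; the numeric values are the ones from CL/cl.h.

```c
#include <stddef.h>

/* Values mirror CL/cl.h; the SKETCH_ prefix is hypothetical and only
 * avoids clashing with the real header. */
#define SKETCH_MEM_USE_HOST_PTR   (1u << 3)
#define SKETCH_MEM_ALLOC_HOST_PTR (1u << 4)
#define SKETCH_MEM_COPY_HOST_PTR  (1u << 5)
#define SKETCH_SUCCESS            0
#define SKETCH_INVALID_HOST_PTR   (-37)

/* Hypothetical helper: applies the documented host_ptr rule of
 * clCreateBuffer to a flags/host_ptr pair without calling OpenCL. */
int check_host_ptr(unsigned long flags, const void *host_ptr) {
    int needs_ptr =
        (flags & (SKETCH_MEM_USE_HOST_PTR | SKETCH_MEM_COPY_HOST_PTR)) != 0;

    /* USE/COPY_HOST_PTR require a real pointer to read from. */
    if (needs_ptr && host_ptr == NULL) return SKETCH_INVALID_HOST_PTR;
    /* A pointer without USE/COPY_HOST_PTR is rejected as well. */
    if (!needs_ptr && host_ptr != NULL) return SKETCH_INVALID_HOST_PTR;
    return SKETCH_SUCCESS;
}
```

So the call with (CL_MEM_COPY_HOST_PTR | CL_MEM_ALLOC_HOST_PTR) and a NULL host_ptr would be expected to fail this way. Passing the address of an initialized cl_uint as host_ptr should avoid this particular error, though it does not by itself make the progress readout work.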
- progress = (cl_uint *)clEnqueueMapBuffer(mcxqueue[0], gprogress[0], CL_FALSE, CL_MAP_READ, 0, sizeof(cl_uint), 0, NULL, NULL, NULL);
- do{
- ...
- }while (ndone < maxcount);
This use of clEnqueueMapBuffer is not valid as per the OpenCL spec. When you use the non-blocking version of this call, you should use an event object to check the status of the command before reading through the returned pointer. The clEnqueueMapBuffer documentation says:
blocking_map
Indicates if the map operation is blocking or non-blocking. If blocking_map is CL_TRUE, clEnqueueMapBuffer does not return until the specified region in buffer is mapped into the host address space and the application can access the contents of the mapped region using the pointer returned by clEnqueueMapBuffer. If blocking_map is CL_FALSE i.e. map operation is non-blocking, the pointer to the mapped region returned by clEnqueueMapBuffer cannot be used until the map command has completed. The event argument returns an event object which can be used to query the execution status of the map command. When the map command is completed, the application can access the contents of the mapped region using the pointer returned by clEnqueueMapBuffer.
As per the OpenCL spec, you should not expect to see the latest contents via clEnqueueMapBuffer while a kernel is still updating the same buffer. Instead, a fine-grained SVM buffer with atomics should be used for this purpose.
Regards,
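To illustrate the polling pattern suggested above without needing an OpenCL 2.0 device, here is a host-only analogy in C. It is a sketch, not OpenCL code: a pthread worker stands in for the kernel, a C11 atomic_uint stands in for the fine-grained SVM allocation, and the names worker, progress_counter and poll_progress are invented for this example. In a real port, the counter would presumably live in memory obtained from clSVMAlloc with fine-grained and atomics flags, and the kernel would do the atomic increment.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* Shared counter: stands in for the fine-grained SVM allocation. */
static atomic_uint progress_counter;

/* Worker thread: stands in for the kernel and bumps the counter
 * once per completed unit of work. */
static void *worker(void *arg) {
    unsigned int maxcount = *(unsigned int *)arg;
    for (unsigned int i = 0; i < maxcount; i++) {
        atomic_fetch_add_explicit(&progress_counter, 1u,
                                  memory_order_release);
    }
    return NULL;
}

/* Host side: poll the counter until it reaches maxcount, printing
 * each newly observed value. Returns the last value observed, or 0
 * if the worker thread could not be started. */
unsigned int poll_progress(unsigned int maxcount) {
    pthread_t tid;
    unsigned int ndone = 0, last = 0;

    atomic_store(&progress_counter, 0u);
    if (pthread_create(&tid, NULL, worker, &maxcount) != 0) {
        return 0;
    }
    do {
        /* Atomic load: each read observes the worker's latest update. */
        ndone = atomic_load_explicit(&progress_counter,
                                     memory_order_acquire);
        if (ndone != last) {
            printf("progress=%u\n", ndone);
            last = ndone;
        }
    } while (ndone < maxcount);

    pthread_join(tid, NULL);
    return ndone;
}
```

The point of the analogy is that the reader and the writer both go through atomic operations on memory they genuinely share, which is what a mapped CL_MEM_ALLOC_HOST_PTR buffer does not guarantee while the kernel is running.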