OpenCL

FangQ · 03-19-2018

I would like to print a progress bar for my OpenCL code during kernel execution. The CUDA version of this code achieves this using pinned memory. I tried to implement the same thing using CL_MEM_ALLOC_HOST_PTR and clEnqueueMapBuffer, but the result is quite strange.

Here is a snippet of the relevant code:

void host_function(){
     cl_uint *progress=NULL;
     cl_mem *gprogress;
     gprogress=(cl_mem *)malloc(1*sizeof(cl_mem));
     
     // define a host_ptr buffer, alloc in the pinned memory
     OCL_ASSERT(((gprogress[0]=clCreateBuffer(mcxcontext,(CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR), sizeof(cl_uint),NULL,&status),status)));
     // initialize the pinned memory buffer
     progress = (cl_uint *)clEnqueueMapBuffer(mcxqueue[0], gprogress[0], CL_TRUE, CL_MAP_WRITE, 0, sizeof(cl_uint), 0, NULL, NULL, NULL);
     *progress=0;
     clEnqueueUnmapMemObject(mcxqueue[0], gprogress[0], progress, 0, NULL, NULL);
     OCL_ASSERT((clSetKernelArg(mcxkernel[devid],10, sizeof(cl_mem), (void*)(gprogress))));
     // launch kernel
     OCL_ASSERT((clEnqueueNDRangeKernel(mcxqueue[devid],mcxkernel[devid],1,NULL,&gpu[devid].autothread,&gpu[devid].autoblock, 0, NULL, NULL)));
     if((param.debuglevel & MCX_DEBUG_PROGRESS)){
             // after launching the kernel, check progress by reading gprogress[0]
             progress = (cl_uint *)clEnqueueMapBuffer(mcxqueue[0], gprogress[0], CL_FALSE, CL_MAP_READ, 0, sizeof(cl_uint), 0, NULL, NULL, NULL);
             do{
                 ndone = *progress;
                 MCX_FPRINTF(cfg->flog,"progress=%u\n",ndone);
             }while (ndone < maxcount);
             clEnqueueUnmapMemObject(mcxqueue[0], gprogress[0], progress, 0, NULL, NULL);
     }
     
     OCL_ASSERT((clFinish(mcxqueue[devid])));
}

Inside the kernel, I increment gprogress[0]. I was hoping that the do/while loop would read the updated value through progress and print it during kernel execution.

However, it keeps printing progress=0 at the beginning. Sometimes, after about 10 seconds, it prints a big jump in the progress value, which then stays the same for another 10 seconds or more. Other times it just keeps printing without ever exiting the while loop, because the value never reaches the expected maxcount.

Can someone tell me whether this is the correct way to implement a progress bar in OpenCL? How can I make it work?

Thanks

baifenghuang · 03-19-2018

CL_MEM_ALLOC_HOST_PTR and CL_MEM_COPY_HOST_PTR must be used together. Also, CL_MEM_ALLOC_HOST_PTR may cause some strange problems.

FangQ · 03-19-2018

Thanks for your reply. I tried using (CL_MEM_COPY_HOST_PTR | CL_MEM_ALLOC_HOST_PTR) in the clCreateBuffer call, but it failed with the error message below:

MCX ERROR(37):Invalid host pointer in unit mcx_host.cpp:428

Adding CL_MEM_READ_WRITE as well still gives the same error.

dipak · 03-20-2018

progress = (cl_uint *)clEnqueueMapBuffer(mcxqueue[0], gprogress[0], CL_FALSE, CL_MAP_READ, 0, sizeof(cl_uint), 0, NULL, NULL, NULL);
             do{
                 ...
             }while (ndone < maxcount);

The clEnqueueMapBuffer call above is not valid according to the OpenCL spec. When you use the non-blocking version of this call, you must use the returned event object to check the status of the command before using the mapped values. The spec for clEnqueueMapBuffer says:

blocking_map
Indicates if the map operation is blocking or non-blocking.
If blocking_map is CL_TRUE, clEnqueueMapBuffer does not return until the specified region in buffer is mapped into the host address space and the application can access the contents of the mapped region using the pointer returned by clEnqueueMapBuffer.
If blocking_map is CL_FALSE i.e. map operation is non-blocking, the pointer to the mapped region returned by clEnqueueMapBuffer cannot be used until the map command has completed. The event argument returns an event object which can be used to query the execution status of the map command. When the map command is completed, the application can access the contents of the mapped region using the pointer returned by clEnqueueMapBuffer.

Per the OpenCL spec, you should not expect to see the latest contents through clEnqueueMapBuffer while a kernel is updating the same buffer. Instead, use a fine-grained SVM buffer with atomics for this purpose.
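A minimal sketch of the fine-grained SVM suggestion, assuming an OpenCL 2.0 device whose CL_DEVICE_SVM_CAPABILITIES include both CL_DEVICE_SVM_FINE_GRAIN_BUFFER and CL_DEVICE_SVM_ATOMICS. The OpenCL host and kernel calls are kept in a comment so the polling loop compiles on its own. mcxcontext, mcxqueue, mcxkernel, devid, gpu, maxcount and argument index 10 are taken from the question; wait_with_progress is a hypothetical helper name. The point it illustrates is that both the kernel and the host use atomics on the same shared allocation, instead of a plain read through a mapped cl_mem.

```c
#include <stdatomic.h>
#include <stdio.h>

/*
 * Assumed host setup (OpenCL 2.0 API, kept as a comment so this file builds
 * without an OpenCL SDK):
 *
 *   atomic_uint *progress = clSVMAlloc(mcxcontext,
 *       CL_MEM_READ_WRITE | CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_SVM_ATOMICS,
 *       sizeof(atomic_uint), 0);      // NULL if allocation/SVM is unsupported
 *   atomic_init(progress, 0u);        // the host writes SVM directly, no map
 *   clSetKernelArgSVMPointer(mcxkernel[devid], 10, progress);
 *   clEnqueueNDRangeKernel(mcxqueue[devid], mcxkernel[devid], 1, NULL,
 *       &gpu[devid].autothread, &gpu[devid].autoblock, 0, NULL, NULL);
 *   clFlush(mcxqueue[devid]);         // submit the kernel before spinning
 *   wait_with_progress(progress, maxcount, stdout);
 *   clFinish(mcxqueue[devid]);
 *   clSVMFree(mcxcontext, progress);
 *
 * Assumed kernel side (OpenCL C 2.0, built with -cl-std=CL2.0), replacing the
 * plain increment of gprogress[0]:
 *
 *   __kernel void ...(..., __global atomic_uint *progress)
 *   {
 *       ...
 *       atomic_fetch_add_explicit(progress, 1u, memory_order_relaxed,
 *                                 memory_scope_all_svm_devices);
 *   }
 */

/* Hypothetical helper: spin until *progress reaches maxcount, printing each
 * new value once, and return the last value read. Relaxed ordering is enough
 * for a counter that is only displayed; read real results after clFinish. */
unsigned wait_with_progress(atomic_uint *progress, unsigned maxcount, FILE *out)
{
    unsigned done;
    unsigned shown = 0;
    int printed = 0;

    do {
        /* Atomic load of the shared counter, so the host sees the device's
         * atomic increments as they become visible, unlike a plain read. */
        done = atomic_load_explicit(progress, memory_order_relaxed);
        if (!printed || done != shown) {
            fprintf(out, "progress=%u/%u\n", done, maxcount);
            shown = done;
            printed = 1;
        }
    } while (done < maxcount);

    return done;
}
```

The loop busy-waits on one CPU core, so a real version might sleep briefly between polls. The clFlush before polling matters: without it an implementation may keep the kernel queued until clFinish, and the host would spin forever. If the device does not report SVM atomics, this approach does not apply.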

Regards,

Pass value from device to host during kernel execution