3 Replies Latest reply on Mar 20, 2018 3:40 AM by dipak

    Pass value from device to host during kernel execution

    FangQ

      I would like to print a progress bar for my OpenCL code during kernel execution. The CUDA equivalent of this code achieves this using pinned memory. I tried to implement the same thing using CL_MEM_ALLOC_HOST_PTR and clEnqueueMapBuffer, but the result is quite strange.

       

      Here is a snippet of the relevant code:

       

      void host_function(){
           cl_uint *progress=NULL;
           cl_mem *gprogress;

           gprogress=(cl_mem *)malloc(1*sizeof(cl_mem));

           // define a host_ptr buffer, alloc in the pinned memory
           OCL_ASSERT(((gprogress[0]=clCreateBuffer(mcxcontext,(CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR), sizeof(cl_uint),NULL,&status),status)));

           // initialize the pinned memory buffer
           progress = (cl_uint *)clEnqueueMapBuffer(mcxqueue[0], gprogress[0], CL_TRUE, CL_MAP_WRITE, 0, sizeof(cl_uint), 0, NULL, NULL, NULL);
           *progress=0;
           clEnqueueUnmapMemObject(mcxqueue[0], gprogress[0], progress, 0, NULL, NULL);

           OCL_ASSERT((clSetKernelArg(mcxkernel[i],10, sizeof(cl_mem), (void*)(gprogress))));

           // launch kernel
           OCL_ASSERT((clEnqueueNDRangeKernel(mcxqueue[devid],mcxkernel[devid],1,NULL,&gpu[devid].autothread,&gpu[devid].autoblock, 0, NULL, NULL)));

           if((param.debuglevel & MCX_DEBUG_PROGRESS)){
                   // after launching the kernel, check progress by reading gprogress[0]
                   progress = (cl_uint *)clEnqueueMapBuffer(mcxqueue[0], gprogress[0], CL_FALSE, CL_MAP_READ, 0, sizeof(cl_uint), 0, NULL, NULL, NULL);
                   do{
                       ndone = *progress;
                       MCX_FPRINTF(cfg->flog,"progress=%d\n",ndone);
                   }while (ndone < maxcount);
                   clEnqueueUnmapMemObject(mcxqueue[0], gprogress[0], progress, 0, NULL, NULL);
           }

           OCL_ASSERT((clFinish(mcxqueue[devid])));
      }
      

       

      Inside the kernel, I increment gprogress[0]. I was hoping that the do/while loop would read the updated value through progress and print it during kernel execution.

       

      However, what I see is that it keeps printing progress=0 at the beginning. Sometimes, after about 10 seconds, it prints a big jump in the progress value, which then stays the same for another 10 seconds or more. Sometimes it just keeps printing without ever exiting the while loop (because the value never reaches the expected maxcount).

       

      Can someone tell me if this is the correct way to implement a progress bar in OpenCL? How can I make it work?

       

      Thanks

        • Re: Pass value from device to host during kernel execution
          baifenghuang

          CL_MEM_ALLOC_HOST_PTR and CL_MEM_COPY_HOST_PTR must be used simultaneously. Also, CL_MEM_ALLOC_HOST_PTR may cause some strange problems.

          • Re: Pass value from device to host during kernel execution
            dipak
            progress = (cl_uint *)clEnqueueMapBuffer(mcxqueue[0], gprogress[0], CL_FALSE, CL_MAP_READ, 0, sizeof(cl_uint), 0, NULL, NULL, NULL);
            do{
                ...
            }while (ndone < maxcount);

            The above use of clEnqueueMapBuffer is not valid as per the OpenCL spec. When you use the non-blocking version of this call, you should use an event object to check the status of the command before using the mapped values. The clEnqueueMapBuffer documentation says:

            blocking_map

            Indicates if the map operation is blocking or non-blocking.

            If blocking_map is CL_TRUE, clEnqueueMapBuffer does not return until the specified region in buffer is mapped into the host address space and the application can access the contents of the mapped region using the pointer returned by clEnqueueMapBuffer.

            If blocking_map is CL_FALSE i.e. map operation is non-blocking, the pointer to the mapped region returned by clEnqueueMapBuffer cannot be used until the map command has completed. The event argument returns an event object which can be used to query the execution status of the map command. When the map command is completed, the application can access the contents of the mapped region using the pointer returned by clEnqueueMapBuffer.

             

            As per the OpenCL spec, you should not expect to see the latest contents via clEnqueueMapBuffer while a kernel is updating the same buffer. Instead, a fine-grained SVM buffer with atomics should be used for this purpose.
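
            To make the host side of that suggestion concrete, here is a minimal sketch of the polling loop only. It is not taken from the original code: poll_progress, kernel_done_fn, fake_kernel and demo are hypothetical names, and the device is simulated by a plain C function so the example runs without an OpenCL runtime. In a real OpenCL 2.0 program the counter would presumably live in memory obtained from clSVMAlloc with CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_SVM_ATOMICS, the kernel would increment it with an atomic operation, and the completion callback could wrap a clGetEventInfo query for CL_EVENT_COMMAND_EXECUTION_STATUS. Whether the device supports that SVM level has to be checked at run time.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical callback: returns nonzero once the kernel has finished.
 * In real code this could check the kernel's event for CL_COMPLETE. */
typedef int (*kernel_done_fn)(void *ctx);

/* Poll an atomic progress counter and print only when the value changes.
 * The loop ends when the counter reaches maxcount OR the kernel reports
 * completion, so it cannot spin forever if maxcount is never reached.
 * Returns the last value observed. */
unsigned poll_progress(atomic_uint *counter, unsigned maxcount,
                       kernel_done_fn done, void *ctx, FILE *log)
{
    unsigned last = atomic_load_explicit(counter, memory_order_acquire);
    fprintf(log, "progress=%u\n", last);
    for (;;) {
        /* Query completion before the read, so the final read cannot
         * miss an update made just before the kernel finished. */
        int finished = done(ctx);
        unsigned now = atomic_load_explicit(counter, memory_order_acquire);
        if (now != last) {
            fprintf(log, "progress=%u\n", now);
            last = now;
        }
        if (now >= maxcount || finished)
            break;
    }
    return last;
}

/* Stand-in for the device (an assumption made for this demo only):
 * each completion query "runs" one more step of the kernel. */
struct fake_kernel {
    atomic_uint *counter;
    unsigned steps_left;
};

static int fake_kernel_done(void *ctx)
{
    struct fake_kernel *k = ctx;
    if (k->steps_left == 0)
        return 1;                       /* kernel has finished */
    atomic_fetch_add_explicit(k->counter, 1, memory_order_release);
    k->steps_left--;
    return 0;                           /* kernel still running */
}

/* Run the simulated kernel for `steps` increments and poll it. */
unsigned demo(unsigned steps, unsigned maxcount)
{
    atomic_uint counter;
    atomic_init(&counter, 0);
    struct fake_kernel k = { &counter, steps };
    return poll_progress(&counter, maxcount, fake_kernel_done, &k, stdout);
}
```

            Calling demo(5, 5) prints each value once and stops at maxcount, while demo(3, 10) stops when the simulated kernel finishes even though maxcount is never reached. That second exit condition is what the original do/while loop lacks.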

             

             

            Regards,