5 Replies Latest reply on May 7, 2015 2:19 AM by dipak

    problem with API - clEnqueueSVMMap

    givenchy

      hi

       

I am using clEnqueueSVMMap, and one of its arguments is an event, but I can't find which event types can be used with this API.

Also, in OpenCL 2.0, how can I estimate the GPU execution time and the transfer / SVM map time when using shared virtual memory?

       

      thanks!

        • Re: problem with API - clEnqueueSVMMap
          dipak

          Hi,

I am using clEnqueueSVMMap, and one of its arguments is an event, but I can't find which event types can be used with this API.

           

          Your question is not clear to me. Could you please be more explicit?

[For a detailed description, please refer to clEnqueueSVMMap.]

           

how can I estimate the GPU execution time and the transfer / SVM map time with shared virtual memory?

Each clEnqueue<> API returns an event object that can be used to query the status of that particular command. You can use clGetEventProfilingInfo for this purpose. However, to enable profiling, the command queue must be created with the CL_QUEUE_PROFILING_ENABLE flag set in the properties argument of clCreateCommandQueueWithProperties.
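In OpenCL 2.0 the properties are passed as a zero-terminated list. A minimal sketch (assuming `context` and `device` already exist; error handling omitted, and this fragment needs an OpenCL 2.0 runtime to actually run):

```c
/* Sketch: creating a profiling-enabled command queue in OpenCL 2.0.
   `context` and `device` are assumed to exist already. */
cl_queue_properties props[] = {
    CL_QUEUE_PROPERTIES, CL_QUEUE_PROFILING_ENABLE,
    0                                   /* list terminator */
};
cl_int err;
cl_command_queue queue =
    clCreateCommandQueueWithProperties(context, device, props, &err);
```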

Another, easier way of doing this is to use a profiler such as AMD's CodeXL. For details, please check http://developer.amd.com/tools-and-sdks/opencl-zone/codexl/codexl-benefits-detail/

           

          Regards,

            • Re: problem with API - clEnqueueSVMMap
              givenchy

              Hi ,

There are several profiling event types we can use:

1. CL_PROFILING_COMMAND_QUEUED

2. CL_PROFILING_COMMAND_SUBMIT

3. CL_PROFILING_COMMAND_START

4. CL_PROFILING_COMMAND_END

5. CL_PROFILING_COMMAND_COMPLETE


<<First Question>>

When we use the API clEnqueueNDRangeKernel, we can use 3 and 4 to estimate the kernel execution time.

I want to know how to choose event types to estimate the transfer time from CPU to GPU when we use APIs such as the following:

              clEnqueueWriteBuffer - clEnqueueReadBuffer

              clEnqueueSVMMap - clEnqueueSVMUnmap


              <<Second Question>>

On Kaveri (A10-7850K), is the implementation of USE_HOST_PTR the same as SVM?

I wrote a program using both USE_HOST_PTR and SVM, and the total execution time (CPU + GPU) with USE_HOST_PTR is lower than with SVM.

So what's the difference between USE_HOST_PTR and SVM on an APU (Kaveri)?


              Thanks!

                • Re: problem with API - clEnqueueSVMMap
                  dipak

                  <<First Question>>

Yes, the difference between CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END can be used to estimate the kernel execution time. This is not limited to clEnqueueNDRangeKernel; it applies to any command submitted via a clEnqueue<> call, including clEnqueue<Read/Write>Buffer, clEnqueueSVM<Map/Unmap>, etc.
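For instance, a host-to-device transfer can be timed with the same two event types as a kernel. A rough sketch (assuming `queue` was created with CL_QUEUE_PROFILING_ENABLE and that `buf`, `hostPtr`, and `nbytes` exist; it needs an OpenCL device to actually run):

```c
/* Sketch: timing a clEnqueueWriteBuffer through its event. */
cl_event ev;
clEnqueueWriteBuffer(queue, buf, CL_FALSE, 0, nbytes,
                     hostPtr, 0, NULL, &ev);
clWaitForEvents(1, &ev);   /* wait until the transfer has completed */

cl_ulong start = 0, end = 0;
clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_START,
                        sizeof(start), &start, NULL);
clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_END,
                        sizeof(end), &end, NULL);
printf("Transfer time: %.3f ms\n", (end - start) * 1.0e-6); /* ns -> ms */
clReleaseEvent(ev);
```

The same pattern works for clEnqueueSVMMap / clEnqueueSVMUnmap: pass an event to the call and take the START/END difference.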



                  <<Second Question>>

They are not exactly the same. USE_HOST_PTR acts as pinned host memory, whereas SVM allows the host and devices to share the same virtual address space, so any SVM pointer can be accessed directly by both the host and the devices. The runtime takes care of everything behind the scenes. SVM also needs special hardware support for memory consistency (for example, IOMMU support). Depending on the type of SVM, the access time may vary greatly. Because of this consistency mechanism, accessing an SVM buffer may be costlier than accessing normal pinned host memory.
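To make the two styles concrete, here is a hedged sketch of each allocation path (assuming `context`, `queue`, `nbytes`, and `err` exist; error handling omitted, and an OpenCL 2.0 device is required for the SVM part):

```c
/* (a) CL_MEM_USE_HOST_PTR: wrap an existing host allocation,
   which the runtime may treat as pinned/zero-copy memory. */
float *hostData = (float *)malloc(nbytes);
cl_mem buf = clCreateBuffer(context,
                            CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
                            nbytes, hostData, &err);

/* (b) Coarse-grained SVM: one pointer valid on both host and device.
   Host access must be bracketed by map/unmap. */
float *svmPtr = (float *)clSVMAlloc(context, CL_MEM_READ_WRITE, nbytes, 0);
clEnqueueSVMMap(queue, CL_TRUE, CL_MAP_WRITE, svmPtr, nbytes, 0, NULL, NULL);
/* ... fill svmPtr on the host ... */
clEnqueueSVMUnmap(queue, svmPtr, 0, NULL, NULL);
```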


                  Regards,


                    • Re: problem with API - clEnqueueSVMMap
                      givenchy

Using these APIs (USE_HOST_PTR / ALLOC_HOST_PTR / SVM) would reduce the transfer overhead but not eliminate it completely, right?

So if I want to estimate the transfer overhead, can I use CL_PROFILING_COMMAND_START / CL_PROFILING_COMMAND_END to get the time?

Although I have consulted the event types on the Khronos Group site and other websites, I still don't understand how to use the other event types.

                        • Re: problem with API - clEnqueueSVMMap
                          dipak

Yes, there may be some overhead when using these buffers. Such pinned host memory can be accessed directly from the kernel, but in that case the kernel may take longer due to the data-transfer overhead. You can't measure this extra overhead separately, because it is included in the overall kernel execution time reported by the event profiling information. Profiling a kernel with various buffer types may give you a better indication. However, you can measure the map/unmap time explicitly. These pinned host buffers are sometimes more efficient for transferring data between host and device than normal host memory (created with calloc/malloc).

                           

Although I have consulted the event types on the Khronos Group site and other websites, I still don't understand how to use the other event types.

It's not clear to me. I hope you are not confusing the event execution statuses (e.g. CL_QUEUED, CL_SUBMITTED, etc.) with the event command types (e.g. CL_COMMAND_NDRANGE_KERNEL, CL_COMMAND_READ_BUFFER, CL_COMMAND_WRITE_BUFFER, etc.). [Ref. clGetEventInfo]
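The distinction can be queried at runtime with clGetEventInfo. A small sketch (assuming an event `ev` returned by some earlier clEnqueue<> call; it needs an OpenCL runtime to actually run):

```c
/* Sketch: a command's TYPE (what kind of command produced the event)
   vs. its execution STATUS (how far it has progressed). */
cl_command_type type;
cl_int status;
clGetEventInfo(ev, CL_EVENT_COMMAND_TYPE,
               sizeof(type), &type, NULL);
clGetEventInfo(ev, CL_EVENT_COMMAND_EXECUTION_STATUS,
               sizeof(status), &status, NULL);
if (type == CL_COMMAND_SVM_MAP && status == CL_COMPLETE) {
    /* this event came from clEnqueueSVMMap and has finished */
}
```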

                           

If you're looking for an example of how to estimate the time, here it is:

                           

queue = clCreateCommandQueue(context, deviceId, CL_QUEUE_PROFILING_ENABLE, NULL); // enable profiling

- - -

cl_event event;

clEnqueue<COMMAND> (queue, ..., &event); // any clEnqueue<> command

clWaitForEvents(1, &event); // make sure the command has completed

- - -

cl_ulong end, start;

// gather the timing information

clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &end, 0);

clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &start, 0);

std::cout << "Command execution time: " << (end - start) * 1.0e-6f << " (ms)" << std::endl; // ns -> ms

                           

                          Regards,