8 Replies Latest reply on Feb 13, 2015 10:51 AM by jpola

    clReleaseContext  performance

    jpola

      Hello,

       

      I have a simple OpenCL program that uses the Bolt 1.3 library. The code is as follows:

      using ctrl = bolt::cl::control;
      ctrl bolt_control;
      bolt_control.setForceRunMode(ctrl::OpenCL);
      int N = 1024;
      bolt::cl::device_vector<int> devV(N, 0, CL_MEM_READ_WRITE, false, bolt_control);
      

       

      I ran this code in CodeXL. To my surprise, the function clReleaseContext takes 96% of the execution time (please take a look at the attached picture).

      Could anyone please tell me why it takes so much time?

      I've attached my clinfo log to show what my OpenCL system looks like. In addition, the GPU is driving the window manager at the same time; could that be the root cause of the issue?

       

      Thank you in advance for your help.

      Kuba.

        • Re: clReleaseContext  performance
          dipak

          Could you please verify whether you get this issue with a plain OpenCL program (without the Bolt library)? If it is related to the Bolt library only, then a forum dedicated to Bolt would be a more appropriate place for this query, and we could move this issue there.

           

          Regards,

            • Re: Re: clReleaseContext  performance
              jpola

              Hello,

               

              I've created a simple program using only OpenCL. The program creates and releases the context 10 times.

              Here is the source code:

               

              #include <CL/cl.hpp>
              #include <string>
              #include <iostream>
              #include <vector>
              
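              int main()
              {
                  // Enumerate the available OpenCL platforms.
                  std::vector<cl::Platform> platforms;
                  cl::Platform::get(&platforms);
                  if (platforms.empty())
                  {
                      std::cerr << "No OpenCL platform found" << std::endl;
                      return 1;
                  }
              
                  int i = 0;
                  for(auto& p : platforms)
                  {
                      std::cout << "P[" << i << "]=" << p.getInfo<CL_PLATFORM_NAME>()
                                << std::endl;
                      ++i;
                  }
                  // Use the first platform for every context created below.
                  cl_context_properties ctx_properties[3] = {
                              CL_CONTEXT_PLATFORM,
                              (cl_context_properties)(platforms[0])(),
                              0
                          };
                  for (int n = 0; n < 10; n++)
                  {
                      // The context and the queue are released at the end of each
                      // iteration, when their wrappers go out of scope.
                      cl::Context context( CL_DEVICE_TYPE_GPU, ctx_properties);
                      std::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();
                      if (devices.empty())
                      {
                          std::cerr << "No GPU device found" << std::endl;
                          return 1;
                      }
                      i = 0;
                      for(auto& d : devices)
                      {
                          std::cout << "D["<< i << "] = " << d.getInfo<CL_DEVICE_NAME>()
                                    << std::endl;
                          ++i;
                      }
                      cl::CommandQueue queue = cl::CommandQueue(context, devices[0]);
                  }
              }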
              

               

              I analysed this code in CodeXL. The results for clReleaseContext are similar; here are the top 3 entries of the CL API Summary.

               

              API Name               % of Total Time  # of Calls  Cumulative Time (ms)  Avg Time (ms)  Max Time (ms)  Min Time (ms)
              clReleaseContext       94.54747         10          852.26390             85.22639       93.09573       48.84719
              clCreateCommandQueue   3.76016          10          33.89463              3.38946        8.88633        1.45075
              clReleaseCommandQueue  1.68084          10          15.15136              1.51513        1.85391        0.92258

               

               

              Don't you think that clReleaseContext takes too much time?
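The cl::Context wrapper releases its context in its destructor, so in the loop above clReleaseContext runs at the end of every iteration. To cross-check a profiler figure like this one, creation and release can be timed separately with std::chrono. The following is only a minimal sketch of that idea: `SlowToRelease`, `Timings` and `time_lifetime` are hypothetical names, and `SlowToRelease` is a stand-in with an arbitrary 20 ms sleep in its destructor so that the example runs without an OpenCL installation.

```cpp
#include <chrono>
#include <memory>
#include <thread>

// Hypothetical stand-in for cl::Context: cheap to create, slow to destroy.
// The 20 ms sleep is an arbitrary placeholder, not a measured value.
struct SlowToRelease {
    ~SlowToRelease() {
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
    }
};

struct Timings {
    double create_ms;   // time spent constructing the object
    double release_ms;  // time spent destroying the object
};

// Times construction and destruction of one T separately.
template <typename T>
Timings time_lifetime() {
    using clock = std::chrono::steady_clock;
    using ms = std::chrono::duration<double, std::milli>;

    const auto t0 = clock::now();
    std::unique_ptr<T> obj(new T());
    const auto t1 = clock::now();
    obj.reset();  // runs ~T(), the point where a wrapper would release its handle
    const auto t2 = clock::now();

    return Timings{ms(t1 - t0).count(), ms(t2 - t1).count()};
}
```

Calling `time_lifetime<SlowToRelease>()` returns both durations; with a real OpenCL type in place of the stand-in, the second figure would correspond to the time spent in the destructor, which is where the release call happens.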

               

              In addition I've modified my source code to take the context from the queue as I usually do in my programs:

              for (int n = 0; n < 10; n++)
              {
                  cl::Context ctx = queue.getInfo<CL_QUEUE_CONTEXT>();
              }
              

               

              I get similar results with this version.
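For context, OpenCL objects are reference counted: clReleaseContext decrements the context's reference count, and the context is only destroyed once the count reaches zero. Assuming the wrapper returned by getInfo holds its own reference and drops it when it goes out of scope, the loop above would retain and release the same context ten times without ever destroying it. The sketch below illustrates that pattern with purely hypothetical stand-ins (`FakeContext`, `Handle`, `loop_handles`); it does not call OpenCL.

```cpp
// Hypothetical stand-in for a reference-counted OpenCL context.
struct FakeContext {
    int ref_count = 1;       // the creator holds the first reference
    bool destroyed = false;  // set once the last reference is dropped
};

void retain(FakeContext& c) { ++c.ref_count; }

void release(FakeContext& c) {
    if (--c.ref_count == 0) {
        c.destroyed = true;  // real teardown would happen only here
    }
}

// Hypothetical wrapper: takes a reference on construction, drops it on destruction.
class Handle {
public:
    explicit Handle(FakeContext& c) : ctx_(&c) { retain(*ctx_); }
    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;
    ~Handle() { release(*ctx_); }

private:
    FakeContext* ctx_;
};

// Creates and drops n temporary handles, then returns the remaining count.
int loop_handles(FakeContext& c, int n) {
    for (int i = 0; i < n; ++i) {
        Handle h(c);  // count goes up here and back down at the closing brace
    }
    return c.ref_count;
}
```

In this model the count returns to its starting value after the loop, and the object is destroyed only when the original owner releases it.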

               

              What do you think?

               

              Regards,

              Kuba.

                • Re: clReleaseContext  performance
                  dipak

                  Hi Kuba,

                   

                  Thanks for sharing the above sample code and your observations.

                  However, my observation was different when I ran the above code on my setup (see below). In my case (see the CL API summary attached herewith), clCreateCommandQueue and clReleaseCommandQueue were the two dominant APIs, whereas clReleaseContext was negligible.

                   

                  Setup:

                  AMD A6-3410MX APU with Radeon(tm) HD Graphics

                  Windows 7 (64bit), 4GB RAM

                  14.12 AMD Catalyst Omega (14.501.1003)

                  APP SDK 2.9-1

                  CodeXL 1.6

                   

                  Could you please share your setup details so that we're on the same page?

                   

                  Regards,