jpola
Journeyman III

clReleaseContext performance

Hello,

I have a simple OpenCL program that uses the Bolt 1.3 library. The code is as follows:

using ctrl = bolt::cl::control;

ctrl bolt_control;
bolt_control.setForceRunMode(ctrl::OpenCL);

int N = 1024;
bolt::cl::device_vector<int> devV(N, 0, CL_MEM_READ_WRITE, false, bolt_control);

I have executed this code in CodeXL. To my surprise, I found that the function clReleaseContext takes 96% of the execution time (please take a look at the attached picture).

Could anyone please tell me why it takes so much time?

I've attached my clinfo log to show you what my OpenCL system looks like. In addition, the GPU drives the window manager at the same time; could that be the root cause of the issue?
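
As a wall-clock cross-check of the profiler figure, here is a minimal sketch that wraps the same snippet in std::chrono timers. The timing harness is just my addition around the code above (the Bolt calls are unchanged, and the header paths are assumed from the Bolt 1.3 distribution); it only assumes that any clReleaseContext attributed to these objects happens when devV and bolt_control go out of scope:

#include <bolt/cl/control.h>        // assumed Bolt 1.3 header paths
#include <bolt/cl/device_vector.h>
#include <chrono>
#include <iostream>

int main()
{
    using ctrl  = bolt::cl::control;
    using clock = std::chrono::steady_clock;
    using ms    = std::chrono::duration<double, std::milli>;

    auto t_setup = clock::now();
    clock::time_point t_teardown;
    {
        // Same setup as the snippet above, forced onto the OpenCL path.
        ctrl bolt_control;
        bolt_control.setForceRunMode(ctrl::OpenCL);

        int N = 1024;
        bolt::cl::device_vector<int> devV(N, 0, CL_MEM_READ_WRITE, false, bolt_control);

        std::cout << "setup:    " << ms(clock::now() - t_setup).count() << " ms\n";
        t_teardown = clock::now();
    }   // devV and bolt_control are destroyed here; any context release
        // attributed to them should fall into the "teardown" interval.
    std::cout << "teardown: " << ms(clock::now() - t_teardown).count() << " ms\n";
}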

Thank you in advance for your help.

Kuba.

8 Replies
dipak
Big Boss

Could you please verify whether you're seeing this issue with a plain OpenCL program (without the Bolt library) or not? If it's related to the Bolt library only, then the Bolt forum would be a more appropriate place for this query and we could move this issue there.

Regards,


Hello,

I've created a simple program using only OpenCL. The program creates and releases the context 10 times.

Here is the source code:

#include <CL/cl.hpp>
#include <string>
#include <iostream>
#include <vector>

int main()
{
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);

    int i = 0;
    for (auto& p : platforms)
    {
        std::cout << "P[" << i++ << "] = " << p.getInfo<CL_PLATFORM_NAME>()
                  << std::endl;
    }

    cl_context_properties ctx_properties[3] = {
        CL_CONTEXT_PLATFORM,
        (cl_context_properties)(platforms[0])(),
        0
    };

    for (int n = 0; n < 10; n++)
    {
        cl::Context context(CL_DEVICE_TYPE_GPU, ctx_properties);
        std::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();

        i = 0;
        for (auto& d : devices)
        {
            std::cout << "D[" << i++ << "] = " << d.getInfo<CL_DEVICE_NAME>()
                      << std::endl;
        }

        cl::CommandQueue queue = cl::CommandQueue(context, devices[0]);
    }   // context, devices and queue go out of scope here;
        // clReleaseContext is issued by the cl::Context destructor
}

I've analysed this code in CodeXL. The results regarding clReleaseContext are similar; here are the top 3 entries of the CL API Summary.

API Name                % of Total Time   # of Calls   Cumulative Time (ms)   Avg Time (ms)   Max Time (ms)   Min Time (ms)
clReleaseContext        94.54747          10           852.26390              85.22639        93.09573        48.84719
clCreateCommandQueue     3.76016          10            33.89463               3.38946         8.88633         1.45075
clReleaseCommandQueue    1.68084          10            15.15136               1.51513         1.85391         0.92258

Don't you think that clReleaseContext takes too long?
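
To see whether this is profiler-specific, one could also time the same create/release loop directly with std::chrono and compare the wall-clock numbers against the CodeXL summary above. A minimal sketch (my own addition, not part of the profiled test):

#include <CL/cl.hpp>
#include <chrono>
#include <iostream>
#include <vector>

int main()
{
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);

    cl_context_properties ctx_properties[3] = {
        CL_CONTEXT_PLATFORM,
        (cl_context_properties)(platforms[0])(),
        0
    };

    using clock = std::chrono::steady_clock;
    using ms    = std::chrono::duration<double, std::milli>;

    for (int n = 0; n < 10; n++)
    {
        auto t_create = clock::now();
        clock::time_point t_release;
        {
            cl::Context context(CL_DEVICE_TYPE_GPU, ctx_properties);
            std::cout << "create:  " << ms(clock::now() - t_create).count() << " ms, ";
            t_release = clock::now();
        }   // ~cl::Context() issues clReleaseContext here
        std::cout << "release: " << ms(clock::now() - t_release).count() << " ms\n";
    }
}

If the release still takes tens of milliseconds per call without the profiler attached, the cost is real; if not, it points at how CodeXL attributes the time.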

In addition, I've modified my source code to take the context from the queue, as I usually do in my programs:

for (int n = 0; n < 10; n++)
{
    cl::Context ctx = queue.getInfo<CL_QUEUE_CONTEXT>();
}

For this kind of execution I get similar results.

What do you think?

Regards,

Kuba.


Hi Kuba,

Thanks for sharing the above sample code and your observations.

However, my observation was different when I ran the above code on my setup (see below). In my case (see the CL API summary attached herewith), clCreateCommandQueue and clReleaseCommandQueue were the two dominant APIs, whereas clReleaseContext was negligible.

Setup:

AMD A6-3410MX APU with Radeon(tm) HD Graphics

Windows 7 (64bit), 4GB RAM

14.12 AMD Catalyst Omega (14.501.1003)

APP SDK 2.9-1

CodeXL 1.6

Could you please share your setup details so that we're on the same page?

Regards,


Hi Dipak,

Here is my configuration:

Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz

Linux Mint 17 Cinnamon 64-bit, Linux Kernel 3.13.0-24-generic, 16GB RAM

AMD Radeon R9 270X, Catalyst 14.501.1003-141120a-178002C

APP SDK 2.9.1

CodeXL 1.6.7247.0

I'm executing the OpenCL code on the GPU.

If I had a situation similar to yours, where creation of the queue takes most of the time, I would not be worried, because I create the queue only once per program. I usually obtain the context from the queue when I need it. I'll try to run the code with X11 turned off.

Kuba.


Okay, I'll try to run it on a Linux machine to see whether I get a similar observation to yours. Meanwhile, if possible, please try it on a Windows machine so that our observations can be cross-verified.

Regards,


Unfortunately, I do not have access to a Windows machine.


No problem. I just want to share an interesting observation (see attached files) I got after profiling it with CodeXL 1.5 and 1.6 on the same Linux setup (with Catalyst 14.501). The API summary table for CodeXL 1.5 was quite similar to my previous observation on Windows. However, it changed for CodeXL 1.6, and there, indeed, clReleaseContext was the dominant one. I need to do some more tests and, if required, I'll check with the CodeXL team.

Regards,


Thank you, Dipak, for your effort.

It would be good to know which timing is correct, but it seems that it might be an issue with CodeXL, doesn't it?

Kuba.
