

chevydevil
Adept II

Multiple contexts parallel allocating or writing to memory of a single device

Hello, I have a program which uses OpenMP to schedule work in parallel to one OpenCL device, i.e. a GPU. Right now this is done with multiple contexts, each of which has its own unique queues and buffers. The program stops after some iteration steps. I mean it just stops, without exiting, a segmentation fault, or anything else. Could it be that allocation from multiple contexts is not thread-safe? Do I have to use one context and one queue per thread (which is my choice for the future anyway)? Btw, this only happens on a GPU device. CPU devices work fine.

Thx in advance.

himanshu_gautam
Grandmaster

Re: Multiple contexts parallel allocating or writing to memory of a single device

Multiple threads operating on a single context have been supported since OpenCL 1.1. All OpenCL calls are thread-safe except "clSetKernelArg". Even with this API, multiple threads can safely work with distinct cl_kernel objects. However, they cannot work with the same cl_kernel object at the same time. So allocating a separate "cl_kernel" object per thread will help overcome this issue.

Check Appendix A.2 of the OpenCL Spec. So, as long as your platform is OpenCL 1.1 or later, you can use just one context and let all your OpenMP threads work on it.

However, if multiple threads are reading/writing shared "cl_mem" objects across multiple command queues -- then this can result in undefined behaviour. Check Appendix A.1 of the OpenCL Spec. That will help resolve all your doubts.

Now coming to the issue you are facing,

I am not sure what you mean by "the program stops ... but no seg-fault". You may want to first find out up to which point the application runs. Alternatively, please post your sources as a standalone zip file which we can use to reproduce the problem here.
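One low-tech way to find out how far the application gets is to print a flushed progress line per thread at each stage of a time step. The sketch below is only an illustration: the names `checkpoint_format` and `checkpoint` and the line format are hypothetical, not part of any OpenCL or AMD tooling.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one progress line into out. Returns the value of snprintf,
   i.e. the length the full line needs, excluding the terminating NUL. */
int checkpoint_format(char *out, size_t cap, int thread_id, long step, const char *stage)
{
    assert(out != NULL && cap > 0 && stage != NULL);
    return snprintf(out, cap, "[thread %d] step %ld: %s\n", thread_id, step, stage);
}

/* Write the progress line to stderr and flush, so that the last line
   printed before a hang is actually visible. */
void checkpoint(int thread_id, long step, const char *stage)
{
    char line[128];
    checkpoint_format(line, sizeof line, thread_id, step, stage);
    fputs(line, stderr);
    fflush(stderr);
}
```

A call such as `checkpoint(omp_get_thread_num(), step, "after clFinish")` placed right after a `clFinish` on the thread's queue would show which thread, step, and stage was reached last. Adding `clFinish` calls changes timing, so this may also make the hang disappear.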

You need to also specify the following:

1. Platform: win32 / win64 / lin32 / lin64 or something else? Also the OS version (Win7, Win Vista, Win8, ...) or, for Linux, your distribution

2. Version of driver

3. CPU or GPU target?

4. CPU/GPU details of your hardware

Thanks,

chevydevil
Adept II

Re: Multiple contexts parallel allocating or writing to memory of a single device

The application is a vortex-particle flow simulation with immersed boundaries. I cannot post the sources here. I'm working with Ubuntu 12.10 x64 with AMD APP 2.8 and the latest Catalyst beta driver 13.2. The GPU is an HD 7970, and it is on this target that I'm facing these problems. The simulation iterates over many time steps and advances the flow. Since the simulation is 2D based, we run 2D slices in parallel. Right now every slice has its own cl_context, buffers and kernels, but uses the same device. Now on the GPU the program literally just stops. It doesn't exit and the memory is still allocated, but it simply doesn't do anything. It only happens on the GPU, and the debugger doesn't react when I want to break. When I use the CPU device it runs without problems. I will now try an older driver version and report back here.

himanshu_gautam
Grandmaster

Re: Multiple contexts parallel allocating or writing to memory of a single device

Yes, you may want to try 13.1 stable. Please let us know if that solves it.

chevydevil
Adept II

Re: Multiple contexts parallel allocating or writing to memory of a single device

I tried 13.1 now and it made no difference. Now here comes the kicker: when I tried to debug the application with CodeXL, it never stopped. It runs as it should. Any idea how this can be?

himanshu_gautam
Grandmaster

Re: Multiple contexts parallel allocating or writing to memory of a single device

CodeXL enables profiling. When profiling is enabled, there are no asynchronous operations (like async DMA transfers). Possibly this is what makes the difference (just a guess).

Also, just try creating the command queue with "CL_QUEUE_PROFILING_ENABLE" and see if it works correctly on an independent run (without CodeXL).

In any case, this looks like a bug to me.

If I am not asking for too much, can you try the earlier driver (12.10) and see if it works?

Then, we can isolate this to a driver problem.

Also, a repro case would really help us solve this problem. A quick, small one would be very useful. Thanks,

chevydevil
Adept II

Re: Multiple contexts parallel allocating or writing to memory of a single device

I tried 12.10 and nothing. I tried my code on an NVIDIA device with AMD APP and it works. The code is quite complex and I cannot reproduce the error in a minimal example yet.

Edit: Btw, creating the command queue with CL_QUEUE_PROFILING_ENABLE didn't help either.

himanshu_gautam
Grandmaster

Re: Multiple contexts parallel allocating or writing to memory of a single device

chevydevil wrote:

I tried 12.10 and nothing.

I infer that the problem exists with 12.10 as well. Please correct me if I am wrong here.

chevydevil
Adept II

Re: Multiple contexts parallel allocating or writing to memory of a single device

I meant the problem also exists with the 12.10 driver. I still can't reproduce the error in a simple example.

chevydevil
Adept II

Re: Multiple contexts parallel allocating or writing to memory of a single device

Update: Still no simple example that reproduces the problem. But I tried to run my code via ssh. When I am already logged in to the system locally (with the desktop open) and then run the code remotely, it stops again. But if I'm not logged in locally and then start the code remotely, it doesn't stop, and I can log in locally afterwards and it still doesn't stop. This is not perfect, but since I want to set up a GPU workstation it is a workaround, because then I don't need local access.
