
mihzaha
Journeyman III

How to solve recursion?

Hi

I have a thread which wants to start a few other threads, and those want to start more threads, and so on. A thread that starts other threads can keep working in parallel with them, so it doesn't need to wait for them, nor do the started threads return anything, so they are completely independent.

If I instead build a list of things to do inside a single kernel, then everything runs sequentially in a single thread, and the number of "threads" is limited by the amount of memory I allocate for the list.

Do you have any ideas on how to solve this since recursion isn't allowed?

Thank you, 

Mihai

0 Likes
9 Replies
psath
Journeyman III

As far as I know there's no way to spawn additional threads from within a kernel.

If you need "recursion" up to a maximum depth you could always use statically defined __device functions (__device depth1 calls __device depth2 calls ... __device depthN).

However, I'd recommend, if possible, separating out the different "recursive depths" into their own kernel calls and only doing work (no spawning) in the kernel function, i.e. one clEnqueueNDRangeKernel for depth 1, 2, ..., N.

If you're not doing enough work in the smaller depths to fill your device, it may not be worth offloading and it may be cheaper to run them host-side.

If you could attach a skeleton test case demonstrating what you're trying to do, that would help immensely. EDIT: in C or C++ is fine; don't worry about trying to do the OpenCL calls.
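Roughly, the host side of the "one launch per depth" approach might look like this. This is only a sketch: the kernel name trace_depth, the ray buffers, and numRaysAtDepth are made up, error checking is omitted, and it assumes the program, queue, and buffers have already been created.

cl_int err;
cl_kernel trace = clCreateKernel( program, "trace_depth", &err );

for( cl_uint depth = 0; depth < MAX_DEPTH; ++depth )
{
    clSetKernelArg( trace, 0, sizeof(cl_mem), &raysIn );
    clSetKernelArg( trace, 1, sizeof(cl_mem), &raysOut );
    clSetKernelArg( trace, 2, sizeof(cl_uint), &depth );

    /* one work-item per ray alive at this depth (read back or bounded above) */
    size_t globalSize = numRaysAtDepth[depth];
    clEnqueueNDRangeKernel( queue, trace, 1, NULL, &globalSize, NULL, 0, NULL, NULL );

    /* rays spawned at this depth become the input of the next launch */
    cl_mem tmp = raysIn; raysIn = raysOut; raysOut = tmp;
}
clFinish( queue );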

0 Likes

I'm trying to do ray-tracing, and every time a ray hits something it has to break into several rays. There is enough work for the first levels since there are many pixels.

But this thread ( http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=135950&highlight_key=y&keyword1=call%20thread ) got me wondering: I don't understand how a kernel can call another kernel, so maybe spawning is possible after all (write the new rays to memory and, when done, the kernel that started the rays could launch the remaining rays that were written to memory in the last step)...

0 Likes

What they're doing in that thread isn't spawning additional kernels (or threads) from within the kernel. They're just calling another function (not accessible from the host) from within their kernel function, much like calling "helper" functions from within another function in standard C.
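For example, something like this (the reflection helper and the buffer layout are just made up for illustration):

/* plain device-side helper, not enqueueable from the host */
float3 reflect_dir( float3 d, float3 n )
{
    return d - 2.0f * dot( d, n ) * n;
}

kernel void trace( global const float4* rayDirs,
                   global const float4* normals,
                   global float4* out )
{
    size_t i = get_global_id( 0 );
    float3 r = reflect_dir( rayDirs[i].xyz, normals[i].xyz );
    out[i] = (float4)( r, 0.0f );
}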

0 Likes

Aaa, I see

So they launch more threads from the start, and some of them execute one thing (k1) while the rest execute something else (k2).

So the only solutions would be either a fixed number of reflections, or starting a new batch of launches for every spawn.

Thank you

0 Likes

I forget the identifier off the top of my head, but I think there's a way to make a "kernel" function callable both from the host via clEnqueueNDRangeKernel and from the device via a regular function call. This would allow you to have the "higher tier" threads drill themselves down, while still letting you enqueue at the lower depth(s) for the additional threads.
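If that's right, it would look something like this (untested sketch, names invented): the same kernel function is enqueued from the host for the lower depths and called directly, like an ordinary function, from a higher-tier kernel on the device.

kernel void shade_ray( global float4* rays, global float4* output )
{
    size_t i = get_global_id( 0 );
    /* ... per-ray work ... */
    output[i] = rays[i];
}

kernel void trace_primary( global float4* rays, global float4* output )
{
    /* ... generate and trace the primary ray ... */
    shade_ray( rays, output );   /* device-side call, behaves like a normal function */
}

/* and from the host, for the lower depths, something like:
   clEnqueueNDRangeKernel( queue, shadeRayKernel, 1, NULL, &numRays, NULL, 0, NULL, NULL ); */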

Glad I could help

0 Likes

Originally posted by: mihzaha

So the only solutions would be either a fixed number of reflections, or starting a new batch of launches for every spawn.

You can always implement recursion by creating your own stack in a buffer. You just have to mess around a bit with atomics to keep the stack pointer synchronized. Pseudocode follows; PushArguments and PopArguments must be coded with proper synchronization so they work across work-items.

 

kernel void recursionSimulatorKernel( global Stack* pStack )
{
    while( stackNotEmpty( pStack ) )
    {
        Arguments args = PopArguments( pStack );
        RecursiveWorkItem( &args, pStack );
    }
}

void RecursiveWorkItem( Arguments* pArgs, global Stack* pStack )
{
    /* ... calculate ... */
    if( rayBreaks )
    {
        PushArguments( pStack, MyRecursiveArgs0 );
        /* ... */
        PushArguments( pStack, MyRecursiveArgsN );
    }
}
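For what it's worth, the Stack itself could be built on OpenCL 1.x global atomics, roughly like this. The layout, the fixed capacity, and the field names are made up, and the sketch glosses over two real problems: a work-item can pop a slot before the pusher has finished writing it, and the stack can look non-empty to one work-item and be drained by another an instant later. A real implementation has to close those races.

typedef struct
{
    float origin[3];
    float dir[3];
    int   depth;
} Arguments;

typedef struct
{
    volatile int count;           /* number of valid entries               */
    Arguments    entries[4096];   /* fixed capacity, allocated by the host */
} Stack;

int stackNotEmpty( global Stack* pStack )
{
    return pStack->count > 0;     /* may already be stale when used */
}

void PushArguments( global Stack* pStack, Arguments args )
{
    int slot = atomic_inc( &pStack->count );   /* reserve a slot ...      */
    if( slot < 4096 )
        pStack->entries[slot] = args;          /* ... then fill it (racy) */
}

Arguments PopArguments( global Stack* pStack )
{
    int slot = atomic_dec( &pStack->count ) - 1;
    if( slot < 0 )
    {
        atomic_inc( &pStack->count );          /* undo: it was already empty     */
        slot = 0;                              /* dummy entry, see caveats above */
    }
    return pStack->entries[slot];
}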

 

 

0 Likes

Have you looked at path tracing instead?  That technique works much better on a GPU.

0 Likes

No. What I'm doing is not quite ray-tracing either, but it gets close to it. I'll look at that method too, maybe it inspires me with something, thank you.

Thank you for the pseudo-code, but do the RecursiveWorkItem-s run in parallel or sequentially? I need them to run in parallel. That's why I said it's not quite recursion: the calling function continues its job after the call and works in parallel with its children. Also, I think there's no need for a stack if the rays are independent, so all I need is to be able to call other kernels and have them work in parallel.

But that gave me an idea: keep a list of rays, and each kernel does its job; when done, it checks whether there are any more rays to be done and, if so, continues from there. When a ray needs to break, it just puts its children in the list, so there's no need for the boss list. But that raises another question: if I have a mirror in front of the camera, all the threads will populate the list, and then, when they all finish, they will read from the list one by one (they can't all read at the same time, because each needs to know which ray is being removed at that moment), so they will each end up on a different instruction, making everything sequential. Is that right?
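Something like this is what I have in mind, very roughly (the Ray layout and the counters are invented, and the counter handling is simplified):

typedef struct { float ox, oy, oz, dx, dy, dz; int depth; } Ray;

kernel void consume_rays( global Ray* rays,
                          global int* numRays,   /* rays currently in the list */
                          global int* nextRay,   /* next unclaimed index       */
                          int capacity )
{
    for( ;; )
    {
        int i = atomic_inc( nextRay );                   /* claim the next pending ray */
        if( i >= min( *numRays, capacity ) )
        {
            atomic_dec( nextRay );                       /* give the claim back */
            break;                                       /* nothing pending for this work-item */
        }

        Ray r = rays[i];
        /* ... trace r; if it breaks, append the children: */
        /* int slot = atomic_inc( numRays );               */
        /* if( slot < capacity ) rays[slot] = child;       */
    }
    /* Work-items that find the list empty just exit; rays appended after
       that point would need the kernel to be enqueued again. */
}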

Thank you

0 Likes

Essentially what you're doing is non-waiting forks. Spawn a child that does some work while the parent does some other work. And the parent doesn't need to care about the child after it's born.

I'm just going to warn you: if you have any intention of running this on a GPU, stay as far away from if/else, switch, and for statements in kernels as you can, unless you can guarantee that the vast majority of threads will have the same execution path (exact same branches, iterations, etc.).

I'd suggest you read up on warps/wavefronts and divergent execution in GPGPU programming before committing too much work. It's actually possible to end up with worse performance if you allow too much divergence.

0 Likes