maxdz8
Elite

Why not a clWaitAnyEvent?

The behavior of select(...) is to wake up when at least one watched descriptor is "ready".

Pthreads keeps it simple: you pthread_cond_wait on a single condition variable.

Windows has WaitForMultipleObjects(...), which lets you sleep on pretty much anything. It wakes up when at least one object is signaled, but it can also be told to require all of them to be signaled.

clWaitForEvents returns CL_SUCCESS if the execution status of all events in event_list is CL_COMPLETE. Whoops. This explains a bug report I had some time ago.

I am in the process of using the event callback system to implement a wait on a single object that is triggered by the first event to reach CL_COMPLETE. Leaving aside that I must do this with care, I wonder why there is no such thing as clWaitForAnyEvent.
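For future readers, the callback approach described above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the actual implementation: any_latch, any_slot and the three functions are hypothetical names, and the callback takes plain void * / int in place of cl_event / cl_int (and omits CL_CALLBACK) so that it builds without the OpenCL headers. In real code you would register any_latch_on_complete with clSetEventCallback for CL_COMPLETE on each event, passing that event's slot as user_data.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper, not part of OpenCL: remembers which watched
   event completed first and wakes anyone waiting on it. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             first_done;  /* -1 until some event completes */
} any_latch;

/* One slot per watched event; this is what gets passed as user_data. */
typedef struct {
    any_latch *latch;
    int        index;            /* position of the event in the caller's list */
} any_slot;

void any_latch_init(any_latch *l) {
    pthread_mutex_init(&l->lock, NULL);
    pthread_cond_init(&l->cond, NULL);
    l->first_done = -1;
}

/* Same shape as an OpenCL event callback. Assumption: plain types stand in
   for cl_event and cl_int so the sketch compiles without OpenCL headers. */
void any_latch_on_complete(void *event, int status, void *user_data) {
    any_slot *s = user_data;
    (void)event;
    (void)status;
    assert(s != NULL && s->latch != NULL);
    pthread_mutex_lock(&s->latch->lock);
    if (s->latch->first_done < 0)
        s->latch->first_done = s->index;  /* only the first completion is kept */
    pthread_cond_broadcast(&s->latch->cond);
    pthread_mutex_unlock(&s->latch->lock);
}

/* Blocks until at least one callback has fired; returns that slot's index. */
int any_latch_wait(any_latch *l) {
    pthread_mutex_lock(&l->lock);
    while (l->first_done < 0)
        pthread_cond_wait(&l->cond, &l->lock);
    int idx = l->first_done;
    pthread_mutex_unlock(&l->lock);
    return idx;
}
```

The latch and every slot have to stay alive until all registered callbacks have run, since the callbacks that lose the race still touch the latch.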

0 Likes
1 Solution

That makes perfect sense to me. But what do I know.

Obviously AMD does not define the OpenCL standard or API. An engineer from here may chime in on this, but I suspect the best thing to do is to get this feedback into Khronos. I recommend you go here: Public discussions about the Khronos Dynamic Media APIs and make the same request.

Hope this helps.

0 Likes
11 Replies
jason
Adept III

Seconded! I'm amazed how often functionality like that is overlooked. This is actually something I'm running into right now while trying to keep the GPU fed from a single loop. These functions are key to good integration and maximum utilization on larger, more complex problems!

For bonus points, allow integration of clWait objects with WaitForMultipleObjects and select/poll so those functions can serve their purpose of grand unification.

0 Likes

That would be even more than what I would hope for!

0 Likes

Dithermaster
Adept I

You could use clSetEventCallback as a work-around.

0 Likes

Well, that's what "implementing the event callback system" is supposed to do.

0 Likes

Ah, sorry, missed that when I went to read the 2.1 specification to see if it got added there. Best luck.

0 Likes

Hm, I thought I posted this on March 3rd or so, but reposting:

You're right, you can achieve the desired functionality through clSetEventCallback, but as far as I can tell there is a catch: the callback is going to be executed in another thread, and there are no memory management routines for the userdata (from what I could tell). You can work around the former with other IPC constructs; the latter implies some restrictions are necessary to avoid leaking memory/resources.

0 Likes

There's actually no need for any memory management in callbacks such as those: just don't attach ownership to userdata. That is, have it point to some persistent object with the outer code having ownership. This is canon when it comes to async callbacks.

0 Likes

I call that a restriction as in it's what I actually meant by it; that and the case where you have userdata as a local but can't pass that event away from the current scope...

In the case of the global, the user of that system must now initialize it before it's ready to be used - it may also have poor performance on scalability (big lock style) and requires a global initialization.  Things get easier/proper if they just had a deletion function.

Of course everything here is side stepped and simpler if clEvent was pollable/selectable/waitable.

0 Likes

I'm going to reply for the good of future readers as I don't think we are in the same mindset.


I call that a restriction as in it's what I actually meant by it; that and the case where you have userdata as a local but can't pass that event away from the current scope...



Just to be clear, I don't consider this good design, so it's like saying √ is a bad function because it does not accept negative numbers. Async pointers should be persistent; this is canon. It's like saying that using putty instead of wood would make fitting a square peg in a round hole easier.

Async behavior is very well indicated and pretty much obvious. If you have put data on the stack to be used asynchronously... no idea how you can expect this to ever work reliably either!


In the case of the global {1}, the user of that system must now initialize it before it's ready to be used{2} - it may also have poor performance on scalability (big lock style){3} and requires a global initialization{4}.  Things get easier/proper if they just had a deletion function{5}.



  1. Are you implying that something not on stack must be global? No need to do that. Persistent != global.
  2. The outer code is already initializing it before use, that's exactly how you can pass your pointer with ownership semantics so you can hypothetically destroy it.
  3. In your design/application maybe. I don't see any such problem in my context.
  4. No idea what global initialization even means at this point but just to reiterate, no need to have a global for that.
  5. Seems likely this is subjective; I would consider this avoiding proper lifetime analysis and management. Besides, nothing prevents userdata from pointing to something with a dtor.

Of course everything here is side stepped and simpler if clEvent was {1}pollable/{2}selectable/{3}waitable.



  1. It seems to me clEvent is pollable using clGetEventInfo(event, CL_EVENT_COMMAND_EXECUTION_STATUS, ...);
  2. To be honest, while I pointed out select(...) in my post, I don't consider it good example;
  3. No idea what you mean there: clEvent is definitely "waitable" by some definition (at least by clWaitForEvents), that's the starting point of this discussion.

Honestly, I cannot share any of your concerns so far.

0 Likes

I think there's some confusion between us so let's hash it out.

I did miss a few cases for keeping userdata memory valid between the local and global strategies; you can consider this case either outer scope or call it "user managed".  There's nothing wrong with the locals strategy when you don't intend to pass events around outside the local scope.

Most of the time, async callbacks rely on reference counting or user-supplied maintenance functions (which in turn do the reference counting) to solve that persistent-object lifetime issue.  But there's nothing like that available for userdata here.  The question is: is there an equivalent replacement?  If there is not, you must add more constraints/restrictions. For instance, the user must now keep a framework object alive while passing around clWait objects that reference objects within the framework object.  That approach wouldn't integrate well with multiple code bases.  I'd also find solutions not based on reference counting surprising, given clRetain*/clRelease*.

I think this addresses 1,2 in your list above - if not let's think through it some more.

For 3, user-managed and local userdata are more fine-grained, so we can say scalability is easily addressable there; for the global approach it is bad design, but maybe it won't be noticeable?

4: FrameWorkInit/FrameWorkFree - the kind of functions you execute when you initialize a C library - they're setting up some globals.  Sorry I mentioned that twice btw - I was tired at the time

5. There should be something along the lines of quantitative here - users passing around clWaits no longer have to be careful about maintaining coupled objects or deal with restrictions.  Lifetime management is now opaque to them.  The whole problem btw is basically calling userdata_dtor(userdata) when a given clWait refcnt == 0.  I see no way of accomplishing that in a user friendly manner.  Do you?
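To make the userdata_dtor idea concrete, here is a user-side sketch of what I mean. It is not an OpenCL facility and it does not hook into the event's own reference count: it assumes each registered callback fires exactly once and gives up one reference when it does. shared_userdata, its functions, mark_destroyed and on_event_complete are all hypothetical names, and the callback uses void * / int in place of cl_event / cl_int so it compiles without the OpenCL headers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical reference-counted wrapper around callback userdata. */
typedef struct {
    atomic_int refs;
    void     (*dtor)(void *payload);  /* may be NULL */
    void      *payload;
} shared_userdata;

/* Creates the wrapper with one reference, owned by the caller. */
shared_userdata *shared_userdata_new(void *payload, void (*dtor)(void *)) {
    shared_userdata *u = malloc(sizeof *u);
    if (u == NULL)
        return NULL;
    atomic_init(&u->refs, 1);
    u->dtor = dtor;
    u->payload = payload;
    return u;
}

/* Take one extra reference, e.g. right before registering a callback. */
void shared_userdata_retain(shared_userdata *u) {
    atomic_fetch_add(&u->refs, 1);
}

/* Drop one reference; the last release runs the dtor and frees the wrapper. */
void shared_userdata_release(shared_userdata *u) {
    int prev = atomic_fetch_sub(&u->refs, 1);
    assert(prev > 0);
    if (prev == 1) {
        if (u->dtor != NULL)
            u->dtor(u->payload);
        free(u);
    }
}

/* Demo destructor: the payload is an int flag that gets set to 1. */
void mark_destroyed(void *payload) {
    *(int *)payload = 1;
}

/* Event-callback shape (assumption: plain types instead of cl_event/cl_int).
   Releases the reference that was taken when the callback was registered. */
void on_event_complete(void *event, int status, void *user_data) {
    (void)event;
    (void)status;
    shared_userdata_release(user_data);
}
```

The catch is exactly the one above: whoever registers the callback has to remember the retain, so lifetime management is still not opaque to the user.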

Re Second list.

1) pollable as in this poll: poll(3), input/output multiplexing (Linux man page); not this poll: Polling (computer science) (Wikipedia)

2) select/poll/WaitForMultipleObjects are unifying wait functions: they're the only low-latency, low-CPU way you have to monitor multiple file descriptors / waitable objects.  You use them to, say, pay attention to multiple sockets, serial ports, and other kinds of special devices in a single big loop, often without blocking IO.  To give an idea of why you might want to use this with clWait objects, if it worked: imagine it will be a relatively long time until your clWait is ready, and some computations depend on it, but that doesn't stop sensor data from coming in. Rather than just blocking on clWaitForEvents, you could service that sensor data.  You can solve this problem with multiple threads, but typically it is simpler to do one big loop: there are no thread-related issues, fewer race conditions, and less code, which means fewer bugs and less risk.  This strategy also tends to perform better (up to the point where one thread maxes out its CPU time) if we look at it as the C10k problem. Speaking of which, it's not a bad idea to take a look if you haven't: The C10K problem. I'm not likening this to 10k descriptors; it's just related reading that talks about multiple inputs and tying them together.

3) by waitable I meant something compatible with select/poll/wait
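Until something like that exists, one way to approximate it on POSIX is the classic self-pipe trick: the event callback writes a byte to a pipe, and the main loop polls the pipe's read end alongside its other descriptors. The following is a rough sketch under assumptions, not something OpenCL provides: event_pipe, event_pipe_open, event_pipe_on_complete and wait_event_or_fd are hypothetical names, and the callback again takes void * / int instead of cl_event / cl_int so the sketch builds without the OpenCL headers.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical wrapper: a pipe whose read end becomes readable
   once the associated event's callback has fired. */
typedef struct {
    int read_fd;
    int write_fd;
} event_pipe;

/* Returns 0 on success, -1 if the pipe could not be created. */
int event_pipe_open(event_pipe *p) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    p->read_fd = fds[0];
    p->write_fd = fds[1];
    return 0;
}

/* Event-callback shape (assumption: plain types instead of cl_event/cl_int).
   A single one-byte write is enough to wake a poll() on read_fd. */
void event_pipe_on_complete(void *event, int status, void *user_data) {
    event_pipe *p = user_data;
    char byte = 1;
    (void)event;
    (void)status;
    assert(p != NULL);
    ssize_t n = write(p->write_fd, &byte, 1);
    (void)n;  /* a real implementation should handle a failed write */
}

/* Waits for whichever becomes readable first.
   Returns 0 if the event pipe fired (and consumes its byte),
   1 if other_fd is readable, -1 on timeout or error. */
int wait_event_or_fd(event_pipe *p, int other_fd, int timeout_ms) {
    struct pollfd fds[2] = {
        { .fd = p->read_fd, .events = POLLIN },
        { .fd = other_fd,   .events = POLLIN },
    };
    if (poll(fds, 2, timeout_ms) <= 0)
        return -1;
    if (fds[0].revents & POLLIN) {
        char byte;
        if (read(p->read_fd, &byte, 1) != 1)
            return -1;
        return 0;
    }
    if (fds[1].revents & POLLIN)
        return 1;
    return -1;
}
```

Here other_fd stands in for the sensor or socket descriptor; a real loop would pass its whole descriptor set to poll() rather than just one extra fd. It still costs a pipe per event (or per group of events) plus a hop through the callback thread, which is why a natively waitable event would be nicer.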

0 Likes