Archives Discussions

afo · ‎04-01-2011

calCtxFlush vs calCtxIsEventDone: differences? applicability?

Hi all,

I was searching for the differences and uses of calCtxFlush and calCtxIsEventDone. The closest answer was:

http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=97233&highlight_key=y&keyword1=calctxflush

But the difference between the two, and when to use each, is still not clear to me.

Another question: is calCtxFlush a blocking call? Does it dispatch the kernels to the GPU and return immediately, or does it dispatch them, wait until they have finished executing, and then return?

Thanks a lot for any insight about this.

best regards,

Alfonso

Jawed · ‎04-01-2011

Neither is a blocking call.

Do you have the CAL Programming Guide?

http://developer.amd.com/gpu/AMDAPPSDK/assets/AMD_CAL_Programming_Guide_v2.0.pdf

You need to make your own loop to perform the wait.

If you chain kernels such that the input to kernel 2 is the output from kernel 1, then there is no need to "wait" for kernel 1 to complete. But you will want to wait for kernel 2 to complete.

Basically when a chain of dependent kernels is set up you wait for the final kernel. Doing so makes the entire chain start executing.

As you submit each kernel (calCtxRunProgram) you can call calCtxFlush to get it going. That way, by the time you do the wait-loop, the chain will already be executing - instead of being merely queued, waiting to execute.

calCtxRunProgram(kernel1)
calCtxFlush()
calCtxRunProgram(kernel2)
calCtxFlush()
calCtxRunProgram(kernel3)
while (calCtxIsEventDone(kernel3_event) == CAL_RESULT_PENDING) {}
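
One caveat with a bare busy-wait like the one above is that it spins forever if the event never completes. A minimal sketch of a bounded version follows. The names `wait_status`, `poll_fn`, `wait_for_event` and `fake_poll` are hypothetical and not part of CAL; in real code the callback would presumably wrap a single calCtxIsEventDone query and map CAL_RESULT_PENDING to `WAIT_PENDING`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status values; stand-ins for the CAL result codes. */
typedef enum { WAIT_DONE = 0, WAIT_PENDING = 1, WAIT_TIMED_OUT = 2 } wait_status;

/* Hypothetical callback: one non-blocking status query for a single event. */
typedef wait_status (*poll_fn)(void *state);

/* Poll until the query stops returning WAIT_PENDING, or give up after
   max_polls pending answers. Any non-pending status is passed through. */
wait_status wait_for_event(poll_fn poll, void *state, unsigned long max_polls)
{
    unsigned long i;
    assert(poll != NULL);
    for (i = 0; i < max_polls; ++i) {
        wait_status s = poll(state);
        if (s != WAIT_PENDING)
            return s;
    }
    return WAIT_TIMED_OUT;
}

/* Fake event used only to exercise the loop: it reports pending until the
   counter pointed to by state reaches zero. */
wait_status fake_poll(void *state)
{
    int *remaining = (int *)state;
    if (*remaining > 0) {
        --*remaining;
        return WAIT_PENDING;
    }
    return WAIT_DONE;
}
```

With a counter of 3 and a budget of 10 polls, `wait_for_event(fake_poll, &counter, 10)` returns `WAIT_DONE`; with a budget of 2 it returns `WAIT_TIMED_OUT`, and the caller can decide whether to keep waiting or report an error.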

afo · ‎04-01-2011

Hi,

Thanks a lot for the answer.

"Do you have http://developer.amd.com/gpu/AMDAPPSDK/assets/AMD_CAL_Programming_Guide_v2.0.pdf

You need to make your own loop to perform the wait."

Yes, I have it, but it is not as clear as I would like; for example, there are few references to calCtxFlush and its use.

"calCtxRunProgram(kernel1)
calCtxFlush()
calCtxRunProgram(kernel2)
calCtxFlush()
calCtxRunProgram(kernel3)
while (calCtxIsEventDone(kernel3_event) == CAL_RESULT_PENDING) {}"

What would be the difference if I replaced calCtxFlush like this (i.e. does it make a difference at all)?

calCtxRunProgram(kernel1)
calCtxIsEventDone(kernel1_event)
calCtxRunProgram(kernel2)
calCtxIsEventDone(kernel2_event)
calCtxRunProgram(kernel3)
while (calCtxIsEventDone(kernel3_event) == CAL_RESULT_PENDING) {}

or this:

calCtxRunProgram(kernel1)
calCtxRunProgram(kernel2)
calCtxRunProgram(kernel3)
calCtxFlush()
while (calCtxIsEventDone(kernel3_event) == CAL_RESULT_PENDING) {}

If these all behave the same, when should calCtxFlush() be used and when calCtxIsEventDone?

Does running the 3 kernels with calCtxRunProgramGridArray make a performance difference?

I use compute shaders, and as in the samples I execute the kernels with calCtxRunProgramGrid instead of calCtxRunProgram; does it make a performance difference?

Again, thanks for the answer and your help.

best regards,

Alfonso

Jawed · ‎04-01-2011

If you use calCtxIsEventDone(kernel1_event) after the first kernel (similar for the second kernel), but don't put a loop there, it's effectively the same as calCtxFlush() (well, as long as no other host thread is submitting work to that context).

If you wait to call calCtxFlush until after all three calCtxRunProgram calls, you're merely delaying the start of kernel1's execution. The delay is going to be tiny, unless of course you're doing complicated work between those calCtxRunProgram calls. Experiment.

calCtxFlush doesn't care about the events; it just requests that the driver get the entire queue going.

calCtxIsEventDone should be used when you are trying to manage sequences of events on a per kernel basis (same for copies).
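
To make the distinction concrete, here is a toy model in C of the behaviour described in this thread. It is only a sketch: every `toy_` name is made up for illustration, nothing here calls CAL, and it assumes the device runs commands in order and that a status query also kicks off the queue, as described above. A "tick" stands for the device finishing one command while the host is busy with something else.

```c
#include <assert.h>

/* Toy stand-ins only; none of these are CAL types or functions. */
typedef enum { TOY_OK, TOY_PENDING } toy_result;

typedef struct {
    int queued;     /* commands recorded on the host, not yet handed to the device */
    int submitted;  /* commands handed to the device */
    int completed;  /* commands the device has finished */
} toy_ctx;

/* Models calCtxRunProgram: queue one kernel, return its event id (1-based). */
int toy_run(toy_ctx *c)
{
    c->queued++;
    return c->submitted + c->queued;
}

/* Models calCtxFlush: hand the whole queue to the device and return at once. */
void toy_flush(toy_ctx *c)
{
    c->submitted += c->queued;
    c->queued = 0;
}

/* Models time passing on the device: it finishes one submitted command. */
void toy_device_tick(toy_ctx *c)
{
    if (c->completed < c->submitted)
        c->completed++;
}

/* Models calCtxIsEventDone as described in the thread: a non-blocking
   query that also gets the queue going. */
toy_result toy_is_event_done(toy_ctx *c, int event)
{
    assert(event >= 1);
    toy_flush(c);
    return c->completed >= event ? TOY_OK : TOY_PENDING;
}

/* Host-side wait loop; returns how many polls came back pending. */
int toy_wait(toy_ctx *c, int event)
{
    int pending_polls = 0;
    while (toy_is_event_done(c, event) == TOY_PENDING) {
        toy_device_tick(c);
        pending_polls++;
    }
    return pending_polls;
}

/* Flush after each kernel; a tick stands for host work between submissions. */
int toy_chain_flush_each(void)
{
    toy_ctx c = {0, 0, 0};
    int last;
    toy_run(&c); toy_flush(&c); toy_device_tick(&c);
    toy_run(&c); toy_flush(&c); toy_device_tick(&c);
    last = toy_run(&c);
    return toy_wait(&c, last);
}

/* Same host work, but a single flush after the third kernel. */
int toy_chain_flush_last(void)
{
    toy_ctx c = {0, 0, 0};
    int last;
    toy_run(&c); toy_device_tick(&c);
    toy_run(&c); toy_device_tick(&c);
    last = toy_run(&c);
    toy_flush(&c);
    return toy_wait(&c, last);
}
```

In this model, flushing after each kernel lets the device work during the host-side gaps, so less waiting is left at the end than with a single late flush. Real timings depend on the driver and hardware, so measuring remains the only reliable guide.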

I never used calCtxRunProgramGridArray so I can't really comment. Basically, the kernels all execute under a single event rather than one event per kernel. In this case it looks like merely having a while loop on calCtxIsEventDone(grid_event) immediately after calCtxRunProgramGridArray will do the trick.

It's around a year since I did this stuff so I'm a bit rusty. Also the usage model I had was pretty unusual so it's a bit of a distraction to go into all the motivations.

Overall, it's best to experiment.

afo · ‎04-01-2011

Hi,

Thanks, I will play with these variants while waiting for SDK 2.4, which supposedly fixes multi-GPU support.

regards,

Alfonso
