Running a simple CAL application with a wait loop like:
while (calCtxIsEventDone(p->ctx, e) == CAL_RESULT_PENDING);
is no problem at all.
But if we have two (or more) CAL devices and try to run two instances at the same time (with different contexts, of course), performance is very poor, because calResMap() always waits for kernel execution to complete. And since calResMap() doesn't take any context reference (it's global), it doesn't care which thread is currently executing a kernel and which one is trying to map memory: calResMap() just blocks both threads.
Is there any way to avoid this? Is there an async version of calResMap()? Are there any plans to implement one?
At the moment, the only solution I've found is to run just one GPU calculation thread per process and start as many processes as there are GPUs in the system. This solution isn't pretty, but at least it works.
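For anyone who wants to try the same workaround, here is a rough sketch of the launcher side, assuming a POSIX system. It contains no CAL calls at all: demo_worker() and run_one_process_per_gpu() are hypothetical names made up for this example, and a real worker would open its own CAL device, create its context, and run the kernel / calResMap() loop inside the child. I have not checked how CAL behaves across fork(), so the sketch assumes CAL is initialized only inside each child process.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-GPU worker (placeholder, no CAL calls here).
 * A real worker would open CAL device `gpu_index`, create its own
 * context and run the kernel/map loop. Returns 0 on success. */
static int demo_worker(int gpu_index)
{
    printf("worker for GPU %d running in pid %ld\n",
           gpu_index, (long)getpid());
    return 0;
}

/* Forks one child process per GPU, runs worker(gpu_index) in each
 * child, then waits for all of them.
 * Returns the number of children that exited with status 0. */
static int run_one_process_per_gpu(int gpu_count, int (*worker)(int))
{
    int started = 0;
    int succeeded = 0;

    for (int i = 0; i < gpu_count; ++i) {
        fflush(NULL);               /* don't duplicate buffered output */
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            break;                  /* stop launching, still reap below */
        }
        if (pid == 0) {
            /* Child: owns exactly one GPU, so a blocking map call here
             * cannot stall the workers of the other GPUs. */
            int rc = worker(i);
            fflush(NULL);
            _exit(rc == 0 ? 0 : 1);
        }
        ++started;
    }

    /* Parent: reap every child that was actually started. */
    for (int i = 0; i < started; ++i) {
        int status = 0;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ++succeeded;
    }
    return succeeded;
}
```

Calling run_one_process_per_gpu(n, demo_worker) with n set to the number of GPUs starts n independent processes; the GPU count itself would have to come from the CAL device query, which is left out here.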
The second question is more general: is AMD/ATI just abandoning ATI Stream? This forum doesn't look like a live one: tons of unanswered questions and nearly zero activity from AMD/ATI. There was one guy (Micah) who at least tried to answer some questions, but I haven't seen him here for over a month. With this level of "support" there is no future for ATI Stream, imho.