Originally posted by: emuller I assume you are using threads. Have you tried MPI processes? I'm using python multiprocessing ... perhaps if the calMemCopy call gets its own context and its own process?
Yeah, I'm using a multi-threaded Win32 application, creating a separate thread for each GPU (and a separate context too, of course). Running several processes works fine; they have no problem working asynchronously with different GPUs... which makes me wonder why ATI failed to implement this within a single process.
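To make the multi-process layout concrete, here is a rough sketch of "one process per GPU" using Python multiprocessing, as emuller suggested. It is only an illustration: `run_on_device` is a hypothetical placeholder and makes no CAL calls, and the device list `[0, 1]` is an assumption. In a real program each worker would open its own device and context at the marked spot.

```python
import multiprocessing as mp
import os


def run_on_device(device_id):
    """Hypothetical per-GPU worker; stands in for real device work."""
    # Assumption: a real worker would open the device and create its own
    # context here, then launch kernels. Plain CPU arithmetic is used
    # instead so the sketch runs anywhere.
    data = [float(i) for i in range(4)]
    result = sum(x * (device_id + 1) for x in data)
    return device_id, os.getpid(), result


if __name__ == "__main__":
    device_ids = [0, 1]  # assumed: two GPUs in the machine
    # One worker process per device, so each device is driven from its
    # own address space and workers do not share memory with each other.
    with mp.Pool(processes=len(device_ids)) as pool:
        for device_id, pid, result in pool.map(run_on_device, device_ids):
            print(f"device {device_id} handled by pid {pid}: {result}")
```

Each worker only hands back a small picklable result; nothing allocated inside one worker is visible to the others.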
For me the workaround was to use pinned memory; since my algorithms don't require massive memory transfers, it solves everything.
Anyway, for calMemCopy we need to be in the same process as the kernel invocation routine itself; otherwise we cannot access memory transferred in another GPU context at all, and what's the point of transferring it "nowhere"? ...Or have I missed something?