After playing with Brook and CAL on my new 4850 for a few days, I have a general question that I see has been mentioned before here. I am trying to understand the possibilities/limitations/impossibilities involved.
Can a CPU create a memory buffer at a specified address and then share it concurrently between one or more CPUs and an 'infinitely-running' kernel on one or more GPUs? I think this must be possible, but it would presumably require replacing parts of the CAL runtime with new routines. It also seems that the kernels would have to be written at the ISA level to get around some undesirable effects of IL optimization, such as discarding fences (which presumably could be used to force stores to occur within a loop, rather than just once at the end).
I don't think there are any hardware limitations preventing the above scenario. Is the main concern ensuring proper memory management (to guarantee that all processes access the same physical memory locations)? I am mostly interested in the CPU vs. GPU sharing question. I can see that there might be hardware issues with sharing between GPU threads if any noncoherent caching is done between input and output and that caching can't be explicitly controlled (flushed).
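As a rough illustration of the scenario, here is a CPU-only C++ sketch of the handshake being asked about. It is an analogy, not CAL or IL code: a std::thread stands in for the 'infinitely-running' kernel, and the names Mailbox, persistent_worker and run_mailbox_demo are hypothetical. It says nothing about whether CAL or the hardware permits this. It only shows why the stores have to become visible inside the loop (here through release/acquire ordering) rather than once at the end.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical layout of the shared buffer. On a GPU this would live in
// memory visible to both host and device; here it is ordinary process memory.
struct Mailbox {
    std::atomic<std::uint32_t> request{0};  // host -> worker: id of latest request
    std::atomic<std::uint32_t> done{0};     // worker -> host: id of last finished request
    std::atomic<bool> stop{false};          // host -> worker: leave the loop
    std::uint32_t input = 0;                // written by host before publishing 'request'
    std::uint32_t output = 0;               // written by worker before publishing 'done'
};

// Stand-in for the long-running kernel: polls the buffer until told to stop.
void persistent_worker(Mailbox& box) {
    std::uint32_t last = 0;
    while (!box.stop.load(std::memory_order_acquire)) {
        std::uint32_t req = box.request.load(std::memory_order_acquire);
        if (req == last) {              // nothing new yet
            std::this_thread::yield();
            continue;
        }
        box.output = box.input * 2;     // the "work"
        // The release store plays the role of the fence: it forces the write
        // to 'output' to be visible before 'done' changes, on every iteration.
        box.done.store(req, std::memory_order_release);
        last = req;
    }
}

// Host side: sends 'rounds' requests and sums the replies.
std::uint32_t run_mailbox_demo(std::uint32_t rounds) {
    Mailbox box;
    std::thread worker(persistent_worker, std::ref(box));
    std::uint32_t sum = 0;
    for (std::uint32_t r = 1; r <= rounds; ++r) {
        box.input = r;
        box.request.store(r, std::memory_order_release);
        while (box.done.load(std::memory_order_acquire) != r) {
            std::this_thread::yield();  // wait for the worker's reply
        }
        sum += box.output;              // safe to read: ordered after 'done'
    }
    assert(box.done.load() == rounds);
    box.stop.store(true, std::memory_order_release);
    worker.join();
    return sum;
}
```

Calling run_mailbox_demo(4) from a main function exercises four round trips. If the release store were dropped or hoisted out of the loop, which is the effect attributed above to the IL optimizer discarding fences, the host could spin forever or read a stale output.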
Thanks for any info...