This is probably two questions, but I cannot figure out how to make this work.
I have a set of 500 images, each 1024x900 pixels. I need to load each one into memory and perform one convolution on it (this is the first kernel).
Once an image has been convolved, I need to push the result into a 3D buffer that should stay on the device (this is the second kernel).
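To make the two steps concrete, here is a minimal CPU-side sketch of the data flow in plain Python (not OpenCL and not Cloo). All of the names (`convolve2d`, `push_into_volume`, `process_stream`) are placeholders I made up for illustration, zero padding at the borders is an assumption, and "push into the 3D buffer" is simplified to copying each result into its own slice of a flat volume.

```python
def convolve2d(image, kernel):
    """Step 1: 2D convolution with zero padding; output is the same size as the input."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oy, ox = kh // 2, kw // 2  # kernel centre
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    # Flipped indices: true convolution rather than correlation.
                    sy, sx = y + oy - ky, x + ox - kx
                    if 0 <= sy < h and 0 <= sx < w:  # outside the image counts as zero
                        acc += image[sy][sx] * kernel[ky][kx]
            out[y][x] = acc
    return out


def push_into_volume(volume, slice_index, image):
    """Step 2: copy one convolved image into slice `slice_index` of a flat 3D volume."""
    h, w = len(image), len(image[0])
    base = slice_index * h * w
    for y in range(h):
        for x in range(w):
            volume[base + y * w + x] = image[y][x]


def process_stream(images, kernel, height, width, count):
    """Feed images through both steps one at a time; only the volume persists."""
    volume = [0.0] * (count * height * width)
    for i, image in enumerate(images):
        push_into_volume(volume, i, convolve2d(image, kernel))
    return volume
```

On the GPU, the volume would be the buffer that stays resident on the device, and only one image (or a small number of them) would need to be in flight at any time.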
Since all of these images plus the 3D volume are prohibitively large to hold in memory at once, it would be really helpful to put the two kernels into some sort of wait state while the images are streamed to the GPU: the first kernel would run when it sees a new image, and the second kernel would run when it sees the output of the first.
Is this possible? I am using Cloo for .NET to load the buffer, run the first kernel, load its output into the next buffer, and run the second kernel. This is slower than just doing the whole process on the CPU. Can someone point me to the correct way, or an example, of how to do such an operation? Is this something that would be better done with OpenGL interop?