I was wondering if it's possible to get a more generic calMemCopy operation:
Consider a 2D resource A and a 2D resource B. It is currently
not possible to copy pieces of A into B. It would be much better if an
extended version of calMemCopy were provided that could copy a
rectangular block from an arbitrary starting position inside A to an
arbitrary position inside B. Moreover, it would be nice if it didn't
look at the formats at all and simply copied the data over as-is, as long as
the block fits inside both resources.
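To make the request concrete, here is a minimal host-side C sketch of the semantics being asked for. `Buffer2D` and `copy_rect` are hypothetical names for illustration only. They are not part of CAL, and a real calMemCopy extension would do the copy on the device rather than with memcpy on the CPU. Offsets and widths are given in bytes, so the element format is never consulted, only the bounds.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 2D buffer descriptor (NOT a CAL type): base pointer plus
   pitch, i.e. the number of bytes from the start of one row to the next. */
typedef struct {
    unsigned char *data;
    size_t pitch;   /* bytes between consecutive rows */
    size_t width;   /* usable bytes per row (<= pitch) */
    size_t height;  /* number of rows */
} Buffer2D;

/* Copy a block of width_bytes x height rows from src at (src_x, src_y)
   to dst at (dst_x, dst_y). x offsets are in bytes, so the data format is
   ignored; only the bounds are checked. Assumes src and dst do not overlap.
   Returns 0 on success, -1 if the block does not fit in either buffer. */
int copy_rect(Buffer2D *dst, size_t dst_x, size_t dst_y,
              const Buffer2D *src, size_t src_x, size_t src_y,
              size_t width_bytes, size_t height)
{
    if (src_x + width_bytes > src->width || src_y + height > src->height ||
        dst_x + width_bytes > dst->width || dst_y + height > dst->height)
        return -1;

    /* One contiguous copy per row; rows are pitch bytes apart. */
    for (size_t row = 0; row < height; ++row) {
        memcpy(dst->data + (dst_y + row) * dst->pitch + dst_x,
               src->data + (src_y + row) * src->pitch + src_x,
               width_bytes);
    }
    return 0;
}
```

With an operation like this, the global-buffer case becomes a single call: if two matrices are stacked in one buffer, copying only the second one just means passing a src_y equal to the row count of the first.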
Currently, the only way to copy data without involving the CPU is to
copy the entire resource A to B, which can only be done if the size and format
of A and B match. The other option is to map, copy on the CPU, and then
unmap, but that's undesirable on several levels.
This is very problematic in many cases. For example, take the global buffer: if I want to use it to store two matrices and copy only one of them to the CPU at some point, that's not possible.
As another motivation, CUDA already offers a very flexible memcpy (e.g. cudaMemcpy2D, which copies a rectangular region between pitched allocations).
I think a more flexible calMemCopy is very desirable.