Can you please answer these questions:
1. As I understand it, OpenCL can also run on CPUs, which should mean no copies are needed to get/set pixels (if you only want the CPU to use the image)?
2. As I understand it, DXT is a type of compressed image format. I'm guessing a lot of non-OpenCL folks must be using some well-established libraries to access these images from the C++ world?
Can you please describe a use case where it would be really helpful for an OpenCL vendor to implement something like this?
1. Yes, but I want to be able to use it with an APU's IGP. The idea is simple: why copy the texture twice (once in CPU memory + once in GPU memory) if it can be shared in virtual memory?
2. I'm thinking from the perspective of an APU's shared virtual memory. The idea is to store a compressed texture only ONCE (in shared virtual memory). The GPU can already decompress the compressed texture's blocks on the fly to use them in a pixel shader, etc... The problem is that the CPU cannot. There are libraries to compress/decompress the whole texture (like The Compressonator or the DirectX texture tool)... but what I want is just to get/set pixels ON THE FLY to save memory.
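For reference, CPU-side per-pixel reads from a DXT1/BC1 texture are mechanically simple, because every 4x4 block is self-contained: 8 bytes holding two RGB565 endpoint colors plus sixteen 2-bit palette indices. Here is a minimal sketch of such a getPixel that decodes only the block containing the requested texel (the names are mine, not a real SDK API; the DXT1 punch-through alpha case is simplified to black, and texture dimensions are assumed to be multiples of 4):

```cpp
#include <cstdint>

struct RGB { uint8_t r, g, b; };

// Expand a 16-bit RGB565 endpoint to 8 bits per channel.
static RGB expand565(uint16_t c) {
    return { (uint8_t)(((c >> 11) & 0x1F) * 255 / 31),
             (uint8_t)(((c >>  5) & 0x3F) * 255 / 63),
             (uint8_t)(( c        & 0x1F) * 255 / 31) };
}

// blocks: raw BC1 data (8 little-endian bytes per 4x4 block);
// widthInBlocks = textureWidth / 4. Only the one block that
// contains (x, y) is touched -- no full decompression.
RGB dxt1GetPixel(const uint8_t* blocks, int widthInBlocks, int x, int y) {
    const uint8_t* b = blocks + ((y / 4) * widthInBlocks + (x / 4)) * 8;
    uint16_t c0 = (uint16_t)(b[0] | (b[1] << 8));
    uint16_t c1 = (uint16_t)(b[2] | (b[3] << 8));
    RGB p0 = expand565(c0), p1 = expand565(c1);

    // Each texel has a 2-bit index into the block's 4-entry palette.
    int texel = (y % 4) * 4 + (x % 4);
    uint32_t idxBits = (uint32_t)b[4] | ((uint32_t)b[5] << 8) |
                       ((uint32_t)b[6] << 16) | ((uint32_t)b[7] << 24);
    int idx = (int)(idxBits >> (texel * 2)) & 0x3;

    switch (idx) {
    case 0: return p0;
    case 1: return p1;
    case 2:
        if (c0 > c1)  // four-color mode: 2/3 p0 + 1/3 p1
            return { (uint8_t)((2 * p0.r + p1.r) / 3),
                     (uint8_t)((2 * p0.g + p1.g) / 3),
                     (uint8_t)((2 * p0.b + p1.b) / 3) };
        // three-color mode: midpoint
        return { (uint8_t)((p0.r + p1.r) / 2),
                 (uint8_t)((p0.g + p1.g) / 2),
                 (uint8_t)((p0.b + p1.b) / 2) };
    default:
        if (c0 > c1)  // four-color mode: 1/3 p0 + 2/3 p1
            return { (uint8_t)((p0.r + 2 * p1.r) / 3),
                     (uint8_t)((p0.g + 2 * p1.g) / 3),
                     (uint8_t)((p0.b + 2 * p1.b) / 3) };
        return { 0, 0, 0 };  // three-color mode: transparent (punch-through)
    }
}
```

A setPixel is the harder half (it has to refit the two endpoints for the whole block), which is why vendor help or a dedicated block format would make sense here.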
Example of use: imagine I create a program for financial analysis. Some parts will be executed on the C++ CPU side and others will be accelerated on the GPU side using OpenCL. I could pack some data into a compressed texture that could be accessed in OpenCL kernels... but I must duplicate the data for the CPU C++ side because I have no textureCompressed.getPixel(x,y) function... and that's a waste of memory for an APU that is supposed to use shared virtual memory...
So you should add a small complementary library to your SDK with those textureCompressed.get/setPixel(x,y) functions, so that compressed textures could be used both in OpenCL kernels and on the CPU C++ side.
And, yep, maybe DXT compression is not optimal for that, but I'm sure you could design another kind of block compression that would be efficient for this system, using the new OpenCL shared virtual memory functions and your partially resident texture system.
Even in the case of an APU, RAM is divided into many segments which have different READ/WRITE speeds to/from the CPU or GPU. PFA the relevant ppt.
Anyway, the use case may have some meat in it. IMHO it should be a request to Khronos and not to AMD.
1004_final.pdf 423.9 KB