Is it really impossible?
For some time now we have been told that RV6xx lacks the hardware capabilities needed to support OpenCL. But is that really true?
If you want full OpenCL, unfortunately it is. But not every program needs full OpenCL. Often only a small subset of the language is required (for matrix multiplication, vector addition, etc.).
So let's consider whether RV6xx can support some part of OpenCL.
RV6xx doesn't have compute shaders. This means we can't specify an arbitrary NDRange (the work-item index space); instead, the NDRange is defined by the size of the output buffer.
1. So we have the first restriction: the NDRange must equal the dimensions and size of the output buffer. This is usually the case anyway in classical stream computing (converting one stream into another, as in Brook).
2. The second limitation is that we can't request a local work-group size; the driver makes that decision.
3. Reads from memory must go through the texture unit. We can't write to read buffers (which implies that the kernel must take const pointers to its read buffers).
4. Writes to the output buffer must follow the pattern
out[global_id(0)] = value; ( for 1D )
out[global_id(1)*out_width + global_id(0)] = value; ( for 2D )
5. We can't use local memory (it is missing on RV6xx).
So on the device side, kernels are restricted to the form:
void kernel( const global floatx* input1 [, const global floatx* inputx], global floatx* output1 [, global floatx* outputx], floatx,intx variables ( not pointers ) )
{
// can't use local_id, local_size
// only global_id and global_size are available
... computations and memory read buffer access here ...
output1[ global_id(1)*output_width + global_id(0)] = value; // 2D case
// up to 8 outputs
}
Such a kernel can be compiled into a pixel shader. Kernels that don't match this pattern should produce a compilation error on RV6xx.
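To make the restricted model a bit more concrete, here is a rough sketch in plain C (not OpenCL C, and not anything the driver actually does) that emulates how such a kernel would run. The names `matmul_item` and `run_matmul` are made up for illustration. The point is only that the index space comes from the output buffer dimensions, the inputs are read-only, and each work item writes exactly one element at `gid1 * out_width + gid0`.

```c
#include <stddef.h>

/* Hypothetical emulation of one work item of a 2D kernel.
   Inputs are const (read-only), and the item produces the single value
   that belongs at out[gid1 * out_width + gid0].
   Here it computes one element of C = A * B, where A is
   out_height x inner and B is inner x out_width (row-major). */
static float matmul_item(const float *a, const float *b,
                         size_t inner, size_t out_width,
                         size_t gid0, size_t gid1)
{
    float sum = 0.0f;
    for (size_t k = 0; k < inner; ++k)
        sum += a[gid1 * inner + k] * b[k * out_width + gid0];
    return sum;
}

/* Hypothetical stand-in for the driver: the loop bounds are taken from
   the output buffer size, so the caller never passes a separate NDRange
   or a local work-group size. */
void run_matmul(const float *a, const float *b, float *out,
                size_t out_width, size_t out_height, size_t inner)
{
    for (size_t gid1 = 0; gid1 < out_height; ++gid1)
        for (size_t gid0 = 0; gid0 < out_width; ++gid0)
            out[gid1 * out_width + gid0] =
                matmul_item(a, b, inner, out_width, gid0, gid1);
}
```

On real hardware the two loops would be replaced by the rasterizer covering the render target, with one pixel shader invocation per output element; the sequential loops here are just the simplest way to show the same access pattern.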
On the host side, the limitation is that the NDRange passed to a kernel invocation must be the same as the output buffer size. (There is a technical problem in that an OpenCL buffer has no 2D size, but it can easily be solved by a small extension to CAL.)
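The host-side rule could be enforced with a check along these lines. This is only a sketch: `ndrange_matches_output` is a hypothetical helper, not part of OpenCL or CAL, and it assumes the runtime somehow knows the width and height of the output buffer (a 1D buffer is treated as having height 1).

```c
#include <stddef.h>

/* Hypothetical check a runtime could make before launching a kernel.
   Returns 1 if the requested NDRange matches the output buffer exactly,
   0 otherwise. out_width/out_height are assumed to be known for the
   output buffer; a 1D buffer is described with out_height == 1. */
int ndrange_matches_output(size_t work_dim, const size_t *global_size,
                           size_t out_width, size_t out_height)
{
    if (work_dim == 1)
        return out_height == 1 && global_size[0] == out_width;
    if (work_dim == 2)
        return global_size[0] == out_width && global_size[1] == out_height;
    return 0; /* only the 1D and 2D cases are covered by this sketch */
}
```

A launch that fails this check would be rejected up front rather than silently running over a different index space than the one requested.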
So, as we can see, some part of OpenCL can be supported on RV6xx. Now you can decide for yourself whether this model is sufficient for your work or not.
Personally, I think that AMD/ATI should implement this. It is a logical extension of the Brook framework (connecting the old with the new). It would also be a nice gesture from AMD/ATI towards people with older cards.