After compacting an array (copying the required elements from an input array into an output array) with a scan operation, there may be contiguous empty space left at the end of the output (compacted) array, after the required elements. Is there a way to free this empty space in the kernel code itself, without going back to the host just to delete it?
For example, I have an input array of 100 elements, some greater than 50 and some less than 50. I want to store the numbers greater than 50 in a different array and do further processing only on the elements in that array. I don't know the size of this output array in advance, since I don't know how many numbers are actually greater than 50, so I declare it with 100 elements. After performing the scan I get an output array holding all the elements greater than 50, but there may be a contiguous run of empty slots after them. How do we delete these slots? Is there a way to do this in the kernel code itself, or do we have to go back to the host code for this?
If we can't delete the remaining space in the kernel code and don't want to go back to the host code, how do we do further processing on such compacted arrays?
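As a point of reference for the question above: a kernel cannot resize or free a device buffer, but the scan already yields the number of valid elements, so the unused tail can simply be ignored. The following is a minimal sequential sketch in plain C, not OpenCL code; the function name compact_greater_than and its parameters are hypothetical and only model what a parallel scan-and-scatter would compute.

```c
#include <stddef.h>

/* Sequential model of scan-based compaction (hypothetical helper, not from
   the thread). For each element, keep is 1 when in[i] > threshold, and next
   holds the exclusive prefix sum of the keep flags, i.e. the output slot
   that element i would be scattered to. Returns the number of elements
   written, which is the valid length of out. */
size_t compact_greater_than(const int *in, size_t n, int threshold, int *out)
{
    size_t next = 0;               /* exclusive prefix sum of keep flags so far */
    for (size_t i = 0; i < n; ++i) {
        int keep = in[i] > threshold;
        if (keep)
            out[next] = in[i];     /* scatter to the slot given by the scan */
        next += (size_t)keep;
    }
    return next;                   /* slots [next, n) of out are unused */
}
```

In the parallel version the same value is the last scan entry plus the last flag; storing it in a one-element buffer lets later kernels treat everything beyond it as empty.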
How are you "putting them" in the output array? How do you know on the host side how many you've written?
What I do for this kind of function is keep a counter (the dynamic array length) and increment it whenever I want to push another element:
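    // The kernel name is arbitrary. The host must set *OutputCount to 0 before enqueueing.
    kernel void Compact( global const int* InputArray, global int* OutputArray, volatile global int* OutputCount )
    {
        int InIndex = get_global_id(0);
        int Value = InputArray[ InIndex ];
        if ( Value > 50 )
        {
            // atomic_inc returns the old count, which is this element's output slot
            int OutIndex = atomic_inc( OutputCount );
            OutputArray[OutIndex] = Value;
        }
    }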
In my situation I could pass around the OutputCount integer (it's a cl_mem buffer so I can read it back) along with my output array so I know the output length.
Normally when I read back to the host I read the count first, then read only that many elements from OutputArray, but I could just as well reuse the array in another kernel without reading back to the host.
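The read-back order described above could look roughly like this on the host. This is only a sketch under assumptions: the helper name read_compacted and its parameters are hypothetical, the queue is assumed to be an in-order queue on which the kernel was already enqueued, and the header is CL/cl.h (OpenCL/opencl.h on macOS).

```c
#include <CL/cl.h>

/* Hypothetical helper: reads the counter first, then only that many ints.
   dst must have room for max_count ints. On success, *count_out holds the
   number of ints copied into dst. */
cl_int read_compacted(cl_command_queue queue, cl_mem output_array,
                      cl_mem output_count, cl_int max_count,
                      cl_int *dst, cl_int *count_out)
{
    cl_int count = 0;

    /* Blocking read of the single-int counter. On an in-order queue this
       completes only after the previously enqueued kernel has finished. */
    cl_int err = clEnqueueReadBuffer(queue, output_count, CL_TRUE, 0,
                                     sizeof(cl_int), &count, 0, NULL, NULL);
    if (err != CL_SUCCESS)
        return err;

    /* The counter can exceed the capacity if more elements were pushed
       than the array can hold, so clamp it before reading. */
    if (count > max_count)
        count = max_count;

    if (count > 0) {
        err = clEnqueueReadBuffer(queue, output_array, CL_TRUE, 0,
                                  (size_t)count * sizeof(cl_int), dst,
                                  0, NULL, NULL);
        if (err != CL_SUCCESS)
            return err;
    }

    *count_out = count;
    return CL_SUCCESS;
}
```

Only the first count elements cross the bus, so the unused tail of OutputArray costs nothing on read-back.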
How do I reuse the array in another kernel without copying it back to the host? Is it possible to set the cl_mem object for the output array directly as an argument to another kernel, without reading it on the host (i.e. without using clEnqueueReadBuffer)?
I believe so, yes: you can just pass the cl_mem objects to the other kernel with clSetKernelArg() (though remember you'll need to pass both of them). Someone with more experience might say you need a clFlush or clFinish between the uses, but so far I haven't needed one. (I've used large arrays, like 1920x1080x3, when doing arbitrary image conversion.)
Just to clarify, OutputCount is a cl_mem (holding a single int) AND OutputArray is another cl_mem (an arbitrary array of ints).
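To illustrate how a follow-up kernel might consume both buffers, here is a minimal sketch in OpenCL C. The kernel name ProcessCompacted, the OutputMax argument and the doubling step are placeholders, not something from this thread. It assumes the host binds the same two cl_mem handles with clSetKernelArg, enqueues this kernel after the compaction kernel on the same in-order command queue, and launches it with a global size equal to the capacity of OutputArray, since the host does not know the count.

```c
// Hypothetical second kernel; the doubling is a stand-in for real processing.
kernel void ProcessCompacted( global int* OutputArray,
                              global const int* OutputCount,
                              const int OutputMax )
{
    int Index = get_global_id(0);

    // Number of valid elements, clamped to the capacity of OutputArray.
    int Count = min( *OutputCount, OutputMax );

    // Work-items that map to the unused tail do nothing.
    if ( Index >= Count )
        return;

    OutputArray[Index] = OutputArray[Index] * 2;
}
```

The empty tail is never freed; the work-items that land on it just exit early, which is usually cheap compared with a round trip to the host.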
Though not entirely relevant, I usually also pass a plain int (not in a cl_mem) to my kernels, const int OutputMax, and check
    if ( OutIndex >= OutputMax ) return;
to make sure I don't write past the end of the array (writing past the end does weird stuff, but doesn't crash).
When I read back, I assert (or print to stdout) if OutputCount > OutputMax, since the atomic_inc still increments the counter past the max even when the write is skipped. (This is handy when I generate more data than I expect.)
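The bounded push and the overflow check can be modelled sequentially in plain C. This is only a sketch: fetch_and_inc stands in for atomic_inc, and the names push_bounded and valid_length are hypothetical.

```c
/* Sequential stand-in for atomic_inc: returns the old value, then increments. */
static int fetch_and_inc(int *counter)
{
    return (*counter)++;
}

/* Reserves a slot and writes value only if the slot is inside the array.
   The counter is incremented even when the write is skipped, so it can
   end up larger than out_max. */
void push_bounded(int *out, int *count, int out_max, int value)
{
    int idx = fetch_and_inc(count);
    if (idx >= out_max)
        return;                    /* array is full: drop the element */
    out[idx] = value;
}

/* Number of elements actually stored: the counter clamped to the capacity. */
int valid_length(int count, int out_max)
{
    return count < out_max ? count : out_max;
}
```

A counter above out_max after the run means elements were dropped, which is exactly the condition the assert on the host is meant to catch.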