I need to write a function that uses both gather and scatter. Does Brook+ not support this? And how does CAL identify the thread domain?
kernel void test(float4 a[], out float4 b[])
That is a good question; I wanted to ask it too.
The only simple example I have found so far that reads from and writes to the global buffer is importspeed in CAL. The IL code reads from the global buffer first, does some computation, and finally writes to the global buffer. However, it is not a convincing example, since the sample program never reads the output back to verify it.
Can a kernel gather from streams of different sizes/dimensionality? I'm having issues with the following:
kernel void stringMatch(int textStream[], int nextPosition, int hashTableStream[][], int hashIndex, out int2 resultStream<>) {
int i;
int idx;
int index;
int x, y;
idx = instance().x;
index = hashTableStream[hashIndex][idx];
i = 0;
if (index < nextPosition) {
while (textStream[nextPosition+i]==textStream[index+i]) {
i = i+1;
}
}
resultStream.x = i;
resultStream.y = index;
}
I just get zeros as the output.
Did you check the error on your output stream?
OK, I was barking up the wrong tree. It doesn't like one of my input streams, which is dimensioned [60000][500]: "Dimension not supported on the underlying hardware."
Is this a limitation of the HD2400 I'm still stuck with (4870 arriving imminently) or a more general limitation?
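For checking the kernel's output, a CPU reference of the same match-length loop can help. The sketch below is plain C, not Brook+, and the names match_length_ref and text_len are made up for illustration. Unlike the kernel above, it takes the text length and stops at the end of the text, which is an added assumption about what the loop should do there.

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference for the match length computed by stringMatch.
   text_len is a hypothetical extra parameter: the kernel has no such
   bound, so this version stops at the end of the text instead. */
int match_length_ref(const int *text, int text_len,
                     int index, int next_position)
{
    int i = 0;

    assert(text != NULL);

    /* Same guard as the kernel: only earlier positions are compared.
       Negative indices are rejected as well. */
    if (index < 0 || index >= next_position) {
        return 0;
    }

    /* index < next_position, so bounding next_position + i also
       bounds index + i. */
    while (next_position + i < text_len &&
           text[next_position + i] == text[index + i]) {
        i = i + 1;
    }
    return i;
}
```

Comparing this value against resultStream.x for a few known inputs should show whether the zeros come from the kernel logic or from a stream that failed to allocate.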
Thanks
The maximum supported 2D stream dimensions are 8192x8192, and the maximum supported 1D stream size is 2^26 elements.
You can either rearrange the data to fit these dimensions or change the algorithm to process the data tile by tile on the GPU (take a look at the out-of-core MMM in samples/CPP/apps). The 4870 has the same limitation.
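As a rough illustration of the tile-by-tile idea, the host side can work out how many tiles are needed along an oversized dimension and how large each one is. This is a plain C sketch; tile_count, tile_extent and MAX_STREAM_DIM are hypothetical names, not part of Brook+ or CAL, and it does not show the stream copies themselves.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STREAM_DIM 8192 /* 2D limit per dimension quoted above */

/* Number of tiles needed to cover `extent` elements along one dimension. */
size_t tile_count(size_t extent, size_t max_dim)
{
    assert(max_dim > 0);
    return (extent + max_dim - 1) / max_dim;
}

/* Size of tile number `tile` (0-based). The last tile may be smaller;
   a tile index past the end yields 0. */
size_t tile_extent(size_t extent, size_t max_dim, size_t tile)
{
    size_t start;
    size_t remaining;

    assert(max_dim > 0);
    start = tile * max_dim;
    if (start >= extent) {
        return 0;
    }
    remaining = extent - start;
    return remaining < max_dim ? remaining : max_dim;
}
```

For the [60000][500] stream mentioned above, the 60000 dimension would need 8 tiles: seven of 8192 and a last one of 2656. The 500 dimension fits in a single tile.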
What's best practice in this situation? The gather routine will try to access elements that don't exist at the extremes of the domain.
(i) Is there a way to access the size of the stream from within the kernel and limit access with if statements to prevent accessing out of bounds?
(ii) Should the domain of the kernel be limited and the extremities handled separately?
(iii) or does the compiler deal with it so there isn't a problem?
kernel void gather(int a[], out int b<>) {
int idx = instance().x;
b = a[idx-1] + a[idx] + a[idx+1];
}
Thanks
This sounds like typical GPU hardware behaviour. The rule of thumb, as far as I know, is that if the index goes beyond the maximum (respectively minimum) limit, it is clamped to the maximum (respectively minimum) value.
So you don't have to test the boundaries, but you need to keep in mind that
v[max_limit+whatever_positive] will be treated as v[max_limit]
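To see what that clamping would mean for the three-point gather kernel, here is a small CPU emulation in plain C. It assumes the clamp-to-edge behaviour described in this post, which has not been checked against the hardware documentation; clamp_index and gather3_clamped are made-up names.

```c
#include <assert.h>

/* Clamp idx into [0, n-1]. Assumes n > 0. */
int clamp_index(int idx, int n)
{
    assert(n > 0);
    if (idx < 0) {
        return 0;
    }
    if (idx > n - 1) {
        return n - 1;
    }
    return idx;
}

/* CPU emulation of b = a[idx-1] + a[idx] + a[idx+1] where every
   out-of-range read is clamped to the nearest valid element. */
void gather3_clamped(const int *a, int *b, int n)
{
    int idx;
    for (idx = 0; idx < n; idx++) {
        b[idx] = a[clamp_index(idx - 1, n)]
               + a[idx]
               + a[clamp_index(idx + 1, n)];
    }
}
```

Under this assumption the first and last outputs count the edge element twice. If that is not the result you want, the boundaries still need separate handling, as in option (ii).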
Originally posted by: tgm@ncic.ac.cn I need to write a function that uses both gather and scatter. Does Brook+ not support this? And how does CAL identify the thread domain?
kernel void test(float4 a[], out float4 b[])
Brook+ supports this. Go to CPP\tutorials\ScatterStreamKernel and change the input stream to a gather stream; it should work without any problem.