Is there any way to store values in a kernel between invocations? I'm working on a large array computation that I would like to interrupt periodically to report progress back to the caller, but I would prefer not to send all the data and state information back and forth every time. Brook+ doesn't support static variables, if I understand correctly, but is there some other way to persist state between kernel calls? I think OpenCL supports this, so it's not a hardware issue.
You don't have to.
A stream is a handle to GPU memory, which means the data stays there as long as the stream is not destroyed. Even after the program exits it will still be there, as long as you don't write too much data there.
I tried the following kernel:
kernel void testKernel(int in_stream<>, out int out_stream<>, int first)
{
    int incr = 1;
    if (first == 1)
    {
        out_stream = in_stream;
    }
    out_stream += incr;
    incr += 1;
    return;
}
My calling routine initializes an int array to (0, 1, 2, 3, ...), then calls brook::Stream::read to initialize the input stream. After the first call (with first = 1), the output stream values are copied with brook::Stream::write into another int array and printed. The values matched the input. On the simulator, each successive call (with first = 0 and no further reads into the input stream) yielded an output stream with every value incremented by 1, which indicates that the kernel variable incr is reset on each call. On the actual FireStream board, the first call returns the correct (copied) stream, but successive calls return a stream filled with 1's.
Thoughts? Suggestions for further experiments?
I'm sorry, I misunderstood.
Your only chance is a parent kernel calling a child kernel that can access and modify main memory, as in the binomial option sample.
You can see that gpu_backwardTraverse reads from the mainGPU output stream, then saves its results back to that output stream.
That way you could save the state of every thread in main memory.
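To make the "keep every thread's state in memory" idea concrete, here is a minimal CPU-only sketch in plain C++ (not Brook+). The assumption is that per-thread state lives in a buffer that outlives each call, and each call loads, updates and stores it, so nothing has to survive in kernel-local variables. ThreadState, advanceKernel and makeState are hypothetical names, and the update rule is a toy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-thread state that must survive between "kernel calls".
struct ThreadState {
    int value;  // running result for this element
    int step;   // counter that plays the role of the question's incr
};

// One simulated kernel call: each "thread" i loads its state from the
// buffer, advances it once, and stores it back for the next call.
void advanceKernel(std::vector<ThreadState>& state) {
    for (std::size_t i = 0; i < state.size(); ++i) {
        ThreadState s = state[i];  // load state saved by the previous call
        s.value += s.step + 1;     // toy unit of work
        s.step += 1;               // persists because it is written back
        state[i] = s;              // store for the next call
    }
}

// Initial state: value i, no steps taken yet.
std::vector<ThreadState> makeState(std::size_t n) {
    std::vector<ThreadState> state(n);
    for (std::size_t i = 0; i < n; ++i) {
        state[i] = ThreadState{static_cast<int>(i), 0};
    }
    return state;
}
```

After three calls to advanceKernel, element i holds i + 1 + 2 + 3 = i + 6 and a step count of 3, even though no variable inside the "kernel" outlives a single call.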
I would like to avoid the cost of transferring all the state information back and forth across the interface. I thought that swapping the streams between input and output might do it, like this:
int main()
{
    unsigned int dims[] = {1024};
    brook::Stream<int> s1(1, dims);
    brook::Stream<int> s2(1, dims);
    int data[1024];
    // some initialization cruft omitted (fill data, then s1.read(data))
    testKernel(s1, s2, 1);
    // write s2 into local buffer and show the values
    // now reverse the streams, since s2 is presumably initialized and has
    // values that survived the return
    testKernel(s2, s1, 0);
    // show the return data from s1
    return 0;
}
but this didn't work, and I would like to understand why.
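One plausible reason the swap did nothing useful: with first = 0, testKernel never reads in_stream at all, so the data carried in s2 is ignored and only the not-yet-written out_stream is read. Swapping buffers ("ping-pong") does work when every call computes its output from its input. Below is a minimal CPU-only sketch in plain C++ (not Brook+) under that assumption; incrementKernel and runPingPong are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// CPU model of an element-wise kernel: each output element is computed
// only from the matching input element; the output is never read.
void incrementKernel(const std::vector<int>& in, std::vector<int>& out, int incr) {
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] + incr;
    }
}

// Ping-pong: after each call the two buffers trade roles, so the previous
// output becomes the next input without copying data back to the host.
std::vector<int> runPingPong(std::vector<int> a, int calls) {
    std::vector<int> b(a.size());
    for (int c = 0; c < calls; ++c) {
        incrementKernel(a, b, 1);  // read a, write b
        std::swap(a, b);           // b's results become the next input
    }
    return a;                      // result of the last call
}
```

In Brook+ the swap would simply be alternating the arguments, testKernel(s1, s2, ...) then testKernel(s2, s1, ...), with a kernel whose body is something like out_stream = in_stream + 1; (untested; shown only to illustrate the idea).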
drstrip, avoid that approach. I tried it once and it was slow, very slow. Avoid multiple kernel calls.
The biggest cost is kernel compilation; it is a huge cost.
After analyzing the generated IL code, I found that the IL kernel Brook+ compiles at runtime is over 1000 lines, and more is generated as you embed more and more subkernels. I don't know how much the compilation of such a long kernel alone costs.
If your code keeps its state across many iterations inside one long kernel, the cost can be reduced further by using multiple large input streams.
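The original goal was periodic progress reports, which pulls against "fewer, longer kernel calls". A common compromise is to run a chunk of iterations per call and report between calls. This is a minimal CPU-only sketch in plain C++ (not Brook+); multiStepKernel, runChunked and the toy update x = x + 1 are hypothetical stand-ins for the real computation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical "long" kernel: runs `iters` iterations of a toy update per
// element inside one call instead of one iteration per call.
void multiStepKernel(const std::vector<int>& in, std::vector<int>& out, int iters) {
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        int x = in[i];
        for (int k = 0; k < iters; ++k) {
            x += 1;  // stand-in for one real iteration
        }
        out[i] = x;
    }
}

// Runs `totalIters` iterations in chunks of `itersPerCall`, ping-ponging
// the buffers and calling report(done, total) after every kernel call.
std::vector<int> runChunked(std::vector<int> a, int totalIters, int itersPerCall,
                            const std::function<void(int, int)>& report) {
    assert(itersPerCall > 0);
    std::vector<int> b(a.size());
    int done = 0;
    while (done < totalIters) {
        int chunk = std::min(itersPerCall, totalIters - done);
        multiStepKernel(a, b, chunk);
        std::swap(a, b);
        done += chunk;
        report(done, totalIters);
    }
    return a;
}
```

Larger chunks mean fewer calls and less per-call overhead; smaller chunks mean more frequent progress updates. With 10 iterations in chunks of 4, report fires three times (after 4, 8 and 10 iterations).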
Brook+ doesn't support static variables. The incr variable declared in your kernel is local to each thread, and its lifetime is only that thread's execution.
You could probably use LDS as a static variable, but again, it only lets you share data within a thread group.
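The plain C++ model below (not Brook+) mimics testKernel to show the two effects separately. First, incr is re-initialized on every call, so each call adds exactly 1. Second, and this is an assumption not confirmed anywhere in the thread: with first = 0 the kernel reads out_stream before writing it, so the result depends on whatever the output buffer already holds. A buffer that still holds earlier results (one guess for the simulator) keeps climbing by 1, while a zero-filled buffer (one guess for the board) yields all 1's. testKernelModel is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU model of testKernel. `out` arrives with whatever contents the
// runtime gives it, to mimic reading an output not yet written this call.
void testKernelModel(const std::vector<int>& in, std::vector<int>& out, int first) {
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) {  // one iteration per "thread"
        int incr = 1;      // local: starts at 1 on every call, in every thread
        if (first == 1) {
            out[i] = in[i];
        }
        out[i] += incr;    // with first == 0 this reads the old contents
        incr += 1;         // discarded when this iteration ends
    }
}
```

Calling testKernelModel(in, out, 0) on a buffer holding {0, 1, 2, 3} gives {1, 2, 3, 4}, then {2, 3, 4, 5}; calling it on a zero-filled buffer gives all 1's, regardless of the input.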
You can do this in IL with srX registers. I guess support for using these registers is not in Brook+?
Are LDS or srX supported in Brook? If so, pointers to info would be appreciated.
LDS is supported; srX is not. You can find information about thread data sharing in section 2.17 of the Stream Computing User Guide.
The kernel compilation cost is paid only once. If you are calling the same kernel again as in this case, Brook+ won't compile the kernel again.
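The "compile once, reuse afterwards" behaviour amounts to a cache keyed by kernel. The sketch below is a generic illustration of that pattern in plain C++, not Brook+'s actual runtime code; KernelCache, get and compileCount are hypothetical names, and the "binary" is just a placeholder string.

```cpp
#include <cassert>
#include <map>
#include <string>

// Generic compile-once cache (illustration only, not Brook+ internals).
struct KernelCache {
    std::map<std::string, std::string> compiled;  // kernel name -> "binary"
    int compileCount = 0;                         // times the costly step ran

    const std::string& get(const std::string& name) {
        auto it = compiled.find(name);
        if (it == compiled.end()) {
            ++compileCount;  // expensive compilation: only on first use
            it = compiled.emplace(name, "binary:" + name).first;
        }
        return it->second;  // later calls reuse the cached result
    }
};
```

Requesting the same kernel five times triggers one compilation; under this model, repeated calls to testKernel pay launch and transfer costs but not compilation.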
Originally posted by: gaurav.garg
The kernel compilation cost is paid only once. If you are calling the same kernel again as in this case, Brook+ won't compile the kernel again.
Thanks for pointing that out, gaurav. I have a hard time explaining it in English...