Archives Discussions

drstrip
Journeyman III

Persistent Kernel Memory?

Is there any way to store values in a kernel between invocations? I'm looking at a large array computation that I would like to interrupt periodically to report progress back to the caller, but I would prefer not to send all the data and state information back and forth every time. Brook+ doesn't support static variables, if I understand correctly, but is there some other way to persist state between kernel calls? I think OpenCL supports this, so it's not a hardware issue.

0 Likes
10 Replies
riza_guntur
Journeyman III

You don't have to.

A stream is a pointer to GPU memory, so the data stays there as long as the stream is not destroyed. Even after the program exits, it will still be there as long as you don't write too much data there.

0 Likes

I tried the following kernel -

kernel void testKernel(int in_stream<>, out int out_stream<>, int first)
{
  int incr = 1;
  if (first == 1)
  {
    out_stream = in_stream;
  }
  out_stream += incr;
  incr += 1;
  return;
}

My calling routine initializes an int array to (0, 1, 2, 3, ...), then calls brook::Stream::read to initialize the input stream. After the first call (with first = 1), the output stream values are copied to another int array with brook::Stream::write and printed. The values matched the input. On the simulator, each successive call (with first = 0 and no further reads into the input stream) yielded an output stream with each value incremented by 1, which indicates that the kernel variable incr is reset on every call. On the actual FireStream board, the first call returns the correct (copied) stream, but successive calls return a stream filled with 1's.

 

Thoughts? Suggestions for further experiments?

0 Likes

I'm sorry, I misunderstood.

Your only chance is a parent kernel calling a child kernel that can access and modify main memory, as in the binomial option sample.

There, gpu_backwardTraverse reads from the mainGPU output stream and then saves its result back to that same output stream.

That way you can save the state of every thread in main memory.

0 Likes

I would like to avoid the cost of transferring all the state information back and forth across the interface. I thought that maybe swapping streams between input and output might do it - like this:

int main()
{
    // rank-1 streams of 1024 ints
    unsigned int dims[] = {1024};
    brook::Stream<int> s1(1, dims);
    brook::Stream<int> s2(1, dims);
    int data[1024];

    // some initialization cruft omitted

    testKernel(s1, s2, 1);
    // write s2 into local buffer and show the values

    // now reverse the streams, since s2 is presumably initialized and has
    // values that survived the return
    testKernel(s2, s1, 0);
    // show the return data from s1

    return 0;
}

 

but this didn't work, and I would like to understand why.
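
One possible explanation, which nobody in the thread confirms: with first = 0 the posted kernel never reads in_stream, and out_stream += incr depends on whatever the output stream already holds, which may not be defined on hardware. For swapping streams to carry state forward, each pass would have to compute its output from its input. Below is a minimal CPU-only C++ sketch of that ping-pong pattern. It is not Brook+ code; step_kernel and run_ping_pong are hypothetical names, and plain vectors stand in for the two streams.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// CPU stand-in for a stream kernel (hypothetical): every output element is
// computed only from the matching input element, never from the previous
// contents of the output buffer.
void step_kernel(const std::vector<int>& in, std::vector<int>& out, int incr) {
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] + incr;
    }
}

// Runs `passes` kernel invocations, swapping the roles of the two buffers
// after each one so the previous output becomes the next input.
std::vector<int> run_ping_pong(std::vector<int> state, int passes) {
    std::vector<int> scratch(state.size());
    std::vector<int>* src = &state;
    std::vector<int>* dst = &scratch;
    for (int p = 0; p < passes; ++p) {
        step_kernel(*src, *dst, 1);
        std::swap(src, dst);  // only the pointers move, not the data
    }
    return *src;  // src now points at the most recently written buffer
}
```

With a starting state of (0, 1, 2, 3) and three passes, run_ping_pong returns (3, 4, 5, 6). In Brook+ terms the equivalent change would be writing out_stream = in_stream + incr in the kernel and alternating testKernel(s1, s2, ...) with testKernel(s2, s1, ...) on the host, assuming the runtime keeps both streams resident between calls.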

0 Likes

drstrip, avoid that flow of yours. I've done that once, and it is slow, very slow. Avoid multiple kernel calls.

The biggest cost is the kernel compilation; it is a huge cost.

After analyzing the generated IL code, I found that the IL kernel Brook+ compiles at runtime is over 1000 lines, and more is generated as you embed more and more subkernels. I don't know how much the compilation of a long kernel costs by itself.

If your code runs many iterations inside one long kernel, then the cost should be reduced further by using multiple large input streams.

0 Likes

Brook+ doesn't support static variables. The incr variable declared in your kernel is local to each thread, and its lifetime is only that of a single thread.

You can probably use LDS as a static variable, but it only lets you share data within a thread group.

0 Likes

You can do this in IL with srX registers. I guess Brook+ doesn't support using these registers?

0 Likes

Are LDS or srX supported in Brook? If so, pointers to info would be appreciated.

0 Likes

LDS is supported, srX is not. You can find information about thread data sharing in the Stream Computing User Guide, section 2.17.

The kernel compilation cost is paid only once. If you are calling the same kernel again as in this case, Brook+ won't compile the kernel again.

0 Likes

Originally posted by: gaurav.garg

 

The kernel compilation cost is paid only once. If you are calling the same kernel again as in this case, Brook+ won't compile the kernel again.

 

Thanks for pointing that out, gaurav. I have a hard time explaining it in English...

0 Likes