10 Replies Latest reply on Dec 3, 2009 6:37 AM by riza.guntur

    Persistent Kernel Memory?

    drstrip

      Is there any way to store values in a kernel between invocations? I'm looking at a large array computation that I would like to periodically interrupt to report progress back to the caller, but I would prefer not to have to send all the data and state information back and forth every time. Brook+ doesn't support static vars, if I understand correctly, but is there some other way to persist state between kernel calls? I think OpenCL supports this, so it's not a hardware issue.

        • Persistent Kernel Memory?
          riza.guntur

          You don't have to.

          A stream is a pointer to GPU memory, which means the data stays there as long as the stream is not destroyed. Even after the program exits, it will still be there as long as you don't write too much data there.

            • Persistent Kernel Memory?
              drstrip

              I tried the following kernel -

              kernel void testKernel(int in_stream<>, out int out_stream<>, int first)
              {
                int incr = 1;
                if (first == 1)
                {
                  out_stream = in_stream;
                }
                out_stream += incr;
                incr += 1;
                return;
              }

              My calling routine initializes an int array to (0, 1, 2, 3, ...), then does a brook::Stream::read to initialize the input stream. After the first call (with first = 1), the output stream values are copied with brook::Stream::write into another int array and printed. The values matched the input. On the simulator, each successive call (with first = 0 and no further reads to the input stream) yielded an output stream with each value incremented by 1 each time, which indicates that the kernel var incr is reset each time. On the actual FireStream board, the first call returns the correct (copied) stream, but successive calls return a stream filled with 1's.

               

              Thoughts? Suggestions for further experiments?

                • Persistent Kernel Memory?
                  riza.guntur

                  I'm sorry, I misunderstood.

                  Your only chance is a parent kernel calling a child kernel that can access and modify main memory, as in the binomial option sample.

                  There, gpu_backwardTraverse reads from the mainGPU output stream and then saves back to that output stream.

                  That way you could save the state of every thread in main memory.

                    • Persistent Kernel Memory?
                      drstrip

                      I would like to avoid the cost of transferring all the state information back and forth across the interface. I thought that maybe swapping streams between input and output might do it - like this:

                      int main()
                      {
                        brook::Stream<int> s1;
                        brook::Stream<int> s2;
                        int data[1024];
                        // some initialization cruft omitted

                        testKernel(s1, s2, 1);
                        // write s2 into local buffer and show the values

                        // now reverse the streams, since s2 is presumably initialized and has
                        // values that survived the return
                        testKernel(s2, s1, 0);
                        // show the return data from s1
                        return 0;
                      }

                       

                      but this didn't work, and I would like to understand why.
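
One thing visible in the kernel as posted: in_stream is only read when first == 1. With first = 0 the kernel touches only out_stream, so swapping the two streams cannot carry the previous result forward. Below is a minimal CPU-only C++ sketch of the stream-swapping ("ping-pong") idea in which every pass reads its input. It does not use the Brook+ API; stepKernel and runPingPong are hypothetical names, and the std::vector buffers merely stand in for streams that stay on the device.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// CPU stand-in for one kernel pass (hypothetical, not a Brook+ kernel):
// every output element is computed only from the matching input element,
// never from the previous contents of the output buffer.
void stepKernel(const std::vector<int>& in, std::vector<int>& out)
{
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] + 1;
    }
}

// Runs `passes` kernel passes, swapping the roles of the two buffers after
// each pass so the state never has to be copied back to the caller.
std::vector<int> runPingPong(std::vector<int> initial, int passes)
{
    std::vector<int> a = std::move(initial);
    std::vector<int> b(a.size(), 0);
    std::vector<int>* src = &a;
    std::vector<int>* dst = &b;
    for (int p = 0; p < passes; ++p) {
        stepKernel(*src, *dst);
        std::swap(src, dst);  // the last output becomes the next input
    }
    return *src;  // after the final swap, src points at the latest result
}
```

Calling runPingPong on (0, 1, 2, 3) with three passes yields (3, 4, 5, 6). In Brook+ terms this would correspond to a kernel that always writes out_stream from in_stream, with the host alternating which stream is passed in which position; whether that avoids all transfer cost on real hardware is an assumption this sketch does not test.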

                        • Persistent Kernel Memory?
                          riza.guntur

                          drstrip, avoid that flow of yours. I tried it once and it was slow, very slow. Avoid multiple kernel calls.

                          The biggest cost is kernel compilation, and it is a huge cost.

                          After analyzing the generated IL code, I found that the IL for a Brook+ kernel, which is compiled at runtime, is over 1000 lines, and more is generated as you embed more and more subkernels. I don't know how much compiling a long kernel costs.

                          If your code runs many iterations inside one long kernel, then the cost should be reduced further by using multiple large input streams.

                      • Persistent Kernel Memory?
                        gaurav.garg

                         

                        Originally posted by: drstrip

                        kernel void testKernel(int in_stream<>, out int out_stream<>, int first)
                        {
                          int incr = 1;
                          if (first == 1)
                          {
                            out_stream = in_stream;
                          }
                          out_stream += incr;
                          incr += 1;
                          return;
                        }

                        My calling routine initializes an int array to (0, 1, 2, 3, ...), then does a brook::Stream::read to initialize the input stream. After the first call (with first = 1), the output stream values are brook::Stream::write to another int array and printed. The values matched the input. On the simulator, each successive call (with first = 0 and no further reads to input stream) yielded an output stream with each value incremented by 1 each time, which indicates that kernel var incr is reset each time. On the actual FireStream board, the first call returns the correct (copied) stream, but successive calls return a stream filled with 1's.

                        Thoughts? Suggestions for further experiments?

                         

                        Brook+ doesn't support static variables. The incr variable declared in your kernel is a local variable for each thread, and its lifetime is only that one thread.

                        You can probably use LDS as a static variable, but again it only allows you to share data within a thread group.
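
To illustrate the point about the lifetime of incr, here is a small CPU-only C++ sketch. It is not Brook+ code: addWithLocalIncr, PassState and addWithStoredIncr are hypothetical names. The first function mirrors the posted kernel, where the local counter is recreated on every call. The second keeps the per-element counter in a buffer owned by the caller, which is roughly what passing an extra state stream alongside the data would amount to.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors the problem in the posted kernel: `incr` is a fresh local on
// every call, so the increment applied to it is lost when the call returns.
int addWithLocalIncr(int value)
{
    int incr = 1;
    value += incr;
    incr += 1;  // has no effect on any later call
    return value;
}

// Hypothetical alternative: the per-element counter lives in a buffer that
// outlives each pass, next to the data it belongs to.
struct PassState {
    std::vector<int> values;
    std::vector<int> incr;
};

// One pass over all elements; both the values and the counters are updated
// in storage that persists between passes.
void addWithStoredIncr(PassState& s)
{
    assert(s.values.size() == s.incr.size());
    for (std::size_t i = 0; i < s.values.size(); ++i) {
        s.values[i] += s.incr[i];
        s.incr[i] += 1;
    }
}
```

Two calls of addWithLocalIncr starting from 0 give 2 (1 is added each time), while two passes of addWithStoredIncr starting from a value of 0 with a counter of 1 give 3 (1 is added, then 2), because the counter survives between passes.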

                          • Persistent Kernel Memory?
                            emuller

                            You can do this in IL with srX registers. I guess support for using these registers is not in Brook+?

                              • Persistent Kernel Memory?
                                drstrip

                                Are LDS or srX supported in Brook? If so, pointers to info would be appreciated.

                                  • Persistent Kernel Memory?
                                    gaurav.garg

                                     

                                    Originally posted by: drstrip

                                    Are LDS or srX supported in Brook? If so, pointers to info would be appreciated.

                                    LDS is supported; srX is not. You can find information about thread data sharing in section 2.17 of the Stream Computing User Guide.

                                    Originally posted by: riza.guntur

                                    drstrip, avoid that flow of yours. I tried it once and it was slow, very slow. Avoid multiple kernel calls.

                                    The biggest cost is kernel compilation, and it is a huge cost.

                                    The kernel compilation cost is paid only once. If you call the same kernel again, as in this case, Brook+ won't compile the kernel again.
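
The "paid only once" behaviour can be pictured as a compile-on-first-use cache. The C++ sketch below only illustrates that general pattern; it says nothing about how Brook+ implements it internally, and KernelCache, CompiledKernel and compileCount are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a compiled kernel; not a Brook+ type.
struct CompiledKernel {
    std::string source;
};

// Compile-on-first-use cache keyed by kernel name (illustrative only).
class KernelCache {
public:
    // Returns the cached kernel, "compiling" only on the first request.
    const CompiledKernel& get(const std::string& name, const std::string& source)
    {
        assert(!name.empty());
        auto it = cache_.find(name);
        if (it == cache_.end()) {
            ++compileCount_;  // the expensive compile step would happen here
            it = cache_.emplace(name, CompiledKernel{source}).first;
        }
        return it->second;
    }

    // Number of times the expensive step has run so far.
    int compileCount() const { return compileCount_; }

private:
    std::unordered_map<std::string, CompiledKernel> cache_;
    int compileCount_ = 0;
};
```

Requesting the same kernel name twice leaves compileCount() at 1, so under this pattern repeated calls to one kernel add launch overhead but no further compilation.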

                                      • Persistent Kernel Memory?
                                        riza.guntur

                                         

                                        Originally posted by: gaurav.garg

                                         

                                        The kernel compilation cost is paid only once. If you are calling the same kernel again as in this case, Brook+ won't compile the kernel again.

                                         

                                        Thanks for pointing that out, gaurav. I have a hard time explaining it in English...