23 Replies Latest reply on Sep 11, 2008 9:46 PM by MicahVillmow

    Features of Stream SDK 1.2?

    oscarbarenys1

      Hi,

      It seems that SDK 1.2 is coming this week.

      Can someone at AMD elaborate on what new features or examples to expect?

      It is said to support the HD4xxx series and Vista.

      Specifically, what about these features:

      1. Interoperability with graphics APIs: In presentations in a Siggraph course, Houston showed that DirectX 9 and DX10 interop is coming. Will it be in 1.2? And what about OpenGL?

      2. Shared memory: In the HD4xxx launch presentations, AMD showed that this new generation of cards has improved stream features, among them shared memory, which seems similar to Nvidia CUDA shared memory. Will it be exposed in 1.2? In CAL only, or in Brook as well?

      3. Synchronization primitives, as in CUDA. These seem appropriate once shared memory is exposed.

      4. Atomic instructions for accessing memory.

      In my humble opinion (sorry for being so blunt), these three features seem to be three major areas in which CUDA is superior to your SDK for computing (although they all now seem to be software deficiencies, except perhaps atomic instructions).

      I mean this as constructive criticism, and I am also aware that your hardware, for example for double-precision computing, is better than competing solutions.

       

      Thanks,

      Oscar.


        • Features of Stream SDK 1.2?
          ryta1203
          Yes, CUDA is superior to Firestream at this point.

          The one thing that really stands out to me is how Nvidia can have a ~40 page document that one can read and begin programming in CUDA immediately with little to no questions, while AMD has all this documentation and it's virtually useless. Too many companies these days underemphasize documentation and its importance, IMO.
            • Features of Stream SDK 1.2?
              xtremeleo

               

              Originally posted by: ryta1203 Yes, CUDA is superior to Firestream at this point. The one thing that really stands out to me is how Nvidia can have a ~40 page document that one can read and begin programming in CUDA immediately with little to no questions, while AMD has all this documentation and it's virtually useless. Too many companies these days underemphasize documentation and its importance, IMO.


              Can you please point me to that documentation?

              It's just that I'm in high school and have virtually no resources for learning. Can you help me a bit?

              For example, if I can read and understand the documentation, would I be able to write small apps? I'm not expecting to write a real-time ray tracing engine, just simple programs.

            • Features of Stream SDK 1.2?
              Ceq
              Well, I think Brook has a great programming model, cleaner and easier to learn than CUDA. However, it's
              true that it is still in development and the documentation is quite poor. Shared memory access would be a
              really powerful feature, but I think there are a few things to be fixed before that, like in-kernel array support.
                • Features of Stream SDK 1.2?
                  ryta1203
                  Ceq,

                  Although I disagree that Brook+ is cleaner and easier to use than CUDA (I have no idea how you came to this conclusion), even if Brook+ were the best ever, it would still be no good if the company can't get across to developers how to develop with it. Nvidia is light years ahead of AMD in this regard.
                    • Features of Stream SDK 1.2?
                      traits

                      IMO, the doc is a big problem

                        • Features of Stream SDK 1.2?
                          ryta1203
                          Originally posted by: traits

                          IMO, the doc is a big problem


                          Unfortunately, looking at the v1.2 documentation, the Brook+ documentation has NOT changed at all and is still VERY VERY poor. In fact, the "-A" is still in the sample VS2005 window screenshot from v1.0 Beta. The "new" scatter sample isn't even included in this version. It looks like AMD didn't even touch anything having to do with Brook+, which makes no sense for a company that asserts "you should use Brook+ and only use CAL when you have to really fine tune".

                          From the release notes it looks like little has changed in the way of Brook+, which again, is very disappointing.

                          So when can we expect another update and will there be any additions to Brook+?

                          From page C-1, are the 3850s not supported?

                          EDIT: To be severely blunt, the documentation looks like a 5th grader wrote it. The same things are said over and over without going into much detail or depth.

                          Moreover, it now seems that chaining streams through kernels is impossible. Is that true? If so, why? All this does is force more reads from main memory and more streams to be created, which really reduces the efficiency of the GPU.

                          So in order to get a virtual bidirectional stream you have to:

                          1. create two streams, same size
                          2. pass the one as input the other as output to one kernel
                          3. create a kernel to transfer the data between them
                          4. go to step 2

                          Does that make sense? Is there an easier way to do this?

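                          Brook+ code:

                          float one<10>;            // element type assumed to be float
                          float two<10>;

                          streamRead(one, c_one);   // copy host data into stream one
                          kernel1(one, two);
                          trans_kernel(two, one);   // extra kernel that only copies two back into one
                          kernel2(one, two);
                          // etc.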

                          Do you now have to create an extra kernel just to transfer the data from one kernel to the next?
                            • Features of Stream SDK 1.2?
                              MicahVillmow
                              Ryta,
                              The currently agreed upon best way to do what you want to do is to ping-pong buffers back and forth. This is what is done in bitonic_sort and a couple of other samples.

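                              For example, using your pseudocode, it would look like this:
                              streamRead(one, c_in);     // copy the input data into stream one
                              kernel1(one, two);         // reads one, writes two
                              kernel2(two, one);         // reads two, writes one (roles swapped)
                              streamWrite(one, c_out);   // copy the final result back out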

                              This works if the streams are of the same size and type.
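The ping-pong pattern above can be modelled in a few lines. This is a minimal sketch in plain Python, not Brook+ code: the lists `src` and `dst` stand in for the streams `one` and `two`, and `run_chain`, `add_one` and `double` are hypothetical names invented for the illustration.

```python
def run_chain(kernels, data):
    """Apply each kernel in turn, ping-ponging between two same-sized buffers."""
    src = list(data)           # plays the role of stream "one"
    dst = [0] * len(data)      # plays the role of stream "two"
    for kernel in kernels:
        for i in range(len(src)):
            dst[i] = kernel(src[i])   # a pass only reads src and only writes dst
        src, dst = dst, src           # swap roles; no element is copied
    return src                        # after the last swap, src holds the newest result


def add_one(x):
    return x + 1


def double(x):
    return x * 2


result = run_chain([add_one, double, add_one], [1, 2, 3])
print(result)
```

In this model only two buffers are allocated no matter how many kernels are chained, and no separate copy kernel is needed between passes because the swap only exchanges which buffer is read and which is written.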
                                • Features of Stream SDK 1.2?
                                  ryta1203
                                  Micah,

                                  Thanks. I guess it's a little counter-intuitive to have two copies of the exact same array. It seems this could cause memory-size problems: having to keep a duplicate of every array you want to update would further reduce the range of applications this SDK is useful for, right?
                        • Features of Stream SDK 1.2?
                          Ceq
                          To xtremeleo:

                          To get the documentation, just download the SDK and install it. It is inside a folder named 'doc' in your Brook+ directory.
                          You will also find a 'samples' folder:
                          Inside 'tests' there are some recommended examples to get used to Brook+ syntax.
                          Inside 'samples' there are some more advanced programs that show Brook+ capabilities.

                          There is also a quick language description on the original BrookGPU project homepage; Brook and Brook+ are nearly identical:
                          http://www-graphics.stanford.e...cts/brookgpu/lang.html
                          • Features of Stream SDK 1.2?
                            rahulgarg
                            ftp://streamcomputing:streamcomputing@ftp-developer.amd.com/AMD_Stream_SDK/
                            However, I don't think this is an official release. It is probably a pre-beta and probably not intended for public consumption.
                            From the documentation I see support for local data share, DX9/10 interop and even some sync primitives in CAL. I don't use Brook+, so I don't know what's new there.
                            • Features of Stream SDK 1.2?
                              MicahVillmow
                              Ryta,
                              This is one of the constraints of the stream programming model. The input streams and output streams are mutually exclusive in order to guarantee the property of streams. This programming model also maps very well to the underlying hardware, since graphics cards are in essence streaming processors with a memory model that has distinct input and output memory address spaces. The downside is that mapping algorithms that assume a uniform memory space is a little more difficult and requires out-of-place techniques. The example given previously, bitonic sorting, is one such problem. Instead of sorting within a single data array, the output is written to a second stream, which is used as the input stream in the next pass.

                              If there were a uniform memory space, then using the same stream for input and output would cause no problems, but that is not the case with graphics cards.
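To make the out-of-place idea concrete, here is a sketch of a bitonic sort in which every pass reads one buffer and writes the other. It is a plain Python model written for this illustration, not the SDK's bitonic_sort sample; the function name and the power-of-two length restriction are assumptions of the sketch.

```python
def bitonic_sort(values):
    """Sort a power-of-two-length list with out-of-place compare passes."""
    n = len(values)
    assert n & (n - 1) == 0, "length must be a power of two"
    src = list(values)        # input stream for the current pass
    dst = [None] * n          # output stream for the current pass
    k = 2                     # size of the bitonic sequences being merged
    while k <= n:
        j = k // 2            # distance to the comparison partner
        while j >= 1:
            for i in range(n):                     # one "thread" per output element
                partner = i ^ j
                ascending = (i & k) == 0           # sort direction of this block
                low = min(src[i], src[partner])
                high = max(src[i], src[partner])
                keep_low = (i < partner) == ascending
                dst[i] = low if keep_low else high # writes only its own element
            src, dst = dst, src                    # output becomes next input
            j //= 2
        k *= 2
    return src


print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))
```

Each output element gathers two values from the input buffer and writes exactly one value to the output buffer, so no pass ever reads a location that the same pass is writing.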
                                • Features of Stream SDK 1.2?
                                  ryta1203
                                  Micah,

                                  Just for the sake of accuracy, I think we should specify that "that is not the case with AMD's graphics cards".

                                  It seems that either Nvidia has a unified memory space or at least their CUDA API treats it as such. Either way, as a developer it doesn't matter: I can program it as a unified memory space.

                                  Since AMD went more traditional/static with their API I guess the ping pong effect is like having a double buffer that you have to swap.
                                    • Features of Stream SDK 1.2?
                                      udeepta@amd

                                      From the Brook+ release notes:

                                      o The runtime now enforces the rule given in the language spec
                                        that makes it illegal to bind the same stream for both read
                                        and write. For compatibility with existing code, this type
                                        of aliasing is permitted if the environment variable
                                        BRT_PERMIT_READ_WRITE_ALIASING is defined.
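One way to define the variable for a single program run, rather than system-wide, is sketched below in Python. The variable name comes from the release notes quoted above; the value "1" is an assumption (the notes only say the variable must be defined), and the helper names and the executable path in the usage note are hypothetical.

```python
import os
import subprocess

ALIASING_VAR = "BRT_PERMIT_READ_WRITE_ALIASING"


def env_with_aliasing(base_env=None):
    """Return a copy of the environment with the aliasing variable defined."""
    env = dict(os.environ if base_env is None else base_env)
    env[ALIASING_VAR] = "1"   # assumed value; the notes only require it to be defined
    return env


def run_with_aliasing(command):
    """Launch command (a list of arguments) with the variable set for that process only."""
    return subprocess.run(command, env=env_with_aliasing(), check=True)
```

A call such as `run_with_aliasing(["./my_brook_app"])` (hypothetical executable name) would then start the program with the variable defined while leaving the parent environment unchanged.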

                                        • Features of Stream SDK 1.2?
                                          ryta1203
                                          Originally posted by: udeepta@amd

                                          From the Brook+ release notes:




                                          o The runtime now enforces the rule given in the language spec
                                            that makes it illegal to bind the same stream for both read
                                            and write. For compatibility with existing code, this type
                                            of aliasing is permitted if the environment variable
                                            BRT_PERMIT_READ_WRITE_ALIASING is defined.





                                          So is there some reason not to have this environment variable defined? I saw this in the release notes but not in the documentation (i.e., the programming guide).
                                      • Features of Stream SDK 1.2?
                                        eduardoschardong
                                        MicahVillmow,
                                        I understand the problems of read/write streams when gathering and scattering data (even on CUDA this feature is bogus), but how about non-gather/scatter read/write streams? Each thread will read and write only its own element, and it could even be easily abstracted by the compiler if the hardware does not support it.
                                        For example:
                                        kernel void normalize(inout float a<>)
                                        {
                                            float n = a;   // each thread reads only its own element
                                            // ... do anything with n
                                            a = n;         // and writes back only to that same element
                                        }

                                        What would be the problem with this?
                                      • Features of Stream SDK 1.2?
                                        MicahVillmow
                                        Eduardo,
                                        There actually isn't any problem with that. It is also possible with scattering/gathering, and in the past it was allowed, but we have made the compiler stricter in following the spec. However, as Udeepta noted earlier, setting BRT_PERMIT_READ_WRITE_ALIASING should allow aliasing between the read and write streams.

                                        The one caveat: if you set the R/W aliasing flag, you must guarantee that the writes from one invocation of the kernel do not step on the reads of another invocation of the kernel.
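The caveat can be illustrated with a small model. The sketch below is plain Python, not Brook+; it runs the invocations sequentially in index order, which is an assumption made only to keep the example deterministic (real hardware gives no such ordering guarantee). All function names are hypothetical.

```python
def run_out_of_place(kernel, src):
    """Reference behaviour: every invocation reads src and writes a separate dst."""
    dst = [0] * len(src)
    for i in range(len(src)):
        dst[i] = kernel(src, i)
    return dst


def run_aliased(kernel, data):
    """Aliased behaviour: the same buffer is bound for both reading and writing."""
    buf = list(data)
    for i in range(len(buf)):
        buf[i] = kernel(buf, i)   # later invocations may read values already overwritten
    return buf


def scale(stream, i):
    """Element-wise kernel: reads and writes only element i."""
    return stream[i] * 2


def left_neighbour_sum(stream, i):
    """Gather kernel: also reads element i - 1, which another invocation writes."""
    if i == 0:
        return stream[i]
    return stream[i] + stream[i - 1]


data = [1, 2, 3, 4]
print(run_out_of_place(scale, data), run_aliased(scale, data))
print(run_out_of_place(left_neighbour_sum, data), run_aliased(left_neighbour_sum, data))
```

For the element-wise kernel both runs agree, which matches the normalize case discussed above. For the gather kernel the aliased run differs from the out-of-place result, because some invocations read neighbours that were already overwritten.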