20 Replies Latest reply on Dec 3, 2008 3:22 PM by bayoumi

    GPUwiki.com

    Methylene
      A new wiki for GPGPU / Shader Pipeline Programming and more!

      I just launched this website and finished ironing out the last of the setup.  Please don't expect much from it yet; the major purpose of its existence right now is to get people to start chronicling their experiences in a descriptive manner so as to help others.

      I'm asking for contributions not only from those with experience using the Stream SDK, but from anyone with experience with GPUs in general.  I'm hoping to gather the knowledge of all the GPU programmers out there and provide the resource I've seen so many requests for.

      Once again, due to my limited knowledge of the subject, and my even more limited time, this site has barely been set up.

      This is more like my donation back to the community for its help in the past!

      I'd very much like to see some contributions from the AMD team.  However, anyone should please feel free to contribute; I do not have much time for these things myself anymore.

        • GPUwiki.com
          ryta1203
          I think it's important for you to at least have an External Link to these two sites that are already up:

          http://en.wikipedia.org/wiki/GPGPU

          http://www.gpgpu.org/w/index.php/Main_Page
            • GPUwiki.com
              Methylene

              Noted, and I'll add both. But remember, it's a wiki: if you feel it's important, add it!

                • GPUwiki.com
                  Methylene

                  I'd also like to emphasize that, due to my inexperience with assembly language, I really need some help, especially with AMD IL articles and tutorials.  Admittedly I've also barely touched CAL; I believe it is well within my comprehension if I were to read over the documentation, but I'd much rather those with more experience contribute.

                  Even my Brook+ knowledge is somewhat rudimentary!

                  I've seen and heard of a lot of success with the SDK so far, and I look forward to seeing all of your contributions!

                    • GPUwiki.com
                      ryta1203
                      Originally posted by: Methylene
                      I've seen and heard of a lot of success with the SDK so far, and I look forward to seeing all of your contributions!

                      I'm interested to hear these stories. Most of what I have heard is that people who have tried working with this SDK have quit and put it on the back burner for now due to several issues: documentation, Brook+ lacking fundamental coding features, etc.

                        • GPUwiki.com
                          Methylene

                          http://forums.amd.com/devforum/messageview.cfm?catid=328&threadid=103149&enterthread=y

                          Ryta, you seem to hold a grudge against a fledgling program... Let's bear in mind that the SDK is indeed newer than CUDA.

                          Also, I have been in contact with SDK team members about the bugs I have encountered, and there was thorough testing done to ensure that the issues will indeed be fixed by the next release.

                          It would seem to me that the NVIDIA side has a lot more followers, so needless to say it will be more successful for now.  However, if everyone who ever tried the BETA SDK found it didn't work in some way, shape, or form and then abandoned ship and ran to NVIDIA, who would test and help bring the bugs to the attention of the staff?

                          I've found the team to be very interested in the community's input, although sometimes, but not all too often, unable to provide the answers I was looking for.

                          I'd just like to emphasize the point that a project can never become a good project without a good community.  If you like AMD, and you want them to pull ahead, then it's our responsibility as consumers to give them input on the problems and help by suggesting solutions that would work for us.

                          I remember making a comment a while back about the support of 4870x2 devices.  They are now supported with the new driver, and expose 2 CAL devices.  That's progress, isn't it?

                          But who am I to stop you from trolling these forums and making harsh commentary on every glitch along the way?  I'm just an individual who sees my own responsibility in developing a product that I can be satisfied with, rather than just expecting a lower-budget company to have the almost inexhaustible test crew that NVIDIA must have to be able to constantly provide products and fix the bugs for a consumer base of people who couldn't give a damn about reporting the bugs they experience and just sit around with their fingers crossed that it will be fixed next time.

                          I just don't think AMD has the financial capability or manpower to find and fix every bug on their own.  That's why things like the Stream SDK are probably exposed to the public early before their official release so as to attract a testing community.

                          I'm just a little sick and tired of the irresponsible attitude I find in my peers these days.  If you believe something should be a certain way, then you should politely attempt to work with the people required to make it that way.

                          I have been constantly explaining my experiences and suggesting ways to improve things to AMD, and I have been greeted with compassion on the matter.  It makes it hard to have a grave attitude (as I first did when I came on these forums) when there are people in the company working so damn hard to fix things.

                          I also see a lot of unreasonable requests and expectations from other areas of the community.  For instance there is a lot of flaming going on about how AMD has not managed to release the last documentation required for the RadeonHD crew to implement 3D acceleration in their driver.  I'd just like to emphasize that NVIDIA has not yet open-sourced one bit of their driver.  So whether it takes a year or a thousand years they're still miles ahead of the competition in terms of an open sourced driver.

                          I have in the past year become a total open-source convert, to the point where I feel guilty whenever I consider starting a project that I will not be releasing back to the community, and I usually find a way to make some if not all of the project available (i.e., modularizing a part that would not necessarily need to be proprietary).  As far as my noise stuff goes, that is what I plan on doing (when I get the time to start working on it again!).

                          At any rate, I just wish I could see more people around who think before they bear grudges, who consider the company's position in a capitalistic society, in a competitive industry where a single secret that gets out the door can mean the end of that company's future.

                          So buck up and get in the game. You seem to be fairly interested in having AMD's solutions work for you, so how about you shoot an email over to streamcomputing@amd.com and let them know what didn't work, how it went wrong, and what you'd like to see in the future.

                          EDIT:  I'd also like to see all these instances where people said things weren't working for them and that they'd have to abandon the project... Because AFAIK there aren't many more than yourself.

                          I read a post you started earlier where jean-claud responded that he didn't see much of a performance increase... His kernel used a float data type...  With all these complaints about the documentation, I'm beginning to wonder if the complainants have even read the documentation, as the first thing the Brook+ section really gets into is how float4 is the optimal way to use the hardware, since it is optimized for 2D texture data (4 floating point numbers).

                          I wouldn't expect much of an improvement when 1 float is being processed at a time.  This causes 3 processors to idle (as mentioned, guess where, in the documentation).
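
                          A rough sketch of the utilization arithmetic behind that point, assuming a simplified model in which each vector operation has 4 lanes (one per float4 component). The names `VECTOR_WIDTH`, `lane_utilization`, and `vector_ops_needed` are made up for illustration and are not part of Brook+ or CAL; real compilers and hardware may pack work differently.

```python
# Simplified model only: one vector operation has 4 lanes (assumption).
VECTOR_WIDTH = 4


def lane_utilization(components_per_element: int) -> float:
    """Fraction of lanes doing useful work when each stream element
    carries `components_per_element` floats (1 = float, 4 = float4)."""
    if not 1 <= components_per_element <= VECTOR_WIDTH:
        raise ValueError("components_per_element must be between 1 and 4")
    return components_per_element / VECTOR_WIDTH


def vector_ops_needed(n_values: int, components_per_element: int) -> int:
    """Vector operations needed to touch n_values scalars once."""
    # Ceiling division without floating point.
    return -(-n_values // components_per_element)


if __name__ == "__main__":
    for width in (1, 4):
        print(width, lane_utilization(width), vector_ops_needed(1_000_000, width))
```

                          Under this model a plain float stream keeps one lane in four busy and needs four times as many operations as a float4 stream for the same amount of data.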

                          As far as real-world experience goes, I worked with libnoise before writing my own kernels based on their code.  Libnoise is pretty quick because it uses a random index into a table of floating point values, thus reducing the number of FLOPs required to acquire the result.  However, I am finding my equations to run very fast, and there is an OBVIOUS increase based on the parallel operation.
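
                          To make the table-lookup idea concrete, here is a minimal 1D value-noise sketch. It is not libnoise's actual code: the table size, the seed, the hash multiplier, and the function names are all assumptions chosen for illustration.

```python
import math
import random

TABLE_SIZE = 256                      # assumption: small power-of-two table
_rng = random.Random(42)              # fixed seed so results are repeatable
TABLE = [_rng.uniform(-1.0, 1.0) for _ in range(TABLE_SIZE)]


def lattice_value(ix: int) -> float:
    """Look up a precomputed pseudo-random value for an integer coordinate."""
    # Cheap integer hash; the multiplier is an arbitrary odd constant.
    return TABLE[(ix * 1619) % TABLE_SIZE]


def value_noise_1d(x: float) -> float:
    """Smoothly interpolate between the table values at floor(x) and floor(x)+1."""
    x0 = math.floor(x)
    t = x - x0
    t = t * t * (3.0 - 2.0 * t)       # smoothstep easing of the blend factor
    a = lattice_value(x0)
    b = lattice_value(x0 + 1)
    return a + t * (b - a)


if __name__ == "__main__":
    print([round(value_noise_1d(i * 0.25), 3) for i in range(8)])
```

                          Each sample costs two table reads and a handful of multiply-adds, which is the kind of FLOP saving described above; a kernel that computes its values arithmetically instead trades the lookups for more math per sample.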

                          For instance, it takes me the **same** amount of time to calculate 1 million noise values as it does to calculate 2 million, because either 1 million or 2 million can be done in 1 "pass" (my kernel apparently has a couple of passes or so, but the data does not need to be split into 2 separate chunks because there is room on the card).

                          The proof is in the pudding.  I've not used CUDA, but I am laying the groundwork for another developer to add a CUDA-optimized backend to my some-day-to-be open-sourced GPU noise library.  However, my main goal will be to make the library available to all regardless of OS or hardware, and OpenCL will be the way to go for me in that light.

                          Just remember, AFAIK CUDA is NOT beta software.  So how can you expect BETA software to work without bugs or hitches as well as NONBETA software would?  And to return to my original point, how can you expect it ever to get there if you spit in the developers' faces all the time, saying their project sucks and will never be able to compare to the competition?

                            • GPUwiki.com
                              yakktr

                              Methylene

                              I really appreciate what you are doing and totally agree with what you have said above. I have no knowledge of Brook; I started with CAL even though the advised way is to learn Brook first. So far I have only studied the CAL example given in the documentation and played with it using different kernels.

                              I gave it a break for various reasons, but I will return to work on it in 2 weeks (hopefully). A strategy I may suggest for beginners like me is to use GSA (GPU ShaderAnalyzer). When you convert Brook code to IL, even a simple function like an addition results in a loooooong kernel. But when you use GLSL, it just gives plain IL code which can be easily followed. I have written some simple kernels with loops etc. using this method. Next I am planning to work on ODE solving, possibly with the Runge-Kutta method. I would be more than happy to share with the community and see others' work.
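
                              For readers unfamiliar with the method mentioned, this is a plain CPU reference sketch of the classic fourth-order Runge-Kutta step. It is not the poster's code and has no GPU-specific parts; the function names `rk4_step` and `integrate` are assumptions for illustration.

```python
import math
from typing import Callable

Derivative = Callable[[float, float], float]  # f(t, y) -> dy/dt


def rk4_step(f: Derivative, t: float, y: float, h: float) -> float:
    """Advance y by one step of size h with the classic RK4 scheme."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(f: Derivative, t0: float, y0: float, t_end: float, steps: int) -> float:
    """Integrate dy/dt = f(t, y) from t0 to t_end using a fixed number of steps."""
    h = (t_end - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


if __name__ == "__main__":
    # dy/dt = y with y(0) = 1 has the exact solution y(1) = e.
    approx = integrate(lambda t, y: y, 0.0, 1.0, 1.0, 100)
    print(approx, math.e)
```

                              A reference like this is mainly useful for checking a kernel's output against known values before worrying about performance.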

                              Cheers

                                • GPUwiki.com
                                  ryta1203
                                  Methylene,

                                  I have no personal investment in the SDK and certainly have no personal grudge against the SDK. I'm not sure why you would ever suggest such a thing. I merely stated facts: that I have left it (and others I know, who don't frequent this board) due to problems, lack of support, lack of documentation, lack of features, etc. There is nothing personal about this.

                                  I simply honestly don't believe that AMD has full support behind Brook+. None of the posters from AMD here seem to know much about Brook+ (Micah is the only one who really posts here anymore and he is self-admittedly a CAL guy). AMD is looking to support OpenCL (from what I have read) so I doubt that Brook+ will be around too much longer. More evidence of this comes from the fact that AMD has outsourced the Brook+ development. I don't wish to code in IL (essentially assembly) due to its long development times compared to its performance gains with other GPGPU solutions (aka CUDA).

                                  These are facts, not some personal grudge of which you accuse me. Don't drama it up, it's just an SDK and facts.
                                • GPUwiki.com
                                  ahu

                                   

                                  Originally posted by: Methylene
                                  I remember making a comment a while back about the support of 4870x2 devices.  They are now supported with the new driver, and expose 2 CAL devices.  That's progress isn't it?


                                  Which driver / OS? With the 8.11 driver I see no progress here.

                                  Under Vista, only one CAL device is seen (though there are supposedly registry hacks to disable the internal Crossfire, thus enabling two devices).

                                  Under XP and Linux, exposing two devices has been possible before but there have been several issues.

                                    • GPUwiki.com
                                      ryta1203
                                      1. I'm surprised you were even able to get 8.11 working. Most are experiencing some kind of atikmdag.sys BSOD error with 8.11. I'm currently using 8.10 because of this major bug.

                                      2. This link: http://www.supercomputingonlin.../article.php?sid=16532 posted by Micah in another thread indicates what I have been saying, that AMD is moving away from Brook+ and toward OpenCL. How much longer will Brook+ be supported? I don't know, but this is probably the major reason why they are not interested in making Brook+ any better and instead are focusing more on CAL.

                                      EDIT: The real question becomes: will you be able to achieve the same performance levels in OpenCL across multiple SDKs (CUDA, Firestream, Cell, Larrabee, etc..)? And how long after the spec is released will implementations be released? I didn't attend SC08 unfortunately.
                          • GPUwiki.com
                            MicahVillmow
                            Ryta,
                            The link explicitly states that there are significant improvements to Brook+ in the next release. It covers all aspects of the Brook+ software stack. The next release of the SDK will hopefully solve the problems with the Driver versions causing problems.
                            As for OpenCL, it can be thought of as similar to OpenGL but for computation instead of graphics.

                            On a side note, because I am mainly working with the SDK below the Brook+ level, I don't post to most Brook+ related questions for the reason that I do not use it. This might give the impression that we are focused solely on CAL, but that is entirely not the case. Brook+ is an integral part of our software stack. The Brook+ Streaming model is a viable high level language that maps to graphics hardware very well as long as the streaming model constraints are maintained.
                              • GPUwiki.com
                                ryta1203
                                Micah,

                                I see. So will we be seeing any new features in Brook+ anytime soon? What will happen to Brook+ once OpenCL is released? What would be the reason to even use Brook+ once OpenCL is released?

                                From the limited presentation I have seen of OpenCL it appears much more robust than Brook+, so will that limit its performance?

                                Are there no Brook+ people on this forum because AMD has outsourced Brook+?

                                Sorry, I have a lot of questions.
                              • GPUwiki.com
                                MicahVillmow
                                Ryta,
                                As mentioned in an earlier post, local buffers did not make it into this release, but should be in a future release. I really can't get any more specific than that in regards to new features.
                                Brook+ should still exist as part of our software stack as our high level language. OpenGL is a lower level API than Brook+, so OpenCL will probably be very similar.
                                As to why use Brook+ over OpenCL? Well, there are graphics libraries that are built on top of OpenGL, so I could see HLLs being built on top of OpenCL very much in the same vein as Brook+ is built on top of CAL. Brook+ is a streaming model, so any task that fits the streaming model could still be coded in Brook+ very easily. It's all down to abstraction and ease of use. The higher in the software stack you go, the more abstraction you have; the lower, the less. Various users want various levels of abstraction and it is one of AMD's goals to provide a full robust software environment for developers.

                                Also, there aren't many Brook+ people on this forum because they are busy working. They do however keep watch on the forum and pick up the bug reports and feature requests.


                                  • GPUwiki.com
                                    Methylene

                                    Ryta... The only reason I said the things I did has to do with the fact that you constantly spread misinformed opinionated AMD disappointment.  You are mongering it whether you admit to it or not...

                                    Regardless I'd like to return to my above point about the open community.  A very important point was made to me a few weeks ago... If I don't like Brook+ or think it should work differently... Guess what... It's... who would have thought it... OPEN SOURCED.

                                    Go dive into the code and make it what you want... And then wait... this is a novel concept.  Submit diffs to AMD and let them see your changes and quite possibly work them into the official code in a way that makes sense.

                                    Brook+ is yours if you want it, however you want it.  You just gotta put some time and effort into reviewing the code in <BROOK+DIR>/platform.

                                    Micah, thanks for your commentary on the matter. I think that AMD is doing a great job, and I am behind you guys more than ever.  I look forward to the new SDK release and will certainly be... eventually... working on a demonstration that will allow comparison of the various GPGPU methods for my purposes.

                                      • GPUwiki.com
                                        ryta1203
                                        Originally posted by: Methylene
                                        Ryta... The only reason I said the things I did has to do with the fact that you constantly spread misinformed opinionated AMD disappointment.  You are mongering it whether you admit to it or not...

                                        I'm very interested to hear what kind of misinformation you think I am "constantly spreading", even though this is not the place for those kinds of insults.

                                        I'm not "mongering" anything. I'm giving my personal opinion on an internet forum. Where else would you like me to post my personal opinion?

                                        As far as Brook+ being a useless language, it's true that some have had some results. My contention is that if you coded the same applications in a better SDK such as CUDA you would get better results with less development time. Why would you prefer to use a bad tool for a job when a much better one is available? Hence Brook+ is essentially useless at the moment.


                                        Either way, at some point OpenCL will be released (very soon I believe since this was announced at SC08) and, for a lot of applications, the advantages of platform independence will most likely outweigh the need for shaving off another second or two of computation time.
                                    • GPUwiki.com
                                      udeepta@amd

                                       

                                      Originally posted by: Methylene
                                      the major purpose of its existence right now is to try and get people to start chronicling their experiences in a descriptive manner so as to help others.

                                      A lot of folks would find this very helpful -- great job.

                                      • GPUwiki.com
                                        bayoumi
                                        Ryta123,
                                        I tried for a while moving to CAL to implement a global array loop with thread sync, cache-memory coherence & in-out at the same global memory location. The finding was that Brook+ & CAL gave me exactly the same overheads & throughput. Now that I understand some of the reasons why some features, such as the one asked for (global in-out arrays), were not yet implemented, I went back to Brook, since I see Brook & CAL as the same thing, and left CAL to deal with the impossible tasks.
                                        I am just disputing the claim that Brook+ is a useless language. I am working on industry-related IP, so it is not easy to just disclose my numbers or applications. In summary, I can do everything I want in Brook+. I might have to pay some performance penalty for my wish list. IL is great for understanding the low-level details. Sometimes I was glad my wish list did not come true, because it would have given me tools to implement wrong solutions.
                                        I also believe we are all working on a new technology, and we all take the risk that the road might be blocked at any point; this is R&D! The positive side of the fact that AMD's tools are still maturing is that we get personalized attention. This is something we need to enjoy while it lasts ...
                                        In fact I admire the professional response (and language) we get here. I have seen on some ultra giant corporation help forums examples of moderators losing their patience in a very unacceptable way.
                                        You will remember this when our forum turns into a help line and you are issued a "service ticket" for any response.
                                        Thanks
                                        Amr
                                        • GPUwiki.com
                                          bayoumi
                                          Just to add, I do not care if AMD comes out with a new language every day. I make a parallel programming model/algorithm, and I think all languages will just differ in the outer shell. Our kernels are not tens of thousands of lines that will take years to rewrite. Give me a good cup of fresh heavy coffee, and everything else can be solved.
                                          thanks
                                          Amr
                                            • GPUwiki.com
                                              Methylene

                                              Gee bayoumi, it's hard sometimes.. But if I keep reading I can stay with your train of thought lol!  Hey man, great words, I wholeheartedly agree with what you say.  We're test dummies on the road to GPGPU/Parallel Programming success.  I too have had very personalized help and I am very grateful for it.  When the bugs were encountered we took our heads and stuck em to the task of pinpointing the critter that caused it.

                                              So on this fine Turkey Day here in the states, I'd like to say I'm thankful for the help and kindness I've experienced from the AMD team.

                                              To further your point about a good cup of coffee.. Although I'm on a temporary hiatus from coding, my last efforts were toward abstracting away the differences that will be found between the various APIs.  It will be a matter of merely calling a function to take care of the Stream1 -> Stream2 dirty work, and then I can have my information returned to me in a clean manner.  Furthermore, as I've suggested before, it will allow me to do comparisons of the performance of each language/API.  Eventually I hope to implement a GUI to merely select your backend and push a button and thus see the difference in performance between the APIs.  And no need to drop it off as a science experiment, as I am planning on open-sourcing my GPU noise library as soon as I am confident it is organized and stable enough to release to the public.  Part of the reasoning will be that, for the time being, I have absolutely no ability to write programs with CUDA so that backend cannot be designed by me.
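
                                              One way such a backend abstraction could look, as a minimal sketch only. The backends here are plain CPU stand-ins, since no real CAL, Brook+, CUDA, or OpenCL calls are shown in this thread; `BACKENDS`, `run`, and `compare` are hypothetical names.

```python
import time
from typing import Callable, Dict, List

# A backend takes an input stream and returns an output stream.
Backend = Callable[[List[float]], List[float]]


def backend_loop(xs: List[float]) -> List[float]:
    """Stand-in backend A: explicit loop."""
    out = []
    for x in xs:
        out.append(x * x)
    return out


def backend_comprehension(xs: List[float]) -> List[float]:
    """Stand-in backend B: list comprehension."""
    return [x * x for x in xs]


# Registry that a GUI drop-down could be populated from (hypothetical).
BACKENDS: Dict[str, Backend] = {
    "loop": backend_loop,
    "comprehension": backend_comprehension,
}


def run(backend_name: str, xs: List[float]) -> List[float]:
    """Single entry point: the caller never touches backend-specific code."""
    return BACKENDS[backend_name](xs)


def compare(xs: List[float]) -> Dict[str, float]:
    """Time every registered backend on the same input (seconds)."""
    timings = {}
    for name, fn in BACKENDS.items():
        start = time.perf_counter()
        fn(xs)
        timings[name] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    data = [float(i) for i in range(100_000)]
    print(compare(data))
```

                                              Keeping every backend behind the same call signature is what lets another developer add one later without touching the calling code.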

                                              At any rate, I reached a roadblock myself a while back.  The one bug with mapping a kernel after it has left the scope of its original mapping... The bug isn't in older SDK versions, but in order for my system to be fully supported (and obviously perform orders of magnitude better) I upgraded to the 2.6.27 kernel with Ubuntu 8.10, and therefore I must wait for 1.3 to fix the bug unless I want to revert everything again (which of course I don't).

                                              The roadblock originally stopped me from making what was going to be a basic demo of my work where you could switch the tiles with a push of the button... Now that I already have made it to a 3-Dimensional stage, it's an irrelevant feature, but a demonstration I can make that is even cooler will be one with a simple GUI to allow changing of the input values or noise types on the fly.  So yeah, I've hit a roadblock, but it's no big deal; I have to wait for OpenCL to truly get started on what I want to do with this stuff anyway.

                                              My only qualm atm is that it seems like CUDA already implements interoperability for OpenGL with mapping and unmapping of buffer objects.  I've seen all sorts of hints of this in the Stream SDK, but as far as I could tell there's no real interoperability.  Hence my moving to OpenCL, which will also provide me the ability to develop without reference to what hardware it will be run on (thank you Khronos Group, you life savers!).  At any rate, the reason I mention this is that, once open-sourced, obviously CUDA will gain an edge over, at the very least, the CAL or Brook backends for my library.  AFAIK there is nothing I can do to avoid transferring from system memory, to graphics memory, to system, and back, with the Stream SDK.  CUDA and OpenCL both will make this possible.

                                              If AMD is keeping these projects as their optimized solution for computing, I highly recommend something be done to correct the situation (if it is not just my mere misunderstanding of the mechanics), as it is quite obvious that faster processing of data goes hand in hand with faster presentation of results!  I mean, I realize there seems to be interoperability with dx9 / d3d10, but really, why sell OpenGL short? Just because it's supported in OpenCL doesn't mean CAL shouldn't support it.  Like I said, it will prevent the SDK from performing as well in those circumstances!

                                              Also, I really do feel this SDK should provide all the same options at all the same levels of interface, except when it is impossible.  I realize Brook is open sourced, so I suppose I can do some work myself if it's what I truly desire. However, I'd just like to make the point that if I chose Brook+ for its high-level interface, I should be able to make a simple call to use the interoperability extensions to acquire a stream from a mapped object.  Or maybe be able to query CAL with a simple call to find and store hardware information simply.

                                              Just some considerations.  Otherwise I have no qualms using it as is, and trying to demonstrate the abilities of the SDK regardless.  Like I said I'm moving to primarily work with OpenCL, just that I'd like to really show that the Stream SDK is an optimized solution, in all categories.

                                            • GPUwiki.com
                                              bayoumi
                                              Methylene
                                              I believe that sooner or later we should expect a change to unify languages as much as possible. I do not even rule out an IEEE standard. It could end up being Microsoft in the end who sets the standard. The CUDA & Brook+ situation is similar to the Verilog & VHDL situation in digital VLSI.
                                              My point is the future is not clear.
                                              I also agree moving between CPU & GPU memory is the biggest bottleneck.
                                              Good luck with what you're trying to do. This is really ambitious.
                                              regards
                                              Amr
                                              • GPUwiki.com
                                                bayoumi
                                                ryta123,
                                                I made my choice because ATI-AMD had the only GPU on the market with hardware double precision & PCI-E Gen2 one year ago, when I started my work. The next double precision GPU came at least 2 quarters later.
                                                Regards
                                                Amr