15 Replies Latest reply on Jan 4, 2011 8:00 PM by diepchess

    Concurrent kernels

    egonotto
      How can I run multiple kernels concurrently?

      Hi,

      I want to run 20 Monte Carlo simulations in parallel on an HD5870 card.

      Because the simulation is complex, with many branches, loops, and random behavior, I think one execution per stream processor is OK.

      As the HD5870 card has 20 stream processors, I would get 20 parallel runs.

      I have heard that the new 5870 chip can run different kernels at the same time.

      But how can I do this?

      Is there a demo that runs several kernels at the same time?

      Sorry for my bad English.

      Thanks in advance,

      egonotto

        • Concurrent kernels
          wgbljl

           

          Originally posted by: egonotto I have heard that the new 5870 chip can run different kernels at the same time.


          Hi, egonotto. Where did you hear that?

          Concurrent kernel execution is a key feature of Nvidia's 'Fermi', while AMD does not seem to mention it.

          I am one of those hunting for this new feature.

            • Concurrent kernels
              egonotto

              Hi,

              I have two sources.

              The German magazine c't (http://www.heise.de/ct/artikel/Fermis-goldene-Regel-811487.html) has this sentence, translated here from German, in an article about the new Fermi:

              "According to AMD's Director of Stream Computing, Patricia Harrell, parallel execution is also supposed to be possible on ATI's RV8xx; however, the documents published so far contain not a word on the topic of 'concurrent kernels'. Perhaps the feature is simply taken for granted at ATI."

              The information in c't comes from AMD's Director of Stream Computing, Patricia Harrell.

              It should be a good source.

              The other is an article on the internet, also about Fermi (http://techreport.com/articles.x/17670/2).

              It contains this sentence:

              "(Incidentally, AMD tells us its Cypress chip can also run multiple kernels concurrently on its different SIMDs. In fact, different kernels can be interleaved on one SIMD.) "

              yours sincerely

              egonotto

              • Concurrent kernels
                edward_yang

                It's actually official that AMD's 5800 cards support concurrent kernel execution:

                http://www.hardocp.com/image.html?image=MTI1NTQ3MDM3NXlvcUhUU1k4TzlfMV8xN19sLmpwZw==

                I find it interesting though that AMD didn't step up to respond to this question. Why would they be hush hush about a superior feature their product has???

                BTW, AFAIK OpenCL makes no assumption about the number of kernels executed concurrently on a device. If the command-queue is in out-of-order execution mode, then the runtime is free to issue multiple kernel commands at the same time (provided they are not waiting on some event).

                AMD was quiet about this (concurrent kernel) feature probably because their CAL driver doesn't support it yet. However, I believe it is necessary for things like Eyefinity to work.
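
                To illustrate the point about out-of-order command queues, here is a minimal sketch in Python of a toy scheduling model; it is not OpenCL host code. The `Command` class and both makespan functions are hypothetical names invented for this illustration, the durations are arbitrary time units, and the out-of-order case assumes an idealized device with enough free resources to run every ready command at once, which no runtime is required to do.

```python
from dataclasses import dataclass


@dataclass
class Command:
    """Hypothetical stand-in for an enqueued kernel command."""
    name: str
    duration: int          # arbitrary time units
    wait_for: tuple = ()   # names of commands whose completion this one waits on


def in_order_makespan(commands):
    """In-order queue: each command starts only after the previous one finishes."""
    return sum(c.duration for c in commands)


def out_of_order_makespan(commands):
    """Out-of-order queue on an idealized device (assumption: unlimited resources).

    A command starts as soon as every command in its wait list has finished.
    Commands must be listed after the commands they wait for.
    """
    finish = {}
    for c in commands:
        start = max((finish[dep] for dep in c.wait_for), default=0)
        finish[c.name] = start + c.duration
    return max(finish.values(), default=0)


# Two independent kernels, then one that waits on both.
queue = [
    Command("kernel_a", 4),
    Command("kernel_b", 3),
    Command("kernel_c", 2, wait_for=("kernel_a", "kernel_b")),
]

print(in_order_makespan(queue))      # 9
print(out_of_order_makespan(queue))  # 6
```

                In this toy model the two independent kernels overlap under out-of-order dispatch, so the total drops from 9 to 6 time units. Whether a real runtime actually overlaps them is up to the implementation.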

                 

              • Concurrent kernels
                MicahVillmow
                edward,
                I had not responded because I was attempting to get confirmation on this from some of our hardware engineers, but they have not responded back. But it looks like you already found your answer.
                  • Concurrent kernels
                    st-cyclone

                    So, any updates? I am very interested in how the R8xx processes kernels concurrently, as a "YES" on a slide isn't enough, you know.

                    I mean, how many kernels per SIMD (IIRC, Fermi does 2)? Can the programmer control such behavior, maybe in egonotto's way, or is there a more optimized way, if any?

                    Sorry for being a "??????", but with all those GFLOPS blazing, one gets very curious. Speaking of curiosity, when can we expect an R8xx ISA reference?

                    regards

                    • Concurrent kernels
                      tomhammo

                      hello Micah.

                      You mentioned you were waiting for confirmation from hardware engineers. Any word? I think quite a few of us are wondering whether not only 58xx cards but also, say, 57xx cards, or any other R8xx GPUs, support concurrent kernel execution. This would make ATI GPUs very attractive compared to Fermi, especially if even the lower-end 57xx cards support it. This is a major selling point for ATI, so information on support for concurrent kernel execution should be easily accessible to developers and system builders :-)

                      regards,

                      - Tom

                      (edited because I misread Micah's name)

                        • Concurrent kernels
                          st-cyclone

                          I have been going through the R7xx ISA, and I suppose it's 4 for R8xx, since R7xx has odd and even wavefronts (so 2 in flight) and RV870 looks like two RV770s stuck together. So, following the "two of everything in Cypress" theme, IMHO the magic number is (again, I suppose) 4.

                          The only case in which it is still 2 is if the thread scheduler was excluded from the x2 theme.

                          Kind of off-topic: it's becoming quite amusing to dig for facts in ATI's ISA documents!

                      • Concurrent kernels
                        MicahVillmow
                        Tomhammo,
                        The hardware is capable of doing so, so it is something that we are looking into how to utilize properly. I don't have much to update at this time.
                          • Concurrent kernels
                            tomhammo

                            Thanks for the info Micah,

                            I guess leading on from that: if I were to recommend purchasing 57xx vs. 58xx GPUs for OpenCL development to a customer interested in concurrent kernel execution (due to small kernels), would there be a difference in concurrent kernel support between 57xx and 58xx GPUs going forward? I.e., if support is enabled in 58xx (via out-of-order command queues in OpenCL, for example), would it also be enabled in 57xx? Do 57xx/58xx support concurrent execution in the current version of the OpenCL drivers? Thanks - Tom

                          • Concurrent kernels
                            MicahVillmow
                            Tom,
                            The major difference between the 57XX and 58XX from a compute feature perspective is 58XX has double precision and 57XX only has single precision. For concurrent execution, when one chip gets the feature, they all will get it.
                              • Concurrent kernels
                                tomhammo

                                Hi Micah,

                                We have been working with 5850s, with very good OpenCL results. A big question still remains regarding concurrent kernel execution, though: is there a (rough) estimate of when this feature will be available?

                                We are looking at getting a Fermi board to see whether it is more or less cost-effective than 58xx cards. The extra cache would probably not provide much gain for our needs, but concurrent kernel execution might tip it over the edge (e.g. being able to hide the latency of a memory-bound kernel by running it in parallel with a processing-bound kernel).

                                We would prefer to keep our investment in ATI-optimised code if we will see concurrent kernel execution down the track.

                                thank you and regards,

                                - Tom

                                  • Concurrent kernels
                                    sir.um

                                    bump.

                                    Any news on Concurrent Kernel Execution? A rough due date, problems holding it back, anything?

                                    This would definitely be a huge plus if CKE were "activated" on ATI cards (since it is an issue of driver support).

                                    Without CKE, task-parallel computation [queue.enqueueTask(), i.e. kernels with a work-group of size 1] gains ZERO performance from running in OpenCL, since no two tasks can run in parallel; they must run one after the other, even if the OpenCL device has more than enough resources to run both kernels.

                                    thanks,
                                    -Chris
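
                                    The arithmetic behind this complaint can be sketched with a toy model in Python (not OpenCL code). The function names are hypothetical, task time is in arbitrary units, and the model assumes that tasks are independent and that each one occupies exactly one compute unit; real devices and drivers may behave differently.

```python
import math


def serial_time(num_tasks, task_time):
    """Without concurrent kernels: single-work-item tasks run one after another."""
    return num_tasks * task_time


def concurrent_time(num_tasks, task_time, compute_units):
    """With concurrent kernels (assumption: one task per compute unit at a time).

    Tasks run in waves of up to `compute_units` tasks each.
    """
    waves = math.ceil(num_tasks / compute_units)
    return waves * task_time


# 20 tasks of 5 time units each on 20 units,
# the unit count egonotto cites for the HD5870 earlier in the thread.
print(serial_time(20, 5))          # 100
print(concurrent_time(20, 5, 20))  # 5
```

                                    Under these assumptions the serial schedule takes as long as all tasks combined, while the concurrent one takes as long as a single task for up to 20 tasks.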

                                      • Concurrent kernels
                                        diepchess

                                        Sorry for reviving the thread, but concurrent kernels are also very important for me, so I join in asking the question: when will they be available?

                                        Is work currently being done to support concurrent kernels?

                                        Even two concurrent kernels would make programming on AMD GPUs a lot easier for most problems!

                                        Sometimes 2 concurrent kernels are easier than n.

                                         

                                        Thanks,

                                        Vincent

                                         

                                         

                                         
