9 Replies Latest reply on May 16, 2011 8:55 AM by himanshu.gautam

    async_work_group_copy

    Meteorhead

      Hi!

      My quick question is: I was told earlier that prefetch is currently not supported. Is the same true of async_work_group_copy? If so, is it a HW limitation that asynchronous commands cannot be used, or is it simply an implementation issue and not yet supported?

      Cheers,

      Máté

        • async_work_group_copy
          nou

           You can use async_work_group_copy; at worst it will simply not be asynchronous.
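
           For reference, a minimal sketch of what such a call looks like inside a kernel. The kernel name `scale_tile`, its arguments and the one-tile-per-work-group layout are assumptions made up for illustration, not anything from this thread. Whether the copy really overlaps with other work is up to the implementation.

           ```c
           // Sketch only: each work-group stages one tile of `in` into local
           // memory and then scales it. Assumes one tile == one work-group's
           // worth of floats (hypothetical layout).
           __kernel void scale_tile(__global const float *in,
                                    __global float *out,
                                    __local float *tile,   // get_local_size(0) floats
                                    const float factor)
           {
               const size_t lsz  = get_local_size(0);
               const size_t lid  = get_local_id(0);
               const size_t base = get_group_id(0) * lsz;

               // Every work-item in the group must reach this call with the
               // same arguments; the group performs the copy collectively.
               event_t ev = async_work_group_copy(tile, in + base, lsz, 0);

               // Independent work could go here if the copy is truly async.

               // Wait until the copy identified by `ev` has completed.
               wait_group_events(1, &ev);

               out[base + lid] = factor * tile[lid];
           }
           ```

           If the implementation performs the copy synchronously, the kernel still produces the same result; only the potential overlap is lost.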

            • async_work_group_copy
              Meteorhead

              Although it is not entirely relevant to this topic (although it might even be)...

              It has been rumored that global sync might become available with 69xx cards. What does the "Global synchronisation registers" block on the architecture block diagram mean? It was already present on the Cypress diagrams.

                • async_work_group_copy
                  DTop

                  What is async_work_group_copy, in your context?

                    • async_work_group_copy
                      Meteorhead

                      async_work_group_copy is described in the spec PDF. I know global syncing has nothing to do with async_work_group_copy, but I wouldn't be surprised if, on the hardware, the events used to query async commands were in fact stored in a location visible to all SIMD engines.

                      I have no clue what those Global sync registers are used for; that's why I asked. And as I have said before, I know it has little to do with the topic name, but it might turn out that async commands and global sync use the same HW element. That's what I tried to refer to.

                        • async_work_group_copy
                          Meteorhead

                          Let me ask one thing about this issue for clarification:

                          Is the out-of-order command queue only a host thread issue? (In my mind it requires a device occupation query every time a kernel "wave" finishes, after which the host thread decides which thread to issue from the command queue based upon event dependency and resource usage.)

                          Are 5xxx and 6xxx cards capable of async operations at all? (async-work-group-copy and prefetch in particular) Both would be very useful for algorithms that can intelligently cache ahead. I was told prefetch compiles into nop() on GPUs (I do not know about CPUs), but is this lack of support or HW capability?

                            • async_work_group_copy
                              himanshu.gautam

                               

                               

                              Are 5xxx and 6xxx cards capable of async operations at all? (async-work-group-copy and prefetch in particular) Both would be very useful for algorithms that can intelligently cache ahead. I was told prefetch compiles into nop() on GPUs (I do not know about CPUs), but is this lack of support or HW capability?

                              async_work_group_copy should work on Evergreen and NI (Northern Islands) cards. Do you see any issues when using it?

                              AFAIK out of order command queue execution is still not supported.

                                • async_work_group_copy
                                  Meteorhead

                                  A member of our group experimented with async_work_group_copy on NV cards, and he stated it brought zero speedup. The problem I was coding could have made use of prefetch, since the data required for the next iteration step was known ahead of time; while one iteration calculates, the data for the next could be loaded.

                                  I would try async_work_group_copy for caching too (using shared memory as a cache of VRAM), but that would need serious alteration of the code, and I would hate to do it in vain. If no one has such experience and can tell me in advance whether it works asynchronously or not, all that remains is for me to find the time to test it.

                                  But if AWGC really does work in an async manner, that would be good news indeed, although prefetch would be the best solution, as it does not require explicit memory allocation, just a simple function call, and the future memory load will be faster.
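
                                  The "load the next step while the current one computes" idea could be sketched as a double-buffered loop like the one below. The names (`sum_tiles`, `buf`, `num_tiles`), the data layout and the trivial accumulate step are hypothetical and only stand in for the real algorithm. The loop stays correct even if the copy turns out to be synchronous, so it can also serve as a timing test.

                                  ```c
                                  // Sketch only: each work-group walks over its own run of
                                  // `num_tiles` tiles (assumed layout), computing on tile t
                                  // while tile t+1 is being copied into the other half of `buf`.
                                  __kernel void sum_tiles(__global const float *in,
                                                          __global float *out,
                                                          __local float *buf,   // 2 * get_local_size(0) floats
                                                          const int num_tiles)
                                  {
                                      const size_t lsz = get_local_size(0);
                                      const size_t lid = get_local_id(0);
                                      __global const float *src = in + get_group_id(0) * lsz * num_tiles;

                                      float acc = 0.0f;

                                      // Start loading the first tile into half 0.
                                      event_t ev = async_work_group_copy(buf, src, lsz, 0);

                                      for (int t = 0; t < num_tiles; ++t) {
                                          __local float *cur  = buf + (t & 1) * lsz;
                                          __local float *next = buf + ((t + 1) & 1) * lsz;

                                          // Tile t must have arrived before anyone reads it.
                                          wait_group_events(1, &ev);

                                          // Kick off the copy of tile t+1. The condition is
                                          // uniform across the group, as the built-in requires.
                                          if (t + 1 < num_tiles)
                                              ev = async_work_group_copy(next, src + (t + 1) * lsz, lsz, 0);

                                          // Placeholder "compute" step on the current tile.
                                          acc += cur[lid];

                                          // Nobody may still be reading `cur` when the next
                                          // iteration starts overwriting it.
                                          barrier(CLK_LOCAL_MEM_FENCE);
                                      }

                                      out[get_global_id(0)] = acc;
                                  }
                                  ```

                                  On the host, the local buffer would be sized with something like clSetKernelArg(kernel, 2, 2 * local_size * sizeof(float), NULL). The prefetch(p, n) built-in takes a global pointer and an element count and needs no local buffer, but the spec defines it only as a hint, so an implementation is free to ignore it.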