5 Replies Latest reply on Oct 5, 2017 1:00 PM by sp314

    Getting a global barrier to work on R290X

    sp314

      Hi there,

       

      I'm trying to get global barriers to work on an R290X (Hawaii), and they're not working for a reason I do not understand. I'm posting in the hope of getting some advice, because I'm out of ideas.

       

      First, the first wave to arrive initializes the barrier, like so (I'm trying it out with only one wave at the moment):

       

      /*000000000074: 7e0002c0         */ v_mov_b32       v0, 1

      /*000000000078: d8660000 00000000*/ ds_gws_init     v0 gds

      /*000000000080: bf8c007f         */ s_waitcnt       lgkmcnt(0)

       

      To detect the first wave, I'm using an atomic add, as described in this thread:

      https://community.amd.com/thread/165710

       

      After that, all threads (intentionally) waste some cycles with either a bunch of memory writes or multiple s_nop instructions, since I remember seeing a post somewhere saying that the barrier must be initialized a few hundred cycles before use. I'm not sure whether this is necessary, but I put the delay in just in case.

       

      Afterwards, all threads wait on the barrier, like so:

       

      /*0000000001e8: 7e000280         */ v_mov_b32       v0, 1

      /*0000000001ec: d8760000 00000000*/ ds_gws_barrier  v0 gds

      /*0000000001f4: bf8c007f         */ s_waitcnt       lgkmcnt(0)

      and then just exit.

       

      I've tried multiple values for the wave count, setting v0 (see the code above) to the number of waves, the number of threads, etc. The code seems to get stuck on ds_gws_barrier until the driver resets the video card. I have verified that the instruction encoding generated by CLRadeonExtender is correct by comparing the machine code I'm getting to the code I should be getting according to the Southern Islands ISA doc (found here https://developer.amd.com/wordpress/media/2012/12/AMD_Southern_Islands_Instruction_Set_Architecture.pdf).

       

      As some other threads suggest, I'm setting m0 to 0x1000, but again, I've tried different values here too and the code still locks up.

       

      I'm on Windows 10 with the latest available drivers from AMD, if that matters. What other info do I need to provide to get better answers, and what would you suggest I should try?

       

      What am I doing wrong?

       

      Thank you!

        • Re: Getting a global barrier to work on R290X
          drallan

          Hi sp314,

           

          I use global barriers often and think you may be referencing one of my old messages. Just a couple of quick questions.

          How many waves do you enqueue and how many waves are in the barrier? The general rule seems to be:

           

          You must enqueue a small enough number of waves that all can run simultaneously; none can be waiting. The number of barrier waves must be enqueued waves - 1. On 290X the number of enqueued waves might be 8*44 = 352 and barrier waves 351. I assume 2 and 1 would also work but I've not tried.
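
The sizing rule above can be written out as plain arithmetic. This is a minimal Python sketch, not driver or hardware code: the function name `barrier_sizes` is made up for illustration, and the inputs assume 44 compute units on Hawaii with 8 resident waves per compute unit (the 44*8 figure reported later in this thread). The number of waves that can actually be resident at once depends on the kernel's register and LDS usage.

```python
def barrier_sizes(compute_units, waves_per_cu):
    """Return (enqueued_waves, barrier_value) for one global barrier.

    Hypothetical helper: the barrier value is the number of enqueued
    waves minus one, following the rule stated in this thread.
    """
    enqueued = compute_units * waves_per_cu
    return enqueued, enqueued - 1


print(barrier_sizes(44, 8))  # (352, 351)
print(barrier_sizes(2, 1))   # (2, 1)
```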

           

          Using the atomic_add method ensures the first wave (in time) executes ds_gws_init. This ensures no waves pass the barrier before ds_gws_init.

           

          Only one wave, the first one, should execute ds_gws_init. All waves must execute the barrier.
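
The first-wave election can be illustrated with a small sequential model. This is only a sketch under stated assumptions: the counter starts at zero, the atomic add returns the value it saw before adding, and the names `Counter`, `fetch_add` and `elect_initializer` are hypothetical stand-ins, not a real API. A Python loop replaces real concurrent waves, so it shows the selection logic only, not hardware timing.

```python
class Counter:
    """Stand-in for a zero-initialised counter updated with an atomic add."""

    def __init__(self):
        self.value = 0

    def fetch_add(self, amount):
        # Return the old value, as an atomic add with return would.
        old = self.value
        self.value += amount
        return old


def elect_initializer(arrival_order):
    """Return the wave ids that would execute ds_gws_init."""
    counter = Counter()
    chosen = []
    for wave_id in arrival_order:
        # Only the wave that sees the old value 0 arrived first.
        if counter.fetch_add(1) == 0:
            chosen.append(wave_id)
    return chosen


print(elect_initializer([7, 2, 5, 0]))  # [7]
```

Whatever order the waves arrive in, exactly one of them sees the old value 0, so exactly one executes ds_gws_init while every wave still goes on to the barrier.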

          Setting m0 to 0x1000 on Hawaii only affects reading GDS memory; it does not affect the barrier instructions, as far as I have seen.

           

          I don't think the drivers make a difference but it is possible. I'm using older drivers.

           

          Allan

            • Re: Getting a global barrier to work on R290X
              sp314

              Hi Allan,

               

              First of all, thank you for your reply, and thank you and realhet for that earlier thread that I referred to; it was most insightful.

               

              I got barriers to work late last night, and based on what I'm seeing, you're exactly right about everything. My problem was that I was initializing the barrier with the actual wave count, not wave count - 1. Now it works with 44*8=352 waves (barrier initialized to 351), and it works with 2 waves (barrier initialized to 1) as well.

               

              One thing though. In your example in the previous thread, you're initializing the barrier with ds_gws_init, then using it once with ds_gws_barrier, and then reinitializing it again. The new Vega ISA doc (http://developer.amd.com/wordpress/media/2013/12/Vega_Shader_ISA_28July2017.pdf, page 163) has a curious bit about the waves delivering the wave count for the next barrier. So, essentially, we can run just one ds_gws_init and then repeat ds_gws_barrier as needed, with no ds_gws_init in between. I've tried this, and it also works, at least on 290X. So, do you still think we need to reinitialize the barrier, or can we go with just one ds_gws_init and repeated ds_gws_barrier instructions?

               

              Btw, this is fun, I definitely like this AMD hardware.

                • Re: Getting a global barrier to work on R290X
                  sp314

                  I mean,

                   

                  v_mov_b32      v0, 351

                  ds_gws_init    v0 gds

                  s_waitcnt      lgkmcnt(0)

                  ...

                  some code

                  ...

                  v_mov_b32      v0, 351

                  ds_gws_barrier  v0 gds

                  s_waitcnt      lgkmcnt(0)

                  ...

                  some more code

                  ...

                  v_mov_b32      v0, 351

                  ds_gws_barrier  v0 gds

                  s_waitcnt      lgkmcnt(0)

                   

                  also seems to work for me.

                  • Re: Getting a global barrier to work on R290X
                    drallan

                    sp314,

                    Great. First, thanks for the link to the Vega_Shader_ISA document, that has new and interesting information.

                     

                    "So, do you still think we need to reinitialize the barrier, or we can go with just one ds_gws_init and repeated ds_gws_barriers?"

                     

                    I tried your way, initializing once with waves = 351 and passing the value 351 to each subsequent barrier instruction, and that works fine for me too. I'm sure that's how it is meant to work.

                     

                    From the manual's pseudo logic it's clear the count is reloaded from the barrier instruction executed by the wave that finds the resource counter is already <= 0. (That's why the value must be 351, not 352.) Whew...
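
The counter behaviour described above can be tried out in a toy model. This is a hedged Python sketch built only from the description in this thread (a wave that finds the counter already <= 0 releases the barrier and reloads the counter from its own value; every other wave decrements and waits). It is not the manual's pseudo-code and not real hardware behaviour, and the names `GwsBarrierModel` and `run` are invented for illustration.

```python
class GwsBarrierModel:
    """Toy model of one GWS barrier resource (an assumption, not hardware)."""

    def __init__(self):
        self.counter = 0
        self.waiting = []    # waves currently held at the barrier
        self.released = []   # one list of wave ids per completed barrier

    def init(self, value):
        # Models ds_gws_init: load the counter once.
        self.counter = value

    def barrier(self, wave_id, value):
        # Models ds_gws_barrier for one arriving wave.
        self.waiting.append(wave_id)
        if self.counter <= 0:
            # Last wave to arrive: release everyone and reload the
            # counter from the value this wave supplied.
            self.released.append(self.waiting)
            self.waiting = []
            self.counter = value
        else:
            self.counter -= 1


def run(waves, value, rounds):
    """Return (completed_barriers, waves_still_stuck)."""
    gws = GwsBarrierModel()
    gws.init(value)
    for _ in range(rounds):
        for wave_id in range(waves):
            gws.barrier(wave_id, value)
    return len(gws.released), len(gws.waiting)


print(run(352, 351, 3))  # (3, 0): one init, three barriers, nobody stuck
print(run(352, 352, 1))  # (0, 352): value equal to the wave count never releases
```

In this model a single ds_gws_init followed by repeated barriers keeps working because each releasing wave reloads the counter, and a value equal to the full wave count leaves every wave waiting, which matches the hang reported at the top of the thread.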

                     

                    GCN is never boring.