4 Replies Latest reply on Mar 9, 2012 3:50 PM by corry

    Possible looping bug in 7970s?

    corry

      First, to confirm, I need to ask: is s_getpc_b64 basically grabbing the instruction pointer, and s_setpc_b64 setting the instruction pointer?  I can guess at the s_sub_b32 and s_subb_b32... it looks like it's getting the instruction pointer (pc = program counter?), subtracting from it, and setting it to implement a jump?  If so, what are the units associated with the pc?

       

      The details:  We have a kernel that hangs the GPU when it runs in a loop.  We have other functions running in loops, so clearly it's not something as basic as that.  Looking at a simplified function's disassembly, we come to the part that updates the loop counter, and all is well.  It compares, then jumps.  When it jumps, it goes to a location where it writes some debug output to RAM, and then comes the curious part: the s_getpc_b64, sub, subb, and s_setpc_b64.  Worse, the value being subtracted is 0x16940, which is more than the number of lines in the disassembly.  If it were jumping to an offset somewhere in the FFFFF000 range, a hang would make sense.  I tried 16940/4, but that puts it in the middle of one of the functions called by the loop, and not the first one either.  Dividing by 2 instead, it would still go negative.  We're at a loss as to what's going on.
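
      A minimal sketch of the arithmetic that sequence appears to perform, written in Python rather than ISA.  It assumes the pc is a 64-bit byte address and that s_getpc_b64 returns the address of the instruction following it (my reading of the Southern Islands ISA document, not something confirmed in this thread).  Under that assumption 0x16940 is a byte count, and since instructions are 4 or 8 bytes long it cannot be compared directly against a line count in the disassembly.  The helper name and the example pc value are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def sub_with_borrow(pc, offset):
    """Subtract offset from a 64-bit pc the way a sub / subb pair would:
    low dwords first, then high dwords minus the borrow from the low half."""
    pc_lo, pc_hi = pc & MASK32, (pc >> 32) & MASK32
    off_lo, off_hi = offset & MASK32, (offset >> 32) & MASK32
    borrow = 1 if pc_lo < off_lo else 0          # low half underflowed
    lo = (pc_lo - off_lo) & MASK32
    hi = (pc_hi - off_hi - borrow) & MASK32
    return (hi << 32) | lo

OFFSET_BYTES = 0x16940                           # value seen in the disassembly
offset_dwords = OFFSET_BYTES // 4                # same distance in 4-byte units

# Hypothetical pc value, chosen only so that the low dword borrows.
example_pc = 0x0000000100001000
target = sub_with_borrow(example_pc, OFFSET_BYTES)

print(OFFSET_BYTES, offset_dwords)
print(hex(target))
```

      If the assumption holds, the jump lands 92480 bytes (23120 dwords) before the instruction after s_getpc_b64, so whether the target is valid depends on how far into the code object that instruction sits, not on the number of disassembly lines.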

       

      The new driver causes it to hang all the time.  We're reinstalling it to see if we get the same getpc and setpc stuff going on, and I'm probably going to upgrade my box to 12.2, because I remember having some sort of issue with the preview2 and preview3 drivers that I posted about on here.  I want to say it caused all my kernels to hang, but I don't remember.

        • Re: Possible looping bug in 7970s?
          MicahVillmow

          corry,

          We have recently fixed a hang issue internally, and the fix should show up in one of the upcoming Catalyst releases. If you have a test case we can try, we can confirm whether this has fixed the issue. If not, we can work on getting it fixed.

            • Re: Possible looping bug in 7970s?
              corry

              I'm trying to narrow down the problem now... but it takes a reboot every time I fail to find the problem.  All I've done so far is set my loop counter to 1, and set my grid/block sizes to 64,1,1 and 1,1,1.  I did the second part because we found that one of the kernels was generating an excessive number of loads, and I mean excessive.  Rather than look for a big block of v_load_dword or whatever the actual instruction name is, I figured that if I ran a minimal number of threads it might finish in a reasonable amount of time if that were the problem.  It should take tenths of a second or less with a loop counter of 1; I let it sit for 2 minutes before rebooting.

               

              I'm going to try to remove the loop and see if it runs.  I'll post up here if I find a simple test case, but I get the feeling this is one of those "well, if a and b and !d ^ g+5+the middle initial of your father's brother's nephew's cousin's former roommate | 42, then there will be a problem" situations.

                • Re: Possible looping bug in 7970s?
                  MicahVillmow

                  corry,

                  Can you see if these two instructions follow each other in the ISA?

                  v_readlane_b32 s17, v4, 0

                  s_load_dwordx4  s[28:31], s[2:3], s17

                   

                  Where the result of v_readlane_b32 is the third argument of s_load_dwordx4?
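
                  One way to check a long ISA dump for this pattern is a small script.  The following is only a sketch in Python: it assumes the disassembly is plain text with one instruction per line in the operand layout shown above, and the function name is hypothetical.  It flags a v_readlane_b32 that is immediately followed (ignoring blank lines) by an s_load_dwordx4 whose third operand is the register the readlane wrote.

```python
import re

# Destination SGPR of a v_readlane_b32, e.g. "v_readlane_b32 s17, v4, 0".
READLANE = re.compile(r"^\s*v_readlane_b32\s+(s\d+)\s*,")
# Third operand of an s_load_dwordx4, e.g. "... s[28:31], s[2:3], s17".
SLOAD = re.compile(
    r"^\s*s_load_dwordx4\s+s\[\d+:\d+\]\s*,\s*s\[\d+:\d+\]\s*,\s*(\S+)"
)

def find_readlane_offset_pairs(isa_text):
    """Return (index, readlane_line, load_line) for each adjacent pair where
    the readlane result is used as the offset operand of the scalar load.
    The index counts non-blank lines only."""
    lines = [ln.strip() for ln in isa_text.splitlines() if ln.strip()]
    hits = []
    for i in range(len(lines) - 1):
        m_read = READLANE.match(lines[i])
        m_load = SLOAD.match(lines[i + 1])
        if m_read and m_load and m_read.group(1) == m_load.group(1):
            hits.append((i, lines[i], lines[i + 1]))
    return hits

sample = """
v_readlane_b32 s17, v4, 0

s_load_dwordx4  s[28:31], s[2:3], s17
"""
hits = find_readlane_offset_pairs(sample)
print(len(hits))
```

                  A sequence with an s_nop in between, or with an immediate such as 0x38 as the third operand, is not reported by this check.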

                    • Re: Possible looping bug in 7970s?
                      corry

                      Not exactly. We have the v_readlane followed by the s_load_dword, but it's like this:

                       

                      v_readlane_b32 s0, v3, 0

                      v_readlane_b32 s1, v3, 1

                      s_nop 0x0003

                      s_load_dwordx4 s[0:3], s[0:1], 0x38

                       

                      So it seems s0 and s1 are being used in the s_load_dwordx4, but not in the 3rd parameter.

                       

                      Also of note, I stripped my program down on the 12.2 drivers to just writing eight 32-bit 0xFFFFFFFF values to the first 32 bytes of RAM.  That alone locks it up.

                      Disassembly is literally this:

                      v_mov_b32 v0, -1

                      v_mov_b32 v1, -1

                      v_mov_b32 v2, -1

                      v_mov_b32 v3, -1

                      tbuffer_store_format_xyzw v[0:3], v0, s[4:7], 0 format:[BUF_DATA_FORMAT_32_32_32_32,BUF_NUM_FORMAT_FLOAT]

                      tbuffer_store_format_xyzw v[0:3], v0, s[4:7], 0 offset:16 format:[BUF_DATA_FORMAT_32_32_32_32,BUF_NUM_FORMAT_FLOAT]

                       

                      So even if the loop issue was fixed in 12.2, we still can't seem to run literally anything on the 12.2 drivers.  I hope it doesn't try to interpret the values when it stores them, since for some reason the integers are going out as floats... but I don't think that should hang the card, right?
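
                      For reference, here is what the 0xFFFFFFFF written by v_mov_b32 v0, -1 looks like if it were read as a 32-bit IEEE 754 float.  This is a Python sketch of the bit pattern only; whether the hardware inspects or alters the value for this buffer format is not something established here, and the helper name is hypothetical.

```python
import math
import struct

def bits_as_float(bits):
    """Reinterpret a 32-bit integer bit pattern as an IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]

all_ones = bits_as_float(0xFFFFFFFF)   # sign=1, exponent=0xFF, mantissa!=0
print(math.isnan(all_ones))
```

                      All exponent bits set with a nonzero mantissa is a NaN encoding, so anything that did treat the data as floats would see NaNs rather than ordinary numbers.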