5 Replies Latest reply on Mar 17, 2014 7:37 AM by Bdot

    random calculation error

    Bdot

      Hi,

       

      this part of my code

       

      Code:

       

        if (get_global_id(0)==TRACE_TID) printf((__constant char *)"div2.6: q.d4=%x, carry=%x, nn.d3=%x\n", q.d4, carry, nn.d3);
        q.d4 = q.d4 - nn.d3 + carry;
        if (get_global_id(0)==TRACE_TID) printf((__constant char *)"div2.7: q.d4=%x, carry=%x, nn.d3=%x\n", q.d4, carry, nn.d3);

      prints the following:

       

      Code:

       

      div2.6: q.d4=51ed, carry=0, nn.d3=51ed
       div2.7: q.d4=394d8646, carry=0, nn.d3=51ed

      So, according to Catalyst 13.9 on HD5770,

       

      0x51ed - 0x51ed + 0 = 0x394d8646.

       

      In fact, the result is different each time I run it. All variables and struct members are of type uint. The 13.10 beta does not fix the problem. When I change the variables' type to uint2 or uint4, the code works correctly. It also works fine when run on the CPU device.

       

      Of course, when I put just that into a small test program, it always works. I can provide all source code and/or binaries if needed, but first I wanted to check whether this is a known problem.

        • Re: random calculation error
          himanshu.gautam

          printf() has been quite quirky...even yesterday there was an issue reported on printing 64-bit numbers using %lu

          I have asked for source code for repro..

          Please provide a repro case and I will file a bug tomorrow.

           

          -

          Bruhaspati

            • Re: random calculation error
              Bdot

              Oh, maybe I was not really clear in my description ... printf is perfectly OK and prints the correct values. It is the actual calculation that is wrong.

               

              Where can I send my program + sources and explanation? (as mentioned, it is not reproducible in a simple test case)

                • Re: random calculation error
                  himanshu.gautam

                  okay... It could actually be either way...

                  Use the advanced editor (see the top right of the text box; it looks disabled and faded, but it is not).

                  It allows you to add attachments.

                  Please give me a small repro case so that I can take this up with the engineering team.

                  Also, please tell me the platform, the driver, the bitness of your platform, the APP SDK version, and the CPU and GPU used.

                  Thank you,

                    • Re: Re: random calculation error
                      Bdot

                      All my attempts to reduce this to a small piece of OpenCL code failed to show the issue. I hope this one is still useful. On my HD5770 / Cat13.12 / Win7/64 / APPSDK 2.8.1214.3 it shows this:

                      To run the selftest call "mfakto -st". It will start like this:

                      mfakto 0.14pre3-Win (64bit build)

                      Runtime options

                      ...

                      ######### testcase 1/1559 (M50804297[67-68]) #########

                      Starting trial factoring M50804297 from 2^67 to 2^68 (0.59GHz-days)

                      Using GPU kernel "cl_barrett32_87"

                      2097152 FCs copied in 0.00 ms (1.#J MB/s), proc'd in 18.05 ms (116.22 M/s)

                      2097152 FCs copied in 0.00 ms (1.#J MB/s), proc'd in 18.04 ms (116.27 M/s)

                      2097152 FCs copied in 0.00 ms (1.#J MB/s), proc'd in 18.04 ms (116.27 M/s)

                      Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait

                      Mar 12 23:36 | 3387   0.1% |  0.500    n.a. |      n.a.    25000    n.a.%

                      no factor for M50804297 from 2^67 to 2^68 [mfakto 0.14pre3-Win cl_barrett32_87_1]

                      ERROR: selftest failed for M50804297 (cl_barrett32_87)

                        no factor found

                      tf(): total time spent:  0.515s

                       

                      Press Ctrl-C to interrupt the test. The ERROR line shows a calculation error. To locate the error, I added tracing. To enable it, modify mfakto_Kernels.cl, line 45: #define TRACE_KERNEL (higher numbers mean more trace - 2 shows the issue), then rerun "mfakto -st":

                      ...

                      ######### testcase 1/1559 (M50804297[67-68]) #########

                      Starting trial factoring M50804297 from 2^67 to 2^68 (0.59GHz-days)

                      Using GPU kernel "cl_barrett32_87"

                      cl_barrett32_87: tid=0, f=9:ca9c44cb:6c467957, shift=19

                      div2.6: q=0:1380:383666f:dfba1994:d1000000, nn=1380:c21d0313:6dab666b:2f000000, tmp=383666f, c=0

                      div2.6: q.d4=1380, carry=0, nn.d3=1380, q.d4 - nn.d3 + carry=0

                      div2: q=0:9f39e252:383666f:dfba1994:d1000000(c=0), nn=1380:c21d0313:6dab666b:2f000000, res=d12981fd:e9000000:0

                      div3.1: q=0:9f39e252:383666f:dfba1994:d1000000:0, n=9:ca9c44cb:6c467957, qi=ffffffff, nf=2.3779E-011

                      cl_barrett32_87: f=9:ca9c44cb:6c467957, u=d1298206:e9007ff8:ffff7fff, ff=2.3779E-011

                      ...

                      (press Ctrl-C twice to abort - somehow the program always hangs when printf is used in a kernel)

                       

                      The source code for this part is in barrett.cl, around line 421:

                      #if (TRACE_KERNEL > 1)

                          if (get_global_id(0)==TRACE_TID) printf((__constant char *)"div2.6: q=%x:%x:%x:%x:%x, nn=%x:%x:%x:%x, tmp=%x, c=%x\n",

                              q.d5, q.d4, q.d3, q.d2, q.d1, nn.d3, nn.d2, nn.d1, nn.d0, tmp, carry);

                          if (get_global_id(0)==TRACE_TID) printf((__constant char *)"div2.6: q.d4=%x, carry=%x, nn.d3=%x, q.d4 - nn.d3 + carry=%x\n",

                              q.d4, carry, nn.d3, q.d4 - nn.d3 + carry);

                      #endif

                       

                        res->d0 = q.d4 - nn.d3 + carry; // attempt to trick AMD compiler to correctly save the result of the subtraction

                        q.d4 = q.d4 -nn.d3 + carry;  // <<<===  this one saves a random result into q.d4

                       

                      #if (TRACE_KERNEL > 1)

                          if (get_global_id(0)==TRACE_TID) printf((__constant char *)"div2: q=%x:%x:%x:%x:%x(c=%x), nn=%x:%x:%x:%x, res=%x:%x:%x\n",

                              q.d5, q.d4, q.d3, q.d2, q.d1, carry, nn.d3, nn.d2, nn.d1, nn.d0, res->d2, res->d1, res->d0);

                      #endif

                      When calculating "q.d4 - nn.d3 + carry"  (which is 0x1380 - 0x1380 + 0) inside printf, "0" is correctly printed. But when saving it to q.d4 (which is used later on), it saves a rather random value (0x9f39e252). This leads to the calculation error.

                       

                      When running on the CPU device ("mfakto -d cpu -st"), all tests finish successfully.

                      When changing the data type of each of the components from a scalar uint to a uint vector of any size, all tests finish successfully (*).

                      When increasing the trace level to 3 or higher, this calculation step yields the correct result (but it hangs the program because using printf inside the kernel ...).

                       

                      I'm looking for hints on how to get correct results on the GPU with scalars.

                      I'd like AMD to investigate this issue: is it a compiler bug or an error in my code?

                      I'd like to know why printf hangs up my program.

                       

                      Thanks a lot for any help

                       

                      (*) to do this, disable tracing (#define TRACE_KERNEL 0), edit mfakto.ini, line 107, set VectorSize=2 (or any other vector size)