4 Replies Latest reply on Mar 30, 2009 4:07 PM by yangyi0239

    Question about timing in Brook+ and CAL

    yangyi0239

      I'm currently working with CAL and Brook+, and I sometimes implement the same application in both. I noticed that the timing information shows Brook+ as faster than CAL. For example, I ran the optimized matmult from the Brook+ samples and compute_matmult from the CAL samples. The results are as follows:

      Result from Brook+ (optimized matmult sample):

      Width  Height      Iterations  CPU Total Time  GPU Total Time          Gflops         Speedup
       1024    1024               1         32.7529        0.127423         15.6957         257.041

       

      Result from CAL (compute_matmult sample):

      Matrix Size     Kernel Gflops  System Gflops    Kernel Time     System Time
      (1024x1024)     287.212         11.059          0.0075          0.1942

       

      Compared with CAL's System Time of 0.1942, Brook+'s GPU Total Time of 0.127423 is much shorter. On the other hand, CAL's Kernel Gflops is very high. I suspect something is wrong with the timing. Can anyone explain this?
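As a sanity check on the Gflops columns, the helper below recomputes them from the printed times. It assumes the usual 2·N³ flop count for an N×N matrix multiply and that the times are in seconds; neither is stated in the sample output, but the CAL columns are consistent with both. The function names are hypothetical.

```cpp
#include <cassert>

// Flop count for one N x N by N x N matrix multiply, assuming the usual
// 2*N^3 (one multiply and one add per term of each inner product).
inline double matmulFlops(double n) { return 2.0 * n * n * n; }

// Achieved Gflops = total flops over all iterations / elapsed seconds / 1e9.
inline double gflops(double n, int iterations, double seconds) {
    assert(seconds > 0.0);
    return matmulFlops(n) * iterations / seconds / 1e9;
}
```

For example, `gflops(1024, 1, 0.0075)` is about 286.3 and `gflops(1024, 1, 0.1942)` about 11.06, close to CAL's 287.212 Kernel Gflops and 11.059 System Gflops (the small kernel-side difference fits the rounding of the printed 0.0075 s). In other words, the high Kernel Gflops comes from dividing the same work by the kernel-only time.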

      If I change the number of iterations to 16:

      Brook+:

         Width  Height      Iterations  CPU Total Time  GPU Total Time          Gflops         Speedup
          1024    1024              16         536.969        0.556496         57.5026         964.911

       

      CAL:

      Matrix Size     Kernel Gflops  System Gflops    Kernel Time     System Time
      (1024x1024)     393.469         11.914          0.0873          2.8841
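Subtracting Kernel Time from System Time shows how much of each CAL run is spent outside the kernel. The sketch below assumes, as the sample's naming suggests, that the System timer encloses the Kernel timer; `CalTiming` and the helper names are hypothetical.

```cpp
#include <cassert>

// Per-run timings as printed by the CAL sample (assumed to be seconds).
struct CalTiming {
    int iterations;
    double kernelSeconds;  // "Kernel Time": kernel execution only
    double systemSeconds;  // "System Time": copies to/from the GPU plus the kernel
};

// Time spent outside the kernel timer (mostly host<->GPU copies),
// assuming the System timer fully encloses the Kernel timer.
inline double nonKernelSeconds(const CalTiming& t) {
    assert(t.systemSeconds >= t.kernelSeconds);
    return t.systemSeconds - t.kernelSeconds;
}

// Average non-kernel time per iteration.
inline double nonKernelPerIteration(const CalTiming& t) {
    assert(t.iterations > 0);
    return nonKernelSeconds(t) / t.iterations;
}
```

With the two CAL runs above, `nonKernelSeconds({1, 0.0075, 0.1942})` is 0.1867 s and `nonKernelPerIteration({16, 0.0873, 2.8841})` is about 0.1748 s, so in both runs roughly 96 to 97 percent of System Time falls outside the kernel timer.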

          ryta1203

          You are comparing GPU time (Brook+) to system time (CAL)? Are these two the same thing? They don't sound like it.

              yangyi0239

               

               Originally posted by: ryta1203
               "You are comparing GPU time (Brook+) to system time (CAL)? Are these two the same thing? They don't sound like it."

               

               

              I think they are the same. Both measure the same sequence: transfer CPU->GPU, kernel execution, GPU->CPU.

              The CAL version is:
                      CopyDataToGPU
                      RunProgram
                      CopyDataFromGPU

              The Brook+ version looks like:
                      StreamRead();
                      kernel();
                      StreamWrite();
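Splitting the single measurement into phases makes it clear which part each timer covers. The sketch below is a generic host-side helper built on std::chrono, not code from either SDK; the three callables are hypothetical stand-ins for the copy-in, kernel and copy-out steps listed above.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Elapsed wall-clock seconds for each phase of one GPU round trip.
struct PhaseTimes {
    double upload = 0.0;    // host -> GPU (CopyDataToGPU / StreamRead)
    double kernel = 0.0;    // kernel launch plus wait for completion
    double download = 0.0;  // GPU -> host (CopyDataFromGPU / StreamWrite)
    double total() const { return upload + kernel + download; }
};

// Runs fn and returns how long it took, in seconds.
inline double timeIt(const std::function<void()>& fn) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    fn();
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    return elapsed.count();
}

// Times the three phases separately. The callables are hypothetical
// placeholders for what the CAL or Brook+ code actually does; the kernel
// callable must block until the GPU has finished, or its time is meaningless.
inline PhaseTimes timeRoundTrip(const std::function<void()>& upload,
                                const std::function<void()>& kernel,
                                const std::function<void()>& download) {
    assert(upload && kernel && download);
    PhaseTimes t;
    t.upload = timeIt(upload);
    t.kernel = timeIt(kernel);
    t.download = timeIt(download);
    return t;
}
```

Timing each phase this way in both programs would show whether a difference in total time comes from the copies or from the kernel.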

                  ryta1203

                  Isn't there a CAL API call to get the timing of just the GPU? Why would they call one GPU Time and the other System Time?

                      yangyi0239

                       

                      Originally posted by: ryta1203
                      "Isn't there a CAL API call to get the timing of just the GPU? Why would they call one GPU Time and the other System Time?"

                       

                      If you look at the CAL code:

                          Info.System.Start();
                          for (CALuint i = 0; i < Info.Iterations; ++i)
                          {
                              CopyDataToGPU(&ctx, resourceHandler, data, numInputs + numConstantBuffers);
                              if (!RunComputeProgram(&ctx, &module, Info.Height / bPartsNum , Info.Height / aPartsNum, &Info))
                              {
                                  return 1;
                              }
                              CopyDataFromGPU(&ctx, &resourceHandler[numInputs + numConstantBuffers], data + numInputs + numConstantBuffers, numOutputs);
                          }
                          Info.System.Stop();
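Kernel Time grows from 0.0075 s with one iteration to 0.0873 s with 16, which suggests the sample's timer accumulates across repeated Start()/Stop() pairs, while System is started once around the whole loop. The class below is a hypothetical stand-in for that timer built on std::chrono, not the SDK's own implementation.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the sample's Start()/Stop() timer. Each
// Start/Stop pair adds to a running total, so a timer started and stopped
// once per iteration reports the sum over all iterations.
class AccumulatingTimer {
public:
    void Start() {
        assert(!running_);
        running_ = true;
        start_ = Clock::now();
    }
    void Stop() {
        assert(running_);
        total_ += Clock::now() - start_;  // converts ticks to double seconds
        running_ = false;
    }
    double Seconds() const { return total_.count(); }

private:
    using Clock = std::chrono::steady_clock;
    Clock::time_point start_{};
    std::chrono::duration<double> total_{0.0};
    bool running_ = false;
};
```

Used as in the loop above, a System timer started once and a Kernel timer started and stopped inside each iteration give a Kernel total that can never exceed the System total, because the kernel intervals are disjoint pieces of the system interval.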

                       

                      RunComputeProgram does the following:

                          Info->Kernel.Start();
                          if( calCtxRunProgramGrid(&event, *ctx, &pg) != CAL_RESULT_OK )
                          {
                              fprintf(stderr, "There was an error running the program.\n");
                              fprintf(stderr, "Error string is %s\n", calGetErrorString());
                              return 0;
                          }
                          // Wait for the last run to complete.
                          while ( calCtxIsEventDone(*ctx, event) == CAL_RESULT_PENDING );
                          Info->Kernel.Stop();
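The busy-wait on calCtxIsEventDone matters for the measurement: the sample presumably polls the event because the launch call can return before the GPU has finished, and stopping the timer right away would then measure little more than the launch itself. The sketch below simulates this with a hypothetical `FakeEvent` that stays pending for a fixed time; it is not the CAL API.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for a GPU completion event: it reports "pending"
// until a fixed amount of simulated work has elapsed after the launch.
struct FakeEvent {
    Clock::time_point readyAt;
    bool isDone() const { return Clock::now() >= readyAt; }
};

// Simulated asynchronous launch: returns immediately, like a launch call
// that only queues the work and hands back an event.
inline FakeEvent launchAsync(std::chrono::milliseconds work) {
    return FakeEvent{Clock::now() + work};
}

// Measures one launch in seconds. With waitForCompletion = true it spins on
// the event before stopping the clock, mirroring the calCtxIsEventDone loop.
inline double timedLaunch(std::chrono::milliseconds work, bool waitForCompletion) {
    assert(work.count() >= 0);
    const auto start = Clock::now();
    const FakeEvent event = launchAsync(work);
    if (waitForCompletion) {
        while (!event.isDone()) {
            // busy-wait, as in the sample
        }
    }
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    return elapsed.count();
}
```

With 50 ms of simulated work, the waited measurement is at least 0.05 s while the unwaited one is close to zero, which is why the sample stops the Kernel timer only after the event completes.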

                       

                      All the code above is from the CAL SDK samples.

                       

                      Kernel is the kernel execution timer, and System is the timer that also includes the memory transfers. CAL lets you time the kernel execution on its own, so the sample keeps two timers, Kernel and System. Brook+ has just one timer showing the total time, which should correspond to System Time in CAL.