
yangyi0239
Journeyman III

question about timing in brook and cal

I'm currently working with CAL and Brook, and I sometimes implement the same application in both. I noticed that the timing information shows Brook running faster than CAL. For example, running the optimized matmult from the Brook samples and compute_matmult from the CAL samples gives the following results:

result from brook:

Width  Height      Iterations  CPU Total Time  GPU Total Time          Gflops         Speedup
 1024    1024               1         32.7529        0.127423         15.6957         257.041

 

result from cal:

Matrix Size     Kernel Gflops  System Gflops    Kernel Time     System Time
(1024x1024)     287.212         11.059          0.0075          0.1942

 

Compared to the 0.1942 reported by CAL, Brook's 0.127423 is really fast. On the other hand, the Kernel Gflops reported by CAL is very high. I believe something is wrong with the timing. Can anyone explain it?

 

 

If I change the iteration count to 16:

brook:

   Width  Height      Iterations  CPU Total Time  GPU Total Time          Gflops         Speedup
    1024    1024              16         536.969        0.556496         57.5026         964.911

 

cal:

Matrix Size     Kernel Gflops  System Gflops    Kernel Time     System Time
(1024x1024)     393.469         11.914          0.0873          2.8841

4 Replies
ryta1203
Journeyman III

You are comparing GPU time (Brook+) to system time (CAL)? Are these two the same thing because they don't sound like it?


Originally posted by: ryta1203 You are comparing GPU time (Brook+) to system time (CAL)? Are these two the same thing because they don't sound like it?

 

 

I think they are the same. Both measure the same sequence: CPU-to-GPU transfer, kernel execution, GPU-to-CPU transfer.

The CAL sample does:
        CopyDataToGPU
        RunProgram
        CopyDataFromGPU

The Brook sample does something like:

        StreamRead();

        kernel();

        StreamWrite();


Isn't there a CAL API call to get the timing of just the GPU? Why would they call one GPU Time and the other System Time?


Originally posted by: ryta1203 Isn't there a CAL API call to get the timing of just the GPU? Why would they call one GPU Time and the other System Time?

 

If you look at the CAL code:

    Info.System.Start();
    for (CALuint i = 0; i < Info.Iterations; ++i)
    {
        CopyDataToGPU(&ctx, resourceHandler, data, numInputs + numConstantBuffers);
        if (!RunComputeProgram(&ctx, &module, Info.Height / bPartsNum , Info.Height / aPartsNum, &Info))
        {
            return 1;
        }
        CopyDataFromGPU(&ctx, &resourceHandler[numInputs + numConstantBuffers], data + numInputs + numConstantBuffers, numOutputs);
    }
    Info.System.Stop();

 

RunComputeProgram in turn does the following:

    Info->Kernel.Start();
    if( calCtxRunProgramGrid(&event, *ctx, &pg) != CAL_RESULT_OK )
    {
        fprintf(stderr, "There was an error running the program.\n");
        fprintf(stderr, "Error string is %s\n", calGetErrorString());
        return 0;
    }
    // Wait for the last run to complete.
    while ( calCtxIsEventDone(*ctx, event) == CAL_RESULT_PENDING );
    Info->Kernel.Stop();

 

All of the code above is from the CAL SDK samples.

 

Kernel is the kernel execution timer, and System is the timer that also includes the memory transfers. In CAL you can measure the kernel execution time separately, so there are two timers, Kernel and System. In Brook there is just one timer showing the total time, which should correspond to the System time in CAL.

 
