I'm currently working with CAL and Brook+, and I sometimes implement the same application in both. I noticed that the timing information shows Brook+ as faster than CAL. For example, I ran the optimized matmult sample from the Brook+ samples and the compute_matmult sample from the CAL samples. The results are as follows:
Result from Brook+:
Width  Height  Iterations  CPU Total Time (s)  GPU Total Time (s)  Gflops   Speedup
1024   1024    1           32.7529             0.127423            15.6957  257.041
Result from CAL:
Matrix Size  Kernel Gflops  System Gflops  Kernel Time (s)  System Time (s)
(1024x1024)  287.212        11.059         0.0075           0.1942
Compared with CAL's 0.1942 s, Brook+'s 0.127423 s is much faster. On the other hand, CAL's kernel Gflops figure is very high. I suspect something is wrong with the timing. Can anyone explain it?
If I change the number of iterations to 16:
Brook+:
Width  Height  Iterations  CPU Total Time (s)  GPU Total Time (s)  Gflops   Speedup
1024   1024    16          536.969             0.556496            57.5026  964.911
CAL:
Matrix Size  Kernel Gflops  System Gflops  Kernel Time (s)  System Time (s)
(1024x1024)  393.469        11.914         0.0873           2.8841
You are comparing GPU time (Brook+) with system time (CAL)? Are these two the same thing? They don't sound like it.
Originally posted by: ryta1203 You are comparing GPU time (Brook+) with system time (CAL)? Are these two the same thing? They don't sound like it.
I think they are the same. Both measure the same sequence: CPU->GPU transfer, kernel execution, GPU->CPU transfer.
The CAL version is:
CopyDataToGPU
RunProgram
CopyDataFromGPU
The Brook+ version looks like:
StreamRead();
kernel();
StreamWrite();
Isn't there a CAL API call to get the timing of just the GPU? Why would they call one GPU Time and the other System Time?
Originally posted by: ryta1203 Isn't there a CAL API call to get the timing of just the GPU? Why would they call one GPU Time and the other System Time?
If you look at the CAL code:
Info.System.Start();
for (CALuint i = 0; i < Info.Iterations; ++i)
{
    CopyDataToGPU(&ctx, resourceHandler, data, numInputs + numConstantBuffers);
    if (!RunComputeProgram(&ctx, &module, Info.Height / bPartsNum, Info.Height / aPartsNum, &Info))
    {
        return 1;
    }
    CopyDataFromGPU(&ctx, &resourceHandler[numInputs + numConstantBuffers], data + numInputs + numConstantBuffers, numOutputs);
}
Info.System.Stop();
RunComputeProgram does the following:
Info->Kernel.Start();
if (calCtxRunProgramGrid(&event, *ctx, &pg) != CAL_RESULT_OK)
{
    fprintf(stderr, "There was an error running the program.\n");
    fprintf(stderr, "Error string is %s\n", calGetErrorString());
    return 0;
}
// Wait for the last run to complete.
while (calCtxIsEventDone(*ctx, event) == CAL_RESULT_PENDING);
Info->Kernel.Stop();
Both snippets above are from the CAL SDK samples.
Kernel is the kernel-execution timer, and System is the timer that also includes the memory transfers. In CAL you can get the kernel execution time on its own, so there are two timers, Kernel and System. Brook+ has just one timer, which shows the total time and should correspond to the System time in CAL.