
Compute GFlops for Matrix-Vector GPU

Discussion created by dinaharchery on Dec 21, 2009
Latest reply on Dec 23, 2009 by dinaharchery
Compare CPU vs. GPU GFLOPS

Hello All,

I am studying GPU versus CPU performance for matrix-matrix/vector multiplication (no compression format), and I am getting some very LARGE GFLOPS figures for the GPU. I must be computing the GFLOPS incorrectly, because I don't believe a simple matrix-vector multiplication should reach upwards of 2470 GFLOPS.

I am using a GPU with the following hardware design:

Graphics Card Manufacturer: Powered by ATI
Graphics Chipset: ATI MOBILITY RADEON HD 4530 / 4570
Device ID: 9553
Vendor: 1002
Subsystem ID: 02BE
Subsystem Vendor ID: 1028
Graphics Bus Capability: PCI Express 2.0
Maximum Bus Setting: PCI Express 2.0 x16
BIOS Version: 011.021.000.007
BIOS Part Number: BR32787-001
BIOS Date: 2009/04/17
Memory Size: 2045 MB
Memory Type: HyperMemory
Core Clock: 680 MHz
Memory Clock: 800 MHz
Number of Cores: 80 Unified


The code I am using to compute the GFLOPS is below. Can anyone tell me what I am doing wrong?

Setup(0);
// Start GPU Timer:
Start(0);
// Kernel Call - Matrix-Vector Multiplication:
simpleMatmult(m, S_m1, S_m2, S_realresult);
// Stop GPU Timer:
Stop(0);
gpuTime = GetElapsedTime(0);

double gflop = (double)(2.0*n*m*m)/(double)(1024 * 1024 * 1024);

printf("Total GFlops = %f\n", gflop/gpuTime);

