# Compute GFlops for Matrix-Vector GPU

Discussion created by dinaharchery on Dec 21, 2009
Latest reply on Dec 23, 2009 by dinaharchery
Compare CPU vs. GPU GFLOPS

Hello All,

I am studying the performance of the GPU against the CPU for matrix-matrix/vector multiplication (no compression format) and am getting some very LARGE GFLOPS numbers for the GPU. I must be computing the GFLOPS incorrectly, because I don't believe a simple matrix-vector multiplication should reach upwards of 2470 GFLOPS.

I am using a GPU with the following hardware design:

Graphics Chipset  ATI MOBILITY RADEON HD 4530 / 4570
Device ID   9553
Vendor    1002
Subsystem ID   02BE
Subsystem Vendor ID  1028
Graphics Bus Capability  PCI Express 2.0
Maximum Bus Setting  PCI Express 2.0 x16
BIOS Version   011.021.000.007
BIOS Part Number  BR32787-001
BIOS Date   2009/04/17
Memory Size   2045 MB
Memory Type   HyperMemory
Core Clock in MHz  680 MHz
Memory Clock in MHz  800 MHz
Number of Cores:  80 Unified

The code I am using to compute the GFLOPS is below; can anyone tell me what I am doing wrong?

```c
Setup(0);
// Start GPU Timer:
Start(0);
// Kernel Call - Matrix-Vector Multiplication:
simpleMatmult(m, S_m1, S_m2, S_realresult);
// Stop GPU Timer:
Stop(0);
gpuTime = GetElapsedTime(0);

double gflop = (double)(2.0*n*m*m)/(double)(1024 * 1024 * 1024);

printf("Total GFlops = %f\n", gflop/gpuTime);
```
