
GFLOPS calculation and relationship with input size

Discussion created by bjang on Aug 20, 2008
Latest reply on Aug 21, 2008 by eduardoschardong

I have two questions.

1. Why do the SDK samples include streamRead and streamWrite time when calculating GFLOPS? I agree that transfer time is one measure of GPU performance, but it has nothing to do with the GPU's computing power. I believe most people exclude data transfer time when reporting GFLOPS (including many research papers that use CUDA).

2. Taking the simple_matmult example in the Brook SDK, its GFLOPS increases with input size up to a point and then stays almost flat (saturation). I am wondering why that is. Below is my data from a 3870 X2. As you can see, GFLOPS saturates at around 2048*2048. Any thoughts are more than welcome. I don't think it is due to a lack of threads.

Input matrix size    GFLOPS

128*128    0.13293
256*256    0.8292
384*384    1.90313
512*512    2.73721
640*640    3.24699
768*768    4.20032
896*896    5.58104
1024*1024    6.62299
1152*1152    7.3679
1280*1280    7.85343
1408*1408    8.17544
1536*1536    7.50756
1664*1664    8.01407
1792*1792    8.36521
1920*1920    8.77919
2048*2048    8.94158
2176*2176    8.93896
2304*2304    8.80759
2432*2432    9.00353
2560*2560    9.18522
2688*2688    9.11353
2816*2816    9.23887
2944*2944    9.18581
3072*3072    9.09957
3200*3200    9.20422
3328*3328    9.20185
3456*3456    9.30503
3584*3584    9.29831
3712*3712    9.28772
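
To make question 1 concrete, here is a minimal sketch of the two ways the figure can be computed. It assumes the usual 2*N^3 operation count for a dense N*N matrix multiply (I have not checked that the SDK sample counts operations this way), and the timings are hypothetical placeholders, not measurements from my runs.

```python
def matmul_flops(n: int) -> float:
    """Operation count for a dense n x n times n x n multiply.

    Assumes the conventional 2 * n^3 (n^3 multiplies plus n^3 adds).
    """
    return 2.0 * n ** 3


def gflops(flops: float, seconds: float) -> float:
    """Convert an operation count and an elapsed time into GFLOPS."""
    return flops / seconds / 1e9


# Hypothetical timings in seconds, for illustration only.
n = 1024
kernel_s = 0.20      # time spent in the kernel itself
transfer_s = 0.12    # streamRead + streamWrite time

kernel_only = gflops(matmul_flops(n), kernel_s)
end_to_end = gflops(matmul_flops(n), kernel_s + transfer_s)

print(f"kernel only: {kernel_only:.2f} GFLOPS")
print(f"with transfers: {end_to_end:.2f} GFLOPS")
```

With the same operation count, adding transfer time to the denominator can only lower the reported number, which is why the two conventions are not directly comparable.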
