
"Strange" completion time of Saxpy and Reduce for 64MB vectors

Question asked by cadorino on May 18, 2012
Latest reply on May 19, 2012 by settle

Hi everybody!

I'm working on some simple OpenCL benchmarks to test heterogeneous computing on an APU + GPU system. The tests are a vector addition (saxpy) and a reduce, executed many times while varying the amount of data (the vector size). The results produced by the algorithms have been thoroughly validated, so I expect the OpenCL kernels to be correct.

For each fixed data size, the completion time is obtained by averaging 10,000 to 100,000 samples.
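To make the measurement method concrete, here is a minimal sketch of that kind of averaging loop. It is not my actual benchmark code: `mean_completion_ms` and `dummy_workload` are hypothetical names, and the workload is a plain Python stand-in. In the real OpenCL host code each sample would have to block until the device has finished (for example with `clFinish`) before the timer is read.

```python
import time

def mean_completion_ms(workload, samples):
    """Average wall-clock time of workload() over `samples` runs, in milliseconds."""
    start = time.perf_counter()
    for _ in range(samples):
        workload()  # must not return before the work has really completed
    elapsed_s = time.perf_counter() - start
    return elapsed_s * 1000.0 / samples

def dummy_workload():
    # Hypothetical stand-in for one kernel launch plus synchronization.
    sum(range(1000))

if __name__ == "__main__":
    print(f"{mean_completion_ms(dummy_workload, 1000):.6f} ms per sample")
```

Timing the whole loop and dividing by the sample count keeps the timer overhead out of the individual samples, at the cost of hiding any per-sample variance.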

 

While running the tests, I noticed "strange" behaviour in the completion time of both algorithms, on both the integrated and the discrete GPU.

For data sizes ranging from 64KB to 32MB (per vector; 2 vectors in saxpy, 1 in reduce) the completion time is approximately proportional to the data size. For example, the completion time of summing 32MB vectors is about twice the completion time of summing 16MB vectors.

When the algorithms are run with 64MB vectors, however, the completion time is much higher: 4 or 5 times the completion time with 32MB vectors.
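The jump is easy to see by computing the ratio between consecutive timings each time the size doubles. The sketch below does this for one configuration only, using the Reduce timings for Cypress with memory modes (1, 1) copied from the P.S. of this post; the helper name `doubling_ratios` is just for illustration.

```python
# Reduce on Cypress, memory modes (1, 1): values copied from the P.S. of this post.
sizes_bytes = [4194304, 8388608, 16777216, 33554432, 67108864]
times_ms = [4.352941, 6.902174, 10.953642, 19.405405, 71.408163]

def doubling_ratios(times):
    """Ratio of each completion time to the one measured at half the size."""
    return [later / earlier for earlier, later in zip(times, times[1:])]

ratios = doubling_ratios(times_ms)

if __name__ == "__main__":
    for size, ratio in zip(sizes_bytes[1:], ratios):
        print(f"{size // (1024 * 1024):3d} MB: {ratio:.2f}x the time of the previous size")
```

For this configuration the ratios stay below 2 up to 32MB, while the step from 32MB to 64MB comes out at roughly 3.7.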

 

I never expected the completion time on the GPU to vary exactly linearly with the amount of data, but I can't understand why such a big jump happens when moving from 32MB to 64MB of data.
Any ideas?

 

Thank you very much

 

P.S. Below I briefly report the completion times of Saxpy and Reduce on both the discrete and the integrated GPU, for 4MB to 64MB vectors and with different buffer allocation strategies.

 

--------- Reduce ---------

 

- Testing OpenCL

  - Testing devices (Cypress - discrete GPU)

    - Testing memory modes (0, 0)

      - Testing with 4194304 bytes...

7.119792 ms, (61 counters)

      - Testing with 8388608 bytes...

11.758621 ms, (61 counters)

      - Testing with 16777216 bytes...

20.622047 ms, (61 counters)

      - Testing with 33554432 bytes...

39.022222 ms, (61 counters)

      - Testing with 67108864 bytes...

80.660377 ms, (61 counters)

    - Testing memory modes (1, 1)

      - Testing with 4194304 bytes...

4.352941 ms, (61 counters)

      - Testing with 8388608 bytes...

6.902174 ms, (61 counters)

      - Testing with 16777216 bytes...

10.953642 ms, (61 counters)

      - Testing with 33554432 bytes...

19.405405 ms, (61 counters)

      - Testing with 67108864 bytes...

71.408163 ms, (61 counters)

    - Testing memory modes (2, 2)

      - Testing with 4194304 bytes...

3.559242 ms, (61 counters)

      - Testing with 8388608 bytes...

5.107692 ms, (61 counters)

      - Testing with 16777216 bytes...

8.264151 ms, (61 counters)

      - Testing with 33554432 bytes...

14.475410 ms, (61 counters)

      - Testing with 67108864 bytes...

65.489796 ms, (61 counters)

    - Testing memory modes (1, 2)

      - Testing with 4194304 bytes...

4.182692 ms, (61 counters)

      - Testing with 8388608 bytes...

6.416667 ms, (61 counters)

      - Testing with 16777216 bytes...

10.666667 ms, (61 counters)

      - Testing with 33554432 bytes...

19.285714 ms, (61 counters)

      - Testing with 67108864 bytes...

65.127660 ms, (61 counters)

 

  - Testing devices (Beavercreek - integrated GPU)

    - Testing memory modes (0, 0)

      - Testing with 4194304 bytes...

8.260870 ms, (61 counters)

      - Testing with 8388608 bytes...

11.795181 ms, (61 counters)

      - Testing with 16777216 bytes...

20.511628 ms, (61 counters)

      - Testing with 33554432 bytes...

37.989011 ms, (61 counters)

      - Testing with 67108864 bytes...

78.018519 ms, (61 counters)

    - Testing memory modes (1, 1)

      - Testing with 4194304 bytes...

3.918269 ms, (61 counters)

      - Testing with 8388608 bytes...

5.228723 ms, (61 counters)

      - Testing with 16777216 bytes...

7.319527 ms, (61 counters)

      - Testing with 33554432 bytes...

11.503597 ms, (61 counters)

      - Testing with 67108864 bytes...

49.800000 ms, (61 counters)

    - Testing memory modes (2, 2)

      - Testing with 4194304 bytes...

3.154930 ms, (61 counters)

      - Testing with 8388608 bytes...

4.273171 ms, (61 counters)

      - Testing with 16777216 bytes...

6.450292 ms, (61 counters)

      - Testing with 33554432 bytes...

10.588235 ms, (61 counters)

      - Testing with 67108864 bytes...

58.763636 ms, (61 counters)

    - Testing memory modes (1, 2)

      - Testing with 4194304 bytes...

3.732394 ms, (61 counters)

      - Testing with 8388608 bytes...

4.875000 ms, (61 counters)

      - Testing with 16777216 bytes...

7.081871 ms, (61 counters)

      - Testing with 33554432 bytes...

11.239437 ms, (61 counters)

      - Testing with 67108864 bytes...

48.228070 ms, (61 counters)

 

-------- Saxpy ---------

 

  - Testing devices (Cypress - discrete GPU)

    - Testing memory modes (0, 0)

      - Testing with 4194304 bytes...

15.266254 ms, (61 counters)

      - Testing with 8388608 bytes...

28.105556 ms, (61 counters)

      - Testing with 16777216 bytes...

53.105263 ms, (61 counters)

      - Testing with 33554432 bytes...

104.645833 ms, (61 counters)

      - Testing with 67108864 bytes...

212.041667 ms, (61 counters)

    - Testing memory modes (1, 1)

      - Testing with 4194304 bytes...

11.068396 ms, (61 counters)

      - Testing with 8388608 bytes...

20.729730 ms, (61 counters)

      - Testing with 16777216 bytes...

39.848739 ms, (61 counters)

      - Testing with 33554432 bytes...

81.152542 ms, (61 counters)

      - Testing with 67108864 bytes...

212.363636 ms, (61 counters)

    - Testing memory modes (1, 2)

      - Testing with 4194304 bytes...

8.676174 ms, (61 counters)

      - Testing with 8388608 bytes...

16.105802 ms, (61 counters)

      - Testing with 16777216 bytes...

31.021898 ms, (61 counters)

      - Testing with 33554432 bytes...

64.149254 ms, (61 counters)

      - Testing with 67108864 bytes...

181.208333 ms, (61 counters)

 

  - Testing devices (Beavercreek - integrated GPU)

    - Testing memory modes (0, 0)

      - Testing with 4194304 bytes...

16.267606 ms, (61 counters)

      - Testing with 8388608 bytes...

27.192308 ms, (61 counters)

      - Testing with 16777216 bytes...

51.655914 ms, (61 counters)

      - Testing with 33554432 bytes...

100.480000 ms, (61 counters)

      - Testing with 67108864 bytes...

206.520000 ms, (61 counters)

    - Testing memory modes (1, 1)

      - Testing with 4194304 bytes...

9.558000 ms, (61 counters)

      - Testing with 8388608 bytes...

17.188679 ms, (61 counters)

      - Testing with 16777216 bytes...

32.297872 ms, (61 counters)

      - Testing with 33554432 bytes...

62.369863 ms, (61 counters)

      - Testing with 67108864 bytes...

185.500000 ms, (61 counters)

    - Testing memory modes (1, 2)

      - Testing with 4194304 bytes...

6.277571 ms, (61 counters)

      - Testing with 8388608 bytes...

11.224057 ms, (61 counters)

      - Testing with 16777216 bytes...

21.339713 ms, (61 counters)

      - Testing with 33554432 bytes...

41.730769 ms, (61 counters)

      - Testing with 67108864 bytes...

148.032258 ms, (61 counters)
