Raistmer

Catalyst 11.9 and OpenCL performance

Discussion created by Raistmer on Sep 25, 2011
Latest reply on Oct 10, 2011 by Raistmer
One problem resolved, another remains...

I took the opportunity to get an early preview of the Cat 11.9 drivers and downloaded the leaked version published on Guru3D.

My congratulations to the AMD OpenCL/driver team: the increased CPU usage that kept me from updating from Cat 11.2 to any later version is fixed. CPU usage with Cat 11.9 is now the same as, or even slightly lower than, with Cat 11.2 on my test host with an HD6950.

But another issue that I saw with drivers later than 11.2, namely a greatly and erratically increased elapsed (total, or GPU) time, still exists.
To determine the conditions under which this happens, I tested my app with different workloads (standardized tasks with known parameters) under Cat 11.2 and Cat 11.9 in different conditions: with and w/o an app priority increase, and with and w/o background CPU usage by idle-priority, computationally intensive applications.
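The repeated-run part of this procedure can be sketched as a small timing harness. This is only an illustration in Python, not the tool used for the measurements in this post: the names `time_runs` and `spread_percent` are hypothetical, the command is a placeholder, and only elapsed (wall-clock) time is measured, not CPU time.

```python
import statistics
import subprocess
import sys
import time


def time_runs(cmd, repeats=2):
    """Run cmd `repeats` times; return the elapsed (wall-clock) seconds of each run."""
    elapsed = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        elapsed.append(time.perf_counter() - start)
    return elapsed


def spread_percent(samples):
    """Range of the samples as a percentage of their mean (a crude run-to-run error estimate)."""
    mean = statistics.mean(samples)
    return 100.0 * (max(samples) - min(samples)) / mean


if __name__ == "__main__":
    # Placeholder command: substitute the real benchmark invocation here.
    runs = time_runs([sys.executable, "-c", "pass"], repeats=2)
    print(runs, spread_percent(runs))
```

Repeating the same call once with the CPU idle and once with idle-priority background load, for each priority setting, would give the four conditions described above.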

The short summary: if the CPU is busy, even with idle-priority tasks, the GPU application shows a very big increase in total running time (elapsed time), and this increase is random in character (sometimes it is there, sometimes not, and its size differs from run to run). All this happens under Catalyst 11.9, but not under Catalyst 11.2. Under Catalyst 11.2 the elapsed time increases only a little (by a few %) and stays quite stable between runs, which is quite acceptable.
Unfortunately, the performance drop with a busy CPU under Catalyst 11.9 is unacceptable for the high-performance computations that we perform on the BOINC platform.

I hope this early report allows the AMD driver/OpenCL team to take measures to fix this issue before the official Cat 11.9 release.

Here is the test data I got, with short comments (quoted from another forum):



I see an increase in elapsed time for some real-life tasks running on Win7 x64 with Cat 11.9 Guru3D vs Cat 11.2 on Vista x86. Some tests were done to investigate the reason:

[pre]
App Name                          Task name     AR        CPU time  Elapsed
[/pre]

Cat 11.2, BOINC suspended, -hp switch
[pre]
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0009_v7.wu  0.008955  36.988    127.258
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0395_v7.wu  0.394768  33.462    102.796
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0444_v7.wu  0.444184  33.54     98.979
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG1327_v7.wu  1.326684  25.693    83.948
[/pre]

Cat 11.9 Guru3D, BOINC suspended, -hp switch
[pre]
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0009_v7.wu  0.008955  34.351    124.02
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0395_v7.wu  0.394768  30.108    83.32
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0444_v7.wu  0.444184  31.091    80.777
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG1327_v7.wu  1.326684  23.26     77.205
[/pre]

[color=Yellow]Summary[/color]: Cat 11.9 Guru3D shows better (!) performance (provided the OS difference has no influence).

-----------------------------------------------------------------

Now BOINC runs with CPU tasks, BOINC GPU suspended.

Cat 11.2, -hp switch (2 runs to get a random error estimate)
[pre]
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0009_v7.wu  0.008955  37.425    135.688
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0395_v7.wu  0.394768  34.913    105.933
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0444_v7.wu  0.444184  36.364    102.722
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG1327_v7.wu  1.326684  27.612    88.256

MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0009_v7.wu  0.008955  38.673    139.792
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0395_v7.wu  0.394768  34.695    105.814
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0444_v7.wu  0.444184  35.428    101.822
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG1327_v7.wu  1.326684  26.005    86.514
[/pre]

[color=Yellow]Summary[/color]: A loaded CPU increases elapsed and CPU times for the GPU app to some degree (an expected result, but note that the increase in elapsed time is only a few %).

Cat 11.2, w/o -hp switch (2 runs to get a random error estimate)
[pre]
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0009_v7.wu  0.008955  38.891    139.279
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0395_v7.wu  0.394768  36.114    106.623
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0444_v7.wu  0.444184  35.724    106.205
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG1327_v7.wu  1.326684  28.08     89.055

MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0009_v7.wu  0.008955  39.437    140.192
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0395_v7.wu  0.394768  37.565    111.032
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0444_v7.wu  0.444184  36.13     104.521
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG1327_v7.wu  1.326684  27.144    89.224
[/pre]

[color=Yellow]Summary[/color]: With lower priority the hindrance from a loaded CPU is bigger (but, again, only a few %).

Now the same (BOINC running idle-priority CPU tasks) for Windows 7 x64 + Catalyst 11.9 Guru3D version:

-hp switch enabled:
[pre]
App Name                          Task name     AR        CPU time  Elapsed
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0009_v7.wu  0.008955  34.476    365.609
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0395_v7.wu  0.394768  29.874    444.754
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0444_v7.wu  0.444184  29.687    376.782
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG1327_v7.wu  1.326684  22.402    81.003

MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0009_v7.wu  0.008955  35.646    177.591
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0395_v7.wu  0.394768  32.386    166.936
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0444_v7.wu  0.444184  29.531    540.635
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG1327_v7.wu  1.326684  23.743    131.945
[/pre]

w/o -hp switch:
[pre]
App Name                          Task name     AR        CPU time  Elapsed
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0009_v7.wu  0.008955  36.083    127.934
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0395_v7.wu  0.394768  33.322    87.273
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0444_v7.wu  0.444184  33.384    83.407
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG1327_v7.wu  1.326684  21.637    364.187

MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0009_v7.wu  0.008955  35.521    126.087
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0395_v7.wu  0.394768  27.581    434.969
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG0444_v7.wu  0.444184  31.356    82.137
MB7_win_x86_SSE3_OpenCL_ATi_r374  PG1327_v7.wu  1.326684  23.463    80.48
[/pre]

[color=Yellow]Summary[/color]:
1) Cat 11.9 is unsuitable when the CPU is busy with processing too: elapsed times can increase greatly and erratically.
2) Raising the app priority does not help with the erratically increased elapsed times when the CPU is busy.
So, while the high CPU usage issue is indeed fixed, we still have a step back relative to the quite old Catalyst drivers.
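
To put rough numbers on "a few %" versus "greatly and erratically", the elapsed times from the -hp runs above can be turned into busy-CPU / idle-CPU slowdown ratios per driver. The Python below is just an illustrative sketch, not part of the test app; the names `IDLE_CPU_HP`, `BUSY_CPU_HP` and `slowdowns` are made up for this example, and the values are copied from the figures above.

```python
# Elapsed times copied from the -hp results above, in task order
# PG0009, PG0395, PG0444, PG1327 (units as reported by the app).
IDLE_CPU_HP = {
    "Cat 11.2": [127.258, 102.796, 98.979, 83.948],
    "Cat 11.9": [124.02, 83.32, 80.777, 77.205],
}
# Two runs per driver with BOINC CPU tasks running at idle priority.
BUSY_CPU_HP = {
    "Cat 11.2": [[135.688, 105.933, 102.722, 88.256],
                 [139.792, 105.814, 101.822, 86.514]],
    "Cat 11.9": [[365.609, 444.754, 376.782, 81.003],
                 [177.591, 166.936, 540.635, 131.945]],
}


def slowdowns(driver):
    """Busy-CPU elapsed time divided by idle-CPU elapsed time, per run and task."""
    base = IDLE_CPU_HP[driver]
    return [[busy / idle for busy, idle in zip(run, base)]
            for run in BUSY_CPU_HP[driver]]


for driver in ("Cat 11.2", "Cat 11.9"):
    ratios = [r for run in slowdowns(driver) for r in run]
    print(f"{driver}: min x{min(ratios):.2f}, max x{max(ratios):.2f}")
```

With these numbers the Cat 11.2 ratios stay between roughly 1.03 and 1.10, while under Cat 11.9 they range from about 1.05 up to about 6.7, and the same task can land at very different ratios in the two runs.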
