10 Replies Latest reply on Oct 10, 2011 5:35 PM by Raistmer

    Catalyst 11.9 and OpenCL performance

    Raistmer
      One problem resolved, another remains...

      I took the opportunity to get an early preview of the Cat 11.9 drivers and downloaded the leaked version published on Guru3D.

      My congratulations to the AMD OpenCL/driver team: the increased CPU usage that kept me from updating from Cat 11.2 to any later version is fixed. CPU usage with 11.9 is now the same as or even slightly lower than with Cat 11.2 on my test host with an HD6950.

      But another issue that I saw with drivers later than 11.2 still exists: a large and erratic increase in elapsed (total or GPU) time.
      To determine the conditions under which this happens, I tested my app with different workloads (standardized tasks with known parameters) under Cat 11.2 and Cat 11.9 in different conditions: with and without an app priority increase, and with and without background CPU usage by idle-priority, computationally intensive applications.

      The short summary: if the CPU is busy, even with idle-priority tasks, the GPU application shows a very large increase in total running time (elapsed time), and this increase is random (sometimes it appears, sometimes not, and its size differs from run to run). All this happens under Catalyst 11.9, but not under Catalyst 11.2. Under Catalyst 11.2 the elapsed time increases only slightly (by a few %) and stays quite stable between runs, which is quite acceptable.
      Unfortunately, the performance drop with a busy CPU under Catalyst 11.9 is unacceptable for the high-performance computations that we perform on the BOINC platform.

      I hope this early report allows the AMD driver/OpenCL team to take measures to fix this issue before the official Cat 11.9 release.

      Here is the test data I got, with short comments (quoted from another forum):



      I see an increase in elapsed time for some real-life tasks running on Win7 x64 with Cat 11.9 Guru3D vs. Vista x86 with Cat 11.2. Some tests were done to investigate the reason. The app in every run is MB7_win_x86_SSE3_OpenCL_ATi_r374.

      Cat 11.2, BOINC suspended, -hp switch

        Task name      AR         CPU time   Elapsed
        PG0009_v7.wu   0.008955   36.988     127.258
        PG0395_v7.wu   0.394768   33.462     102.796
        PG0444_v7.wu   0.444184   33.54      98.979
        PG1327_v7.wu   1.326684   25.693     83.948

      Cat 11.9 Guru3D, BOINC suspended, -hp switch

        Task name      AR         CPU time   Elapsed
        PG0009_v7.wu   0.008955   34.351     124.02
        PG0395_v7.wu   0.394768   30.108     83.32
        PG0444_v7.wu   0.444184   31.091     80.777
        PG1327_v7.wu   1.326684   23.26      77.205

      Summary: Cat 11.9 Guru3D shows better (!) performance (provided the OS difference has no influence).

      ------------------------------------------------------------

      Now BOINC runs with CPU tasks, BOINC GPU suspended.

      Cat 11.2, -hp (2 runs to estimate the random error)

        Task name      AR         CPU time   Elapsed
        PG0009_v7.wu   0.008955   37.425     135.688
        PG0395_v7.wu   0.394768   34.913     105.933
        PG0444_v7.wu   0.444184   36.364     102.722
        PG1327_v7.wu   1.326684   27.612     88.256

        PG0009_v7.wu   0.008955   38.673     139.792
        PG0395_v7.wu   0.394768   34.695     105.814
        PG0444_v7.wu   0.444184   35.428     101.822
        PG1327_v7.wu   1.326684   26.005     86.514

      Summary: a loaded CPU increases elapsed and CPU times for the GPU app to some degree (an expected result, but it is worth mentioning that the increase in elapsed time is only a few %).

      Cat 11.2, w/o -hp switch (2 runs to estimate the random error)

        Task name      AR         CPU time   Elapsed
        PG0009_v7.wu   0.008955   38.891     139.279
        PG0395_v7.wu   0.394768   36.114     106.623
        PG0444_v7.wu   0.444184   35.724     106.205
        PG1327_v7.wu   1.326684   28.08      89.055

        PG0009_v7.wu   0.008955   39.437     140.192
        PG0395_v7.wu   0.394768   37.565     111.032
        PG0444_v7.wu   0.444184   36.13      104.521
        PG1327_v7.wu   1.326684   27.144     89.224

      Summary: with lower priority the hindrance from a loaded CPU is bigger (but, again, only a few %).

      Now the same (BOINC running idle-priority CPU tasks) for Windows 7 x64 + Catalyst 11.9 Guru3D version:

      -hp switch enabled:

        Task name      AR         CPU time   Elapsed
        PG0009_v7.wu   0.008955   34.476     365.609
        PG0395_v7.wu   0.394768   29.874     444.754
        PG0444_v7.wu   0.444184   29.687     376.782
        PG1327_v7.wu   1.326684   22.402     81.003

        PG0009_v7.wu   0.008955   35.646     177.591
        PG0395_v7.wu   0.394768   32.386     166.936
        PG0444_v7.wu   0.444184   29.531     540.635
        PG1327_v7.wu   1.326684   23.743     131.945

      w/o -hp switch:

        Task name      AR         CPU time   Elapsed
        PG0009_v7.wu   0.008955   36.083     127.934
        PG0395_v7.wu   0.394768   33.322     87.273
        PG0444_v7.wu   0.444184   33.384     83.407
        PG1327_v7.wu   1.326684   21.637     364.187

        PG0009_v7.wu   0.008955   35.521     126.087
        PG0395_v7.wu   0.394768   27.581     434.969
        PG0444_v7.wu   0.444184   31.356     82.137
        PG1327_v7.wu   1.326684   23.463     80.48

      Summary:
      1) Cat 11.9 is inappropriate to use when the CPU is busy with processing too: elapsed times can increase greatly and erratically.
      2) An app priority increase can't help with the erratically increased elapsed times when the CPU is busy.

      So, while the high CPU usage issue is indeed fixed, we still have a step back relative to quite old Catalyst drivers.
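To put a number on "a few %" versus "greatly and erratically", the elapsed times above can be turned into loaded/idle ratios per driver. The following is only a minimal sketch: the elapsed values are copied from the -hp runs above, while the names `BASELINE`, `LOADED`, `slowdown_ratios` and `summarize` are made up for illustration and are not part of the benchmark.

```python
# Elapsed times copied from the -hp runs above.
# BASELINE: BOINC suspended. LOADED: two runs with idle-priority CPU tasks.
BASELINE = {
    "Cat 11.2": {"PG0009_v7.wu": 127.258, "PG0395_v7.wu": 102.796,
                 "PG0444_v7.wu": 98.979, "PG1327_v7.wu": 83.948},
    "Cat 11.9": {"PG0009_v7.wu": 124.02, "PG0395_v7.wu": 83.32,
                 "PG0444_v7.wu": 80.777, "PG1327_v7.wu": 77.205},
}
LOADED = {
    "Cat 11.2": [
        {"PG0009_v7.wu": 135.688, "PG0395_v7.wu": 105.933,
         "PG0444_v7.wu": 102.722, "PG1327_v7.wu": 88.256},
        {"PG0009_v7.wu": 139.792, "PG0395_v7.wu": 105.814,
         "PG0444_v7.wu": 101.822, "PG1327_v7.wu": 86.514},
    ],
    "Cat 11.9": [
        {"PG0009_v7.wu": 365.609, "PG0395_v7.wu": 444.754,
         "PG0444_v7.wu": 376.782, "PG1327_v7.wu": 81.003},
        {"PG0009_v7.wu": 177.591, "PG0395_v7.wu": 166.936,
         "PG0444_v7.wu": 540.635, "PG1327_v7.wu": 131.945},
    ],
}


def slowdown_ratios(driver):
    """Return loaded/baseline elapsed-time ratios for every run and task."""
    base = BASELINE[driver]
    return [run[task] / base[task] for run in LOADED[driver] for task in base]


def summarize(driver):
    """Return the (smallest, largest) slowdown ratio seen for a driver."""
    ratios = slowdown_ratios(driver)
    return min(ratios), max(ratios)


if __name__ == "__main__":
    for driver in BASELINE:
        smallest, largest = summarize(driver)
        print(f"{driver}: slowdown x{smallest:.2f} .. x{largest:.2f}")
```

With these figures the Cat 11.2 ratios all stay below 1.10, while the Cat 11.9 ratios range from about 1.05 up to roughly 6.7, which matches the erratic behaviour described in the summary.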

        • Catalyst 11.9 and OpenCL performance
          gat3way

          Would someone at AMD care to fix the download links for the Catalyst 11.9 Linux drivers?

            • Catalyst 11.9 and OpenCL performance
              gat3way

              OK, it's finally here. I recompiled all my kernels. The 100% CPU issue is not fixed. A lot of my kernels that used to work with 11.8 are also not working correctly now. I don't know what you've done, but I am finally reverting to SDK 2.4 and Catalyst 11.4. And I am not becoming a lab mouse ever again, at least not until SDK 2.6 is released.

               

              Sorry for the rude attitude and ranting, but I really had some expectations for that Catalyst release: fix that GPU_USE_SYNC_OBJECTS problem at least. Instead, it broke half of my kernels. Nice. SDK 2.4 and Catalyst 11.4 was the most successful combination, and I am going to stick with it until this mess is taken care of properly.

                • Catalyst 11.9 and OpenCL performance
                  gat3way

                  Good news, upgraded the linux kernel from 2.6.32 to 3.0.0. The broken kernels are now working again with 11.9. That's rather weird but it worked. The 100% CPU problem persists though, with or without GPU_USE_SYNC_OBJECTS.

                    • Catalyst 11.9 and OpenCL performance
                      genaganna

                       

                      Originally posted by: gat3way Good news, upgraded the linux kernel from 2.6.32 to 3.0.0. The broken kernels are now working again with 11.9. That's rather weird but it worked. The 100% CPU problem persists though, with or without GPU_USE_SYNC_OBJECTS.

                      GPU_USE_SYNC_OBJECTS is not fixed yet in Linux. It will be resolved in upcoming releases.

                      Was compiling the kernels the problem, or was correctness the problem?

                      Does this problem occur on both CPU and GPU?

                      Could you please give more details about your system (CPU, GPU, OS)?

                       

                        • Catalyst 11.9 and OpenCL performance
                          gat3way

                          Kernels compiled fine (since SDK 2.5/Cat 11.7, offline compilation occasionally crashes, but subjectively speaking, those random crashes occur less and less frequently with each newer Catalyst release). This is using the offline devices extension.

                          However, something weird happened when I installed 11.9. Some of my kernels stopped producing correct results and I got ASIC hang messages in dmesg (which did not lead to a system lockup; unless you look at the kernel log, you wouldn't suspect it ever happened). This happened for just some of the kernels; others worked correctly. Note that I did not get any OpenCL error, neither when transferring data to the device and back nor from clEnqueueNDRangeKernel(); it just went on running and producing wrong results. Then I upgraded the Linux kernel to 3.0.0 and everything is working as it did before, with no more ASIC hang messages. This is rather strange. I was using an older Linux kernel though, 2.6.32.
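Because no OpenCL call reported an error here, one way to catch this kind of silent failure is to compare the device output against a trusted CPU reference. The following is only a minimal sketch in Python: `find_mismatches`, the tolerances and the sample values are all hypothetical and are not taken from the kernels discussed in this thread.

```python
import math


def find_mismatches(gpu_out, cpu_ref, rel_tol=1e-5, abs_tol=1e-8):
    """Return the indices where the device output differs from the reference.

    NaN values never compare as close, so they are reported as mismatches.
    """
    if len(gpu_out) != len(cpu_ref):
        raise ValueError("output and reference lengths differ")
    return [
        i
        for i, (g, r) in enumerate(zip(gpu_out, cpu_ref))
        if not math.isclose(g, r, rel_tol=rel_tol, abs_tol=abs_tol)
    ]


if __name__ == "__main__":
    reference = [0.5, 1.25, 2.0, 3.75]  # made-up CPU reference values
    device = [0.5, 1.25, 0.0, 3.75]     # made-up device output, one bad element
    print(find_mismatches(device, reference))
```

The tolerances are placeholders; a real check would pick them to match the precision the kernel is expected to deliver.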

                           

                           

                           

                            • Catalyst 11.9 and OpenCL performance
                              skildude

                              Way to steal a thread. Let's get this back on topic. When are we going to get those performance enhancements promised for the 5XXX and 6XXX drivers? Right now the OpenCL drivers are just not cutting it. I couldn't care less about you making improvements for a single game when the drivers are currently vastly underperforming on what should be a blazing fast GPU. My 6970 is about the same speed as my 5850. Considering the cost difference and supposed card improvements, this is an absolute travesty.

                                • Catalyst 11.9 and OpenCL performance
                                  genaganna

                                   

                                  Originally posted by: skildude Way to steal a thread. Let's get this back on topic. When are we going to get those performance enhancements promised for the 5XXX and 6XXX drivers? Right now the OpenCL drivers are just not cutting it. I couldn't care less about you making improvements for a single game when the drivers are currently vastly underperforming on what should be a blazing fast GPU. My 6970 is about the same speed as my 5850. Considering the cost difference and supposed card improvements, this is an absolute travesty.

                                  Could you please give more details of your performance issue (kernel and runtime details)?

                                  Could you please run MonteCarloAsianMultiGPU on both devices as follows and paste the results here?

                                   

                                  MonteCarloAsianMultiGPU -c 256 -i 10 -q -t

                                  MonteCarloAsianMultiGPU is shipped with the SDK.



                      • Catalyst 11.9 and OpenCL performance
                        Raistmer
                        genaganna, I really do not care about the performance that some synthetic sample can show, but I do care when, with a new Catalyst driver, my own program starts to produce wrong results!!!
                        WTF, indeed. Too many drivers go directly into the trash. I just reinstalled to Catalyst 11.10 preview 2. And I am enjoying the INVALID RESULTS it produces. Speed is forgotten; now the question is whether it will work at all!
                        Maybe it's time to start debugging your own drivers before release?
                        • Catalyst 11.9 and OpenCL performance
                          Raistmer
                          And please, don't ask for test cases and so on again; the benchmark is posted. Just order an HD6950 from your hardware division, install that crappy Cat 11.10 preview 2 and enjoy. The link to the benchmark is posted in another thread....