OpenCL

darkmen
Journeyman III

OpenCL performance dropped down 12.10 >> 13.1

Hi everyone.

Today I updated the AMD Catalyst drivers to 13.1 and got a 20% performance loss on my HD7970.

Has anyone else had the same experience?

Also, what is the easiest way to roll back to 12.10? Uninstalling 13.1 and reinstalling 12.10 gives the same lower speed (OpenCL still reports the NEW runtime version).

20 Replies
Claggy
Adept II

Re: OpenCL performance dropped down 12.10 >> 13.1

I reported that last week too:

http://devgurus.amd.com/message/1286437#1286437

I had to delete a whole lot of files before I could reinstall Cat 12.8.

Since then, an AMD Catalyst Un-install Utility has appeared on the AMD Game Driver download site:

http://sites.amd.com/us/game/downloads/Pages/catalyst-uninstall-utility.aspx

I haven't tried it properly yet, except that it didn't work on Vista; it says it is for Windows 7 only.

Claggy

darkmen
Journeyman III

Re: OpenCL performance dropped down 12.10 >> 13.1

I uninstalled 13.1, deleted syswow64\amdocl.dll and reinstalled 12.10.

The OCL runtime version is now 1016 and the speed is back.

BR

darkhmz
Adept I

Re: OpenCL performance dropped down 12.10 >> 13.1

Hi!

I have experienced the same issue with Catalyst 13.1. In my case the performance drop was around 39% on my HD5830. I've tested kernel performance with different versions of amdocl.dll, and the OpenCL runtime shipped with Catalyst 13.1 was the worst. According to the APP Profiler, kernel execution times were ~17.51 ms with 12.10 vs ~24.38 ms with 13.1.

himanshu_gautam
Grandmaster

Re: OpenCL performance dropped down 12.10 >> 13.1

Hi,

I am sorry to hear this.

If it is not asking too much, could you please post a simple piece of code that shows the performance degradation?

Thanks,

darkmen
Journeyman III

Re: OpenCL performance dropped down 12.10 >> 13.1

Hi, I have just tried the 13.2 version with OCL runtime 1124.2.

Performance drops even further than with 13.1.

And it all comes down to the compiler. I am now comparing the ISA produced by 12.10 and 13.1 (btw, AMD APP KernelAnalyzer crashes on 13.2).

It seems there are some changes around branches and/or loops.

The source pseudocode:

for (uint i = 0; i < STEP; i++) {
    if (check_data(...))
        output[0] = i;
}

12.10 ISA:

  s_mov_b64     exec, s[10:11]
  s_addk_i32    s3, 0x001f
  s_addk_i32    s2, 0x0001
  s_cmp_ge_u32  s2, 0x00002100
  s_cbranch_scc1  label_3CC4
  s_branch      label_0707
  s_getpc_b64   s[10:11]
  s_sub_u32     s10, s10, 0x0000d6e4
  s_subb_u32    s11, s11, 0
  s_setpc_b64   s[10:11]
label_3CC4:

13.1 ISA:

  s_mov_b64     exec, s[10:11]
  s_addk_i32    s3, 0x001f
  s_addk_i32    s2, 0x0001
  s_cmp_ge_u32  s2, 0x00002100
  s_cbranch_scc0  label_3F7E
  s_getpc_b64   s[10:11]
  s_add_u32     s10, s10, 0x00000038
  s_addc_u32    s11, s11, 0
  s_setpc_b64   s[10:11]
label_3F7E:
  s_getpc_b64   s[10:11]
  s_sub_u32     s10, s10, 0x0000d19c
  s_subb_u32    s11, s11, 0
  s_setpc_b64   s[10:11]
  s_getpc_b64   s[10:11]
  s_sub_u32     s10, s10, 0x0000d1b0
  s_subb_u32    s11, s11, 0
  s_setpc_b64   s[10:11]

As you can see, the new compiler seems to generate more instructions for the same code.

realhet
Miniboss

Re: OpenCL performance dropped down 12.10 >> 13.1

Wow, that's funny code...

  s_getpc_b64   s[10:11]
  s_add_u32     s10, s10, 0x00000038
  s_addc_u32    s11, s11, 0
  s_setpc_b64   s[10:11]

It could be done with a single "s_branch 0x000E" (0x000E is 0x0038/4; the /4 is because branch offsets are counted in dwords).

I guess they prepared the compiler for loops bigger than 128 KB (which can't be encoded in s_branch), so they replaced almost every jump with these 4-cycle far jumps, even when the jump targets are well-known absolute locations within s_branch's reach.

(Btw: a 64 KB loop overflows GCN's 32 KB code cache! You should keep that loop below 32 KB.)

Though I think the performance issue is more likely inside the check_data(...) region than in this rarely executed loop-management code.

darkmen
Journeyman III

Re: OpenCL performance dropped down 12.10 >> 13.1

Well, I agree: of course this alone will not cause a 20% performance loss.

I can also see some improvements (at least in theory):

  • Loops are unrolled even more now
  • exec mask instructions are more efficient (I can even see fewer branches in the code):

12.10 ISA:

  s_mov_b64     s[48:49], exec
  s_andn2_b64   exec, s[48:49], s[46:47]
  s_andn2_b64   s[44:45], s[44:45], exec
  s_cbranch_scc0  label_086E
  s_andn2_b64   exec, s[48:49], exec
  s_mov_b64     exec, s[48:49]
  s_mov_b64     exec, s[44:45]
  s_branch      label_0838
label_086E:

13.1 ISA:

  s_mov_b64     vcc, exec
  s_andn2_b64   exec, vcc, s[46:47]
  s_andn2_b64   s[44:45], s[44:45], exec
  s_cbranch_scc0  label_0C76
  s_mov_b64     exec, s[44:45]
  s_branch      label_0C42
label_0C76:

So the question is still open: what makes it slower?

himanshu_gautam
Grandmaster

Re: OpenCL performance dropped down 12.10 >> 13.1

Hi everyone,

From the last few posts, it looks like some optimizations in the 13.1 driver have adversely affected a few applications. It would be helpful if someone could help pinpoint this issue. You can point to any SDK sample, or a small test case, that shows the performance drop just by switching drivers.

I tried a few SDK samples (MatrixMulImage, BlackScholes and LDSMemoryBandwidth) but did not see any change in performance.

darkhmz
Adept I

Re: OpenCL performance dropped down 12.10 >> 13.1

Hi!

Here is a small test case that shows quite a big performance drop (~33% difference in fps) on my HD5830 just by switching amdocl.dll versions. I've included the two DLLs from 12.10 and 13.1 to make testing easier, and two pictures showing the obvious performance difference on my card. Hope it helps.

http://www.mediafire.com/?nip722foiqoc4v8