25 Replies Latest reply on Oct 28, 2010 3:44 PM by MicahVillmow

    6870's wavefronts

    bubu

      Now that the NDA expired...

      Have you changed the wavefront from 64 to 80 for the 6870? Everybody is guessing.

        • 6870's wavefronts
          empty_knapsack

          From the first reviews, the 6850/6870 look exactly the same as the 58XX. I'm even curious whether the target ID changed at all for the 6XXX family. And even if it did, will the binary code differ by just one byte, as is the case with Cypress/Juniper (without using DPFP, of course)?

            • 6870's wavefronts
              nou

              Rumors on the internet say Cayman will be much more different. Barts is just a tuned Cypress architecture.

                • 6870's wavefronts
                  eklund.n

                  Isn't the 6870 just 14 compute units with 80 VLIW ALUs each?

                  If they have changed the number of ALUs per stream core to 4 (from 5), then the wavefront ought to be 80. But they probably still have 5 ALUs per stream core, i.e. 16 stream cores per CU and a wavefront size of 64.

                    • 6870's wavefronts
                      himanshu.gautam

                      eklund.n,

                      The size of a wavefront depends only on the number of stream cores present in a compute unit. It doesn't matter how many processing elements you have inside a stream core. So the wavefront size will remain 64 unless the number of stream cores in a compute unit changes.

                        • 6870's wavefronts
                          eklund.n

                          Exactly. But I didn't know how many stream cores there are in the new architecture, so I divided what I did know, the number of ALUs per compute unit, by the number of ALUs per stream core to get the wavefront size. As I was trying to explain, the only way to get a wavefront size of 80 is if each stream core has only 4 ALUs.
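
The arithmetic in the posts above can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and the factor of 4 work-items per stream core is an assumption that matches "16 stream cores, wavefront size 64" but is not stated in the thread.

```python
# Hypothetical helpers; the names are illustrative and not part of any AMD API.

# Assumption (not stated in the thread): each stream core handles 4 work-items
# of a wavefront, which is consistent with 16 stream cores -> wavefront of 64.
WORK_ITEMS_PER_STREAM_CORE = 4


def stream_cores_per_cu(alus_per_cu, alus_per_stream_core):
    """Number of stream cores in a compute unit, given its ALU count."""
    if alus_per_cu % alus_per_stream_core != 0:
        raise ValueError("ALU count is not a multiple of the stream core width")
    return alus_per_cu // alus_per_stream_core


def wavefront_size(stream_cores):
    """Wavefront size depends only on the stream core count per compute unit."""
    return stream_cores * WORK_ITEMS_PER_STREAM_CORE


# 80 ALUs per compute unit, as quoted for the 6870 in this thread.
for width in (5, 4):
    cores = stream_cores_per_cu(80, width)
    print(width, "ALUs per stream core:", cores, "stream cores, wavefront", wavefront_size(cores))
```

With 5 ALUs per stream core this gives 16 stream cores and a wavefront of 64; with 4 it gives 20 stream cores and a wavefront of 80, which is the case being guessed at.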

                  • 6870's wavefronts
                    MicahVillmow
                    The wavefront size on the new graphics card did not change. It is still 64.
                    • 6870's wavefronts
                      MicahVillmow
                      Yes, it is still 16 TPs/SIMD. The new HD58XX's should provide more power to your applications without requiring any changes to your program.
                      • 6870's wavefronts
                        MicahVillmow
                        Sorry, typo on my part. It should be HD68XX. The HD68XX has 12/14 SIMDs and has improvements which lower the cost of thread scheduling. This means that flow control clauses don't require as many cycles.
                          • 6870's wavefronts
                            gat3way

                            As far as OpenCL is concerned, do we have the same memory limits as on the 5xxx? E.g. LDS, __constant, etc.


                            What about double precision?

                            • 6870's wavefronts
                              nou

                               

                              Originally posted by: MicahVillmow Sorry, typo on my part. It should be HD68XX. The HD68XX has 12/14 SIMDs and has improvements which lower the cost of thread scheduling. This means that flow control clauses don't require as many cycles.


                              great. can we expect more GPGPU optimization on Cayman than Barts?

                                • 6870's wavefronts
                                  empty_knapsack

                                  It turns out that it has been possible to compile IL code to the new 6XXX ISA since at least Catalyst 10.6. New targets, numbered 12 to 19, were added to the calclCompile() function. Targets 12-14 and 17-19 produce exactly the same code as for Cypress/Juniper (only the header differs, by 1-4 bytes), and one of them probably matches the Barts ISA, but 15 and 16 are a totally different story. For example, some code compiled for 5XXX starts as:

                                  2 z: ADD_INT ____, R2.y, R0.w
                                  t: MULLO_UINT T0.y, R1.z, R3.x
                                  3 z: MOV R0.z, KC0[0].z
                                  w: ADD_INT T1.w, R0.x, PV2.z
                                  t: MOV R0.w, KC0[0].w
                                  4 t: MULLO_UINT T0.w, T0.y, R3.y
                                  5 t: MULLO_UINT ____, R1.y, R3.x
                                  6 y: ADD_INT ____, T0.w, PS5
                                  7 w: ADD_INT ____, R1.x, PV6.y
                                  8 z: LSHL ____, PV7.w, (0x00000006, 8.407790786e-45f).x
                                  9 y: ADD_INT T0.y, T1.w, PV8.z

                                  And for target == 15 it becomes:

                                  2 x: MULLO_UINT ____, R1.z, R2.x
                                  y: MULLO_UINT ____, R1.z, R2.x
                                  z: MULLO_UINT ____, R1.z, R2.x
                                  w: MULLO_UINT ____, R1.z, R2.x
                                  3 x: MULLO_UINT ____, PV2.y, R2.y
                                  y: MULLO_UINT ____, PV2.y, R2.y
                                  z: MULLO_UINT ____, PV2.y, R2.y
                                  w: MULLO_UINT T0.w, PV2.y, R2.y
                                  4 x: MULLO_UINT ____, R1.y, R2.x
                                  y: MULLO_UINT ____, R1.y, R2.x
                                  z: MULLO_UINT ____, R1.y, R2.x
                                  w: MULLO_UINT ____, R1.y, R2.x
                                  5 y: ADD_INT ____, T0.w, PV4.z
                                  z: ADD_INT ____, R3.y, R0.w
                                  6 x: ADD_INT T0.x, R0.x, PV5.z
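
To make the difference between the two listings easier to see, the slots used by each opcode can be tallied. The following is a rough sketch in Python: the function name and the regular expression are assumptions about the text layout of the disassembly shown in this post, and only a few lines of each listing are reproduced.

```python
import re
from collections import defaultdict

# Matches lines such as "2 z: ADD_INT ____, R2.y, R0.w" or "t: MOV R0.w, KC0[0].w".
# The optional leading number is the instruction-group index.
SLOT_LINE = re.compile(r"^\s*(?:\d+\s+)?([xyzwt]):\s+(\w+)")


def slots_by_opcode(listing):
    """Return {opcode: set of VLIW slots that opcode was issued to}."""
    slots = defaultdict(set)
    for line in listing.splitlines():
        match = SLOT_LINE.match(line)
        if match:
            slot, opcode = match.groups()
            slots[opcode].add(slot)
    return dict(slots)


# Excerpts copied from the two listings in the post.
CYPRESS_EXCERPT = """\
2 z: ADD_INT ____, R2.y, R0.w
t: MULLO_UINT T0.y, R1.z, R3.x
4 t: MULLO_UINT T0.w, T0.y, R3.y
"""

TARGET_15_EXCERPT = """\
2 x: MULLO_UINT ____, R1.z, R2.x
y: MULLO_UINT ____, R1.z, R2.x
z: MULLO_UINT ____, R1.z, R2.x
w: MULLO_UINT ____, R1.z, R2.x
5 y: ADD_INT ____, T0.w, PV4.z
"""

print(sorted(slots_by_opcode(CYPRESS_EXCERPT)["MULLO_UINT"]))
print(sorted(slots_by_opcode(TARGET_15_EXCERPT)["MULLO_UINT"]))
```

For these excerpts, MULLO_UINT appears only in the t slot in the 5XXX code and in all of x, y, z and w for target 15.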

                                  The 32-bit multiplication is now issued in each of the XYZW units, and there are no references to the T unit anymore. I guess that's the Cayman we're looking for. Though if it contains 16 thread processors per SIMD (as current GPUs do) with 4 stream cores each (vs. the current 5), the value of 1760 SP (speculated, of course) for 5950 looks weird.
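
One way to see why 1760 looks odd is to divide it by the number of ALUs a SIMD would have under that guess. A small Python sketch follows; the helper name is invented for illustration, and the 16 x 4 layout is the speculation from this post, not a confirmed Cayman specification.

```python
from fractions import Fraction


def simd_count(total_alus, thread_processors_per_simd, alus_per_thread_processor):
    """Number of SIMDs implied by a total ALU count (kept exact, may be fractional)."""
    alus_per_simd = thread_processors_per_simd * alus_per_thread_processor
    return Fraction(total_alus, alus_per_simd)


# HD 6870 figures from this thread: 14 SIMDs x 16 thread processors x 5 ALUs.
hd6870_alus = 14 * 16 * 5
print(hd6870_alus, simd_count(hd6870_alus, 16, 5))

# Speculated 1760 ALUs with 16 thread processors of 4 ALUs each (assumption).
speculated = simd_count(1760, 16, 4)
print(speculated, "whole number of SIMDs:", speculated.denominator == 1)
```

Under the 16 x 4 guess, 1760 does not divide into a whole number of SIMDs (it comes out at 55/2), whereas the 6870's 1120 ALUs divide cleanly into 14 SIMDs of 80.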

                                    • 6870's wavefronts
                                      Gipsel

                                       

                                      Originally posted by: empty_knapsack It turns out that it has been possible to compile IL code to the new 6XXX ISA since at least Catalyst 10.6.


                                      Yes, that was discussed over at Beyond3D starting here.

                                        • 6870's wavefronts
                                          empty_knapsack

                                          The funniest thing is that this 4D VLIW compilation has been available since Catalyst 10.4 (the same time ATI broke support for the 2nd core of the 5970), but nobody discovered it until this October, AFAIK.

                                            • 6870's wavefronts
                                              Gipsel

                                               

                                              Originally posted by: empty_knapsack The funniest thing is that this 4D VLIW compilation has been available since Catalyst 10.4 (the same time ATI broke support for the 2nd core of the 5970), but nobody discovered it until this October, AFAIK.


                                              Personally, I first saw the references to the Northern Islands codename(s), and to support for the t lane being dropped, in Catalyst 9.8, i.e. right at the Cypress launch. (They may have been in there even slightly longer, I was too lazy to check; there was an error message saying that issuing instructions to the t lane is scheduled for removal in Northern Islands.) But I have not tried whether the compilation actually worked back then (I doubt it a bit, as several NI-specific instructions were added only later on). I saved that for the launch of the HD6800 line.

                                    • 6870's wavefronts
                                      MicahVillmow
                                      The HD68XX cards do not have double precision and the hardware memory limits have not changed.
                                      • 6870's wavefronts
                                        MicahVillmow
                                        The correct name for the card should be displayed in the next SDK release. That was an internal testing name that we were using, but since the card was launched before the SDK was released, the testing name is displayed.