Now that the NDA expired...
Have you changed the wavefront from 64 to 80 for the 6870? Everybody is guessing.
From the first reviews, the 6850/6870 looks exactly the same as the 58XX. I'm even curious whether the target ID changed at all for the 6XXX family. And even if it changed, will the binary code differ in just one byte, as is the case with Cypress/Juniper (without using DPFP, of course)?
Rumors on the internet say Cayman will be much more different. Barts is just a tuned Cypress architecture.
isn't 6870 just 14 compute units with 80 vliw ALU:s?
if they have changed the number of ALU:s per stream core to 4 (from 5), then the wavefront ought to be 80. but probably they still have 5 ALU:s per stream core, i.e. 16 stream cores per CU and wavefront size 64.
The size of a wavefront depends only on the number of stream cores present in a compute unit. It doesn't matter how many processing elements you have inside a stream core. So the wavefront size will remain 64 unless the number of stream cores in a compute unit changes.
Exactly, but I didn't know how many stream cores there are in the new architecture. So dividing what I did know, the number of ALU:s, by the number per stream core indicates the wavefront size. As I was trying to explain, the only way to get a wavefront size of 80 is if the stream core only has 4 ALU:s.
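The arithmetic in the last few posts can be sketched as below. This is a minimal illustration, assuming (as on Cypress) that a wavefront is issued over four cycles, one quarter-wavefront per cycle, so the wavefront size is four times the number of stream cores in a compute unit. The function names and the constant CYCLES_PER_WAVEFRONT are hypothetical, and the 80 ALUs per compute unit is the figure quoted in this thread, not a confirmed specification.

```python
# Assumption: a wavefront is issued over 4 cycles, one quarter-wavefront
# per cycle, so wavefront size = 4 * stream cores per compute unit.
CYCLES_PER_WAVEFRONT = 4


def stream_cores_per_cu(alus_per_cu: int, alus_per_stream_core: int) -> int:
    """Number of stream cores in a compute unit, given its ALU count."""
    if alus_per_cu % alus_per_stream_core != 0:
        raise ValueError("ALU count is not a multiple of the stream core width")
    return alus_per_cu // alus_per_stream_core


def wavefront_size(alus_per_cu: int, alus_per_stream_core: int) -> int:
    """Wavefront size implied by the ALU count and the VLIW width."""
    cores = stream_cores_per_cu(alus_per_cu, alus_per_stream_core)
    return CYCLES_PER_WAVEFRONT * cores


if __name__ == "__main__":
    # 80 ALUs per compute unit is the number quoted in the thread.
    for width in (5, 4):
        cores = stream_cores_per_cu(80, width)
        print(f"VLIW{width}: {cores} stream cores, wavefront {wavefront_size(80, width)}")
```

With 5 ALUs per stream core this gives 16 stream cores and a wavefront of 64; with 4 ALUs per stream core it gives 20 stream cores and a wavefront of 80, which is the only route to 80 under these assumptions.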
Is anything changed at all from programming point of view? New instructions? Better char/short support? Any papers available?
I'm assuming then that there are still 16 TPs/SIMD organized into quads, as before?
Is AMD abandoning double precision in its future cards, or is this 6XXX card line a special case?
Only high-end cards have DP support. IMHO Cayman, alias 69xx, cards should have DP support.
Why would you think we were abandoning double precision, bayoumi? The enthusiast level 6xxx GPU has not been released yet. The enthusiast level card was the only GPU in the 5xxx line that had double precision.
Don't compare 58xx to 68xx: the entire lineup has changed, which means the specific numbering of GPU levels has been tweaked. The order and price points are what matter for comparison.
Probably the numbering confused me. ATI had 3870/4870/5870 as the default for high-end GPGPU with DP. It is natural that some people (like myself) were planning on buying the 6870 by default as the DP card.
The numbering is a little confusing if people try to map from one generation to the next, but that was unavoidable. Once the full lineup is on the market it will make more sense.
For double precision, wait a few weeks for the 69xx
when we talk about confusing. Buzzard, wait what?
Micah, the new HD58XX's?
Well... About the new HD68XX's, what improvements for compute?
As far as OpenCL is concerned, do we have the same memory limits as on 5xxx? E.g. LDS, __constant, etc.
What about double precision?
Originally posted by: MicahVillmow Sorry, typo on my part. It should be HD68XX. The HD68XX has 12/14 SIMDs and has improvements which lower the cost of thread scheduling. This means that flow control clauses don't require as many cycles.
great. can we expect more GPGPU optimization on Cayman than Barts?
It turns out that it's been possible to compile IL code to the new 6XXX ISA since at least Catalyst 10.6. New targets, 12 through 19, were added to the calclCompile() function. While targets 12-14 and 17-19 produce code exactly the same as for Cypress/Juniper (only the header differs in 1-4 bytes), and probably one of these matches the Barts ISA, 15 and 16 are a totally different story. For example, some code compiled for 5XXX starts as:
    2  z: ADD_INT ____, R2.y, R0.w
       t: MULLO_UINT T0.y, R1.z, R3.x
    3  z: MOV R0.z, KC0.z
       w: ADD_INT T1.w, R0.x, PV2.z
       t: MOV R0.w, KC0.w
    4  t: MULLO_UINT T0.w, T0.y, R3.y
    5  t: MULLO_UINT ____, R1.y, R3.x
    6  y: ADD_INT ____, T0.w, PS5
    7  w: ADD_INT ____, R1.x, PV6.y
    8  z: LSHL ____, PV7.w, (0x00000006, 8.407790786e-45f).x
    9  y: ADD_INT T0.y, T1.w, PV8.z

And for target == 15 it became:

    2  x: MULLO_UINT ____, R1.z, R2.x
       y: MULLO_UINT ____, R1.z, R2.x
       z: MULLO_UINT ____, R1.z, R2.x
       w: MULLO_UINT ____, R1.z, R2.x
    3  x: MULLO_UINT ____, PV2.y, R2.y
       y: MULLO_UINT ____, PV2.y, R2.y
       z: MULLO_UINT ____, PV2.y, R2.y
       w: MULLO_UINT T0.w, PV2.y, R2.y
    4  x: MULLO_UINT ____, R1.y, R2.x
       y: MULLO_UINT ____, R1.y, R2.x
       z: MULLO_UINT ____, R1.y, R2.x
       w: MULLO_UINT ____, R1.y, R2.x
    5  y: ADD_INT ____, T0.w, PV4.z
       z: ADD_INT ____, R3.y, R0.w
    6  x: ADD_INT T0.x, R0.x, PV5.z

There are 32-bit multiplications in each of the XYZW units, and there are no references to the T unit anymore. I guess that's the Cayman we're looking for. Though if it contains 16 thread processors (as current GPUs do) with 4 stream cores each (vs. the current 5), the value of 1760 SP (speculated, of course) for the 5950 looks weird.
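Why 1760 looks weird can be checked with a couple of divisions. This is a minimal sketch, assuming 16 thread processors per SIMD as in the post above; the helper name simd_count and the constant are hypothetical, and 1760 is the speculated figure from the post, not a confirmed one.

```python
from fractions import Fraction

# Assumption taken from the post above: 16 thread processors per SIMD.
THREAD_PROCESSORS_PER_SIMD = 16


def simd_count(total_sp: int, lanes_per_thread_processor: int) -> Fraction:
    """Number of SIMDs implied by a total SP count, kept as an exact fraction."""
    sp_per_simd = THREAD_PROCESSORS_PER_SIMD * lanes_per_thread_processor
    return Fraction(total_sp, sp_per_simd)


if __name__ == "__main__":
    speculated_sp = 1760  # rumored value quoted in the thread
    for lanes in (5, 4):
        n = simd_count(speculated_sp, lanes)
        whole = n.denominator == 1
        print(f"{lanes} lanes: {float(n)} SIMDs, whole number: {whole}")
```

With 5 lanes per thread processor, 1760 SP divides evenly into 22 SIMDs. With 4 lanes it comes to 27.5, which is not a whole number of SIMDs, so the rumored count does not fit a 4-wide design under these assumptions.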
Originally posted by: empty_knapsack It turns out that it's been possible to compile IL code to the new 6XXX ISA since at least Catalyst 10.6.
Yes, that was discussed over at Beyond3D starting here.
The funniest thing is that this 4D VLIW compilation has been available since Catalyst 10.4 (the same time ATI broke support for the 2nd core of the 5970), but nobody discovered it until this October, AFAIK.
Originally posted by: empty_knapsack The funniest thing is that this 4D VLIW compilation has been available since Catalyst 10.4 (the same time ATI broke support for the 2nd core of the 5970), but nobody discovered it until this October, AFAIK.
Personally, I first saw references to the Northern Islands codename(s), and to support for the t lane being dropped, in Catalyst 9.8, i.e. right at the Cypress launch. They may have been in there even slightly longer; I was too lazy to check. There was an error message saying that issuing instructions to the t lane is scheduled for removal in Northern Islands. But I've not tried whether the compilation actually works (I doubt it a bit, as several NI-specific instructions were added only later on). I saved that for the launch of the HD6800 line.