Archives Discussions

bubu · ‎10-22-2010

Now that the NDA expired...

Have you changed the wavefront from 64 to 80 for the 6870? Everybody is guessing.

empty_knapsack · ‎10-22-2010

From the first reviews, the 6850/6870 look exactly the same as the 58XX. I'm even curious whether the target ID changed at all for the 6XXX family. And even if it changed, will the binary code differ in just one byte, as is the case with Cypress/Juniper (without using DPFP, of course)?

nou · ‎10-22-2010

Rumors on the internet say Cayman will be much more different. Barts is just a tuned Cypress architecture.

eklund_n · ‎10-22-2010

isn't 6870 just 14 compute units with 80 vliw ALU:s?

if they have changed the number of ALU:s per stream core to 4 (from 5), then the wavefront ought to be 80. but probably they still have 5 ALU:s per stream core, i.e. 16 stream cores per CU and wavefront size 64.

himanshu_gautam · ‎10-22-2010

eklund.n,

The size of the wavefront depends only on the number of stream cores present in a compute unit. It doesn't matter how many processing elements you have inside a stream core. So the wavefront size will remain 64 unless the number of stream cores in a compute unit changes.

eklund_n · ‎10-22-2010

Exactly. But I didn't know how many stream cores there are in the new architecture, so dividing what I did know, the number of ALUs, by the number per stream core indicates the wavefront size. As I was trying to explain, the only way to get a wavefront size of 80 is if each stream core has only 4 ALUs.
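The arithmetic in the posts above can be sketched in a few lines of Python. This is only an illustration: the function name and parameters are made up here, and the factor of 4 (each stream core handling four work-items per instruction, so 16 stream cores cover a 64-wide wavefront) is an assumption about the VLIW hardware that the thread itself does not state.

```python
def wavefront_size(alus_per_cu, alus_per_stream_core, cycles_per_instruction=4):
    """Wavefront size implied by a compute unit's ALU count.

    cycles_per_instruction=4 is an assumption about the VLIW hardware
    (each stream core handles 4 work-items per instruction); the thread
    only says the size depends on the number of stream cores.
    """
    stream_cores, leftover = divmod(alus_per_cu, alus_per_stream_core)
    if leftover:
        raise ValueError("ALU count is not a whole number of stream cores")
    return stream_cores * cycles_per_instruction


print(wavefront_size(80, 5))  # 5 ALUs per stream core: 16 stream cores per CU
print(wavefront_size(80, 4))  # 4 ALUs per stream core: 20 stream cores per CU
```

With 80 ALUs per compute unit, 5 ALUs per stream core gives 64 and 4 ALUs per stream core gives 80, which are the two cases being debated.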

MicahVillmow · ‎10-22-2010

The wavefront size on the new graphics card did not change. It is still 64.

empty_knapsack · ‎10-22-2010

Has anything changed at all from a programming point of view? New instructions? Better char/short support? Any papers available?

ryta1203 · ‎10-22-2010

I'm assuming then that there are still 16 TPs/SIMD organized into quads, as before?

bayoumi · ‎10-27-2010

Is AMD abandoning double precision in its future cards, or is this 6XXX card line a special case?

nou · ‎10-27-2010

High-end cards have DP support. IMHO Cayman, alias the 69xx cards, should have DP support.

LeeHowes · ‎10-27-2010

Why would you think we were abandoning double precision, bayoumi? The enthusiast level 6xxx GPU has not been released yet. The enthusiast level card was the only GPU in the 5xxx line that had double precision.

Don't compare 58xx to 68xx, because the entire lineup has had changes that mean the specific numbering of GPU levels has been tweaked. The order and price points are what matter for comparison.

bayoumi · ‎10-28-2010

Lee,

Probably the numbering confused me. ATI had 3870/4870/5870 as the default for high-end GPGPU with DP. It is natural that some people (like myself) were planning on buying the 6870 by default as the DP card.

LeeHowes · ‎10-28-2010

The numbering is a little confusing if people try to map from one generation to the next, but that was unavoidable. Once the full lineup is on the market it will make more sense.

For double precision, wait a few weeks for the 69xx

nou · ‎10-28-2010

when we talk about confusing. Buzzard, wait what?

MicahVillmow · ‎10-22-2010

Yes, it is still 16 TPs/SIMD. The new HD58XX's should provide more power to your applications without requiring any changes to your program.

eduardoschardong · ‎10-22-2010

Micah, the new HD58XX's?

Well... About the new HD68XX's, what improvements for compute?

MicahVillmow · ‎10-22-2010

Sorry, typo on my part, it should be HD68XX. The HD68XX has 12/14 SIMDs and has improvements which lower the cost of thread scheduling. This means that flow control clauses don't require as many cycles.

gat3way · ‎10-22-2010

As far as opencl is concerned, do we have the same memory limits as 5xxx? E.g LDS, __constant, etc.

What about double precision?

nou · ‎10-23-2010

Originally posted by: MicahVillmow Sorry, typo on my part, it should be HD68XX. The HD68XX has 12/14 SIMDs and has improvements which lower the cost of thread scheduling. This means that flow control clauses don't require as many cycles.

great. can we expect more GPGPU optimization on Cayman than Barts?

empty_knapsack · ‎10-23-2010

It turns out that it's possible to compile IL code to the new 6XXX ISA at least from Catalyst 10.6. New targets, numbered 12 to 19, were added to the calclCompile() function. While 12-14 and 17-19 produce code exactly the same as for Cypress/Juniper (only the header differs, by 1-4 bytes), and one of these probably matches the Barts ISA, 15 and 16 are a totally different story. For example, some code compiled for 5XXX starts as:

2 z: ADD_INT ____, R2.y, R0.w
  t: MULLO_UINT T0.y, R1.z, R3.x
3 z: MOV R0.z, KC0[0].z
  w: ADD_INT T1.w, R0.x, PV2.z
  t: MOV R0.w, KC0[0].w
4 t: MULLO_UINT T0.w, T0.y, R3.y
5 t: MULLO_UINT ____, R1.y, R3.x
6 y: ADD_INT ____, T0.w, PS5
7 w: ADD_INT ____, R1.x, PV6.y
8 z: LSHL ____, PV7.w, (0x00000006, 8.407790786e-45f).x
9 y: ADD_INT T0.y, T1.w, PV8.z
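A side note on the literal in instruction 8 above: the disassembly appears to print the same 32-bit constant twice, once as hex and once reinterpreted as a float. A minimal Python sketch (the helper name `bits_as_float` is made up for illustration) reproduces the float form and shows that the shift by 6 is a multiplication by 64:

```python
import struct


def bits_as_float(bits):
    """Reinterpret a 32-bit unsigned integer bit pattern as an IEEE-754 float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


shift = 0x00000006
print(f"{bits_as_float(shift):.9e}")  # the float form of the same 32 bits
print(1 << shift)                     # LSHL by 6 multiplies by this factor
```

The first line prints 8.407790786e-45, matching the listing, and the second prints 64.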

And for target == 15 it became:

2 x: MULLO_UINT ____, R1.z, R2.x
  y: MULLO_UINT ____, R1.z, R2.x
  z: MULLO_UINT ____, R1.z, R2.x
  w: MULLO_UINT ____, R1.z, R2.x
3 x: MULLO_UINT ____, PV2.y, R2.y
  y: MULLO_UINT ____, PV2.y, R2.y
  z: MULLO_UINT ____, PV2.y, R2.y
  w: MULLO_UINT T0.w, PV2.y, R2.y
4 x: MULLO_UINT ____, R1.y, R2.x
  y: MULLO_UINT ____, R1.y, R2.x
  z: MULLO_UINT ____, R1.y, R2.x
  w: MULLO_UINT ____, R1.y, R2.x
5 y: ADD_INT ____, T0.w, PV4.z
  z: ADD_INT ____, R3.y, R0.w
6 x: ADD_INT T0.x, R0.x, PV5.z

There are 32-bit multiplications in each of the XYZW units, and there are no references to the T unit anymore. I guess that's the Cayman we're looking for. Though if it contains 16 thread processors (as current GPUs do) with 4 stream cores each (vs. the current 5), the value of 1760 SP (speculated, ofc) for 5950 looks weird.
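Why 1760 looks weird can be checked with a quick division. The sketch below is only an illustration using the post's terminology; the function name is made up, and the Cypress (1600 SP) and Barts (1120 SP) totals are the published shader counts for the HD5870 and HD6870, not figures from this thread.

```python
def simds_needed(total_sps, thread_processors_per_simd=16, units_per_thread_processor=4):
    """Number of SIMDs implied by a total shader-processor count."""
    return total_sps / (thread_processors_per_simd * units_per_thread_processor)


print(simds_needed(1760))         # speculated figure with a 4-wide layout
print(simds_needed(1600, 16, 5))  # Cypress (HD5870) with the 5-wide layout
print(simds_needed(1120, 16, 5))  # Barts (HD6870) with the 5-wide layout
```

The speculated 1760 SP works out to 27.5 SIMDs of 64 units each, which is not a whole number, while the existing 5-wide parts divide evenly into 20 and 14 SIMDs.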

Gipsel · ‎10-25-2010

Originally posted by: empty_knapsack It turns out that it's possible to compile IL code to the new 6XXX ISA at least from Catalyst 10.6.

Yes, that was discussed over at Beyond3D starting here.

empty_knapsack · ‎10-25-2010

The funniest thing is that this 4D VLIW compilation has been available since Catalyst 10.4 (the same time ATI broke support for the 2nd core of the 5970), but nobody discovered it until this October. AFAIK.

Gipsel · ‎10-25-2010

Originally posted by: empty_knapsack The funniest thing is that this 4D VLIW compilation has been available since Catalyst 10.4 (the same time ATI broke support for the 2nd core of the 5970), but nobody discovered it until this October. AFAIK.

Personally, I first saw references to the Northern Islands codename(s), and to support for the t lane being dropped, in Catalyst 9.8, i.e. right at the Cypress launch. (They may have been in there even slightly longer, I was too lazy to check. There was an error message saying that issuing instructions to the t lane is scheduled for removal in Northern Islands.) But I haven't tried whether the compilation actually works (I doubt it a bit, as several NI-specific instructions were added only later on). I saved that for the launch of the HD6800 line.

MicahVillmow · ‎10-22-2010

The HD68XX cards do not have double precision and the hardware memory limits have not changed.

MicahVillmow · ‎10-28-2010

The correct name for the card should be displayed in the next SDK release. That was an internal testing name that we were using, but since the card was launched before the SDK was released, the testing name is displayed.

6870's wavefronts