smatovic
Adept II

SPMD on 5770

Do the Processing Elements have their own instruction counters?

Heyho,

I developed on an NVIDIA 8800 GT and found that every if-else doubled the runtime.

What about the ATI 5770: does it provide "real" SPMD with separate instruction counters, so that one processing element executes only the if branch and another executes only the else branch?

The OpenCL documentation only mentions that if the local work-group size is set to 1, then the GPU should work in SPMD "mode".

Regards,

Srdja
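A back-of-envelope sketch of what a local work-group size of 1 costs on SIMD hardware. It assumes (this is not stated in the post) that the 5770 runs 64-wide wavefronts and that a work-group smaller than a wavefront still occupies a whole wavefront, leaving the remaining lanes idle. `lane_utilization` is a hypothetical helper, not part of any API.

```python
# Rough lane-utilization model for different local work-group sizes.
# Assumptions (hypothetical): wavefronts are 64 work-items wide, and a
# work-group that does not fill a wavefront still occupies the whole wavefront.
import math

WAVEFRONT_SIZE = 64  # assumed wavefront width for the Radeon HD 5770


def lane_utilization(local_size: int, wavefront_size: int = WAVEFRONT_SIZE) -> float:
    """Fraction of SIMD lanes doing useful work for one work-group."""
    wavefronts = math.ceil(local_size / wavefront_size)
    return local_size / (wavefronts * wavefront_size)


for size in (1, 16, 64, 100, 256):
    print(f"local size {size:3d}: {lane_utilization(size):.1%} of lanes busy")
```

Under these assumptions, a size-1 work-group gets independent control flow only by leaving 63 of 64 lanes idle.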

nou
Exemplar

No. On ATI GPUs, both branches of an if/else are executed if there is no coherency across the whole workgroup.
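A toy model makes the cost visible: a SIMD wavefront issues a branch body whenever at least one lane needs it and masks off the other lanes, so a divergent if/else pays for both sides. The unit costs and the helper `if_else_cost` are hypothetical; real costs depend on the instruction mix.

```python
# Toy model of an if/else executed by one SIMD wavefront.
# Assumptions (hypothetical): each branch has a fixed issue cost, and a
# branch body is issued (under an execution mask) whenever any lane needs it.


def if_else_cost(conditions, if_cost, else_cost):
    """Issue slots a wavefront spends on `if (cond) {...} else {...}`."""
    any_true = any(conditions)       # some lanes take the if-branch
    any_false = not all(conditions)  # some lanes take the else-branch
    cost = 0
    if any_true:
        cost += if_cost
    if any_false:
        cost += else_cost
    return cost


coherent = [True] * 64                            # every lane agrees
divergent = [lane % 2 == 0 for lane in range(64)]  # even/odd lanes disagree
print(if_else_cost(coherent, 10, 10))   # 10: only the if-branch is issued
print(if_else_cost(divergent, 10, 10))  # 20: both branches are issued
```

The doubling seen on the 8800 GT corresponds to the divergent case with equally long branches.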

Do you mean that if I put all threads into one work-group, then they have their own instruction counters and do not execute every branch?

Like the G80, Fermi, Larrabee or an SSE CPU, it is a SIMD machine. That's how you get a high density of compute units and hence high arithmetic throughput in hardware. It executes one instruction across the entire wavefront at a time. Even if (as on Fermi, I think) they have their own instruction counters, this is still the case. I can't imagine that changing at any point in the near future given that the trend (look at AVX and Larrabee native instructions) is for wider SIMD, not narrower.

It is SPMD, but it's still SIMD within a wave. The waves can go out of sync, as each has its own program counter. The hardware interleaves waves from multiple workgroups (or multiple kernels in DX, and hopefully in OpenCL soon) on a core, and can also run a different group/kernel on a different core.
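Because each wave has its own program counter, divergence is judged per wave. The sketch below assumes 64-wide wavefronts, work-items packed into wavefronts in index order, and equal-length branches; `wavefront_costs` and `gid` are illustrative names, not an API. A branch that is uniform within each wavefront costs nothing extra even when it differs across the work-group.

```python
# Sketch: branch divergence is decided per wavefront, not per work-group.
# Assumptions (hypothetical): 64-wide wavefronts, work-items 0..N-1 packed
# into wavefronts in order, fixed per-branch issue costs.

WAVEFRONT_SIZE = 64


def wavefront_costs(predicate, n_items, if_cost=10, else_cost=10):
    """Issue cost of an if/else for each wavefront in a work-group."""
    costs = []
    for start in range(0, n_items, WAVEFRONT_SIZE):
        end = min(start + WAVEFRONT_SIZE, n_items)
        lanes = [predicate(gid) for gid in range(start, end)]
        cost = (if_cost if any(lanes) else 0) + (else_cost if not all(lanes) else 0)
        costs.append(cost)
    return costs


# gid < 64 is uniform within each wavefront, so no wave pays for both sides.
print(wavefront_costs(lambda gid: gid < 64, 128))      # [10, 10]
# Even/odd gid splits every wavefront, so each one runs both branches.
print(wavefront_costs(lambda gid: gid % 2 == 0, 128))  # [20, 20]
```

In practice this is why branching on a value that is constant across each wavefront is usually cheap.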

In this thread there is a short discussion about GPU threading. Maybe it will help.

Thanks for the link.

So all work-items inside a workgroup work in SIMD fashion, and these SIMD units work autonomously, like SPMD. Right?

Not quite. Only a single wavefront is really SIMD (and even then it's only logically SIMD). A work group is made of multiple wavefronts, each of which has its own program counter and can drift apart in SPMD fashion. The only difference is that within a workgroup you can synchronise on barriers and exchange data between wavefronts in LDS. Once you hit a barrier, you have pulled your wavefronts close together again, so it is a bit like being SIMD across an entire workgroup. That need not be the case if you never use a barrier, though.

Different work groups from the same kernel are then SPMD whether on the same SIMD core (where it's only logical of course, they're not executing on the same cycle) or multiple cores. Different work groups from different kernels are not even necessarily SPMD across the device.
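A toy timeline makes the barrier point concrete. It assumes a work-group whose wavefronts run concurrently, each spending a different (made-up) number of cycles in each phase, with an optional barrier between consecutive phases. `phase_start_times` is a hypothetical helper.

```python
# Toy timeline of wavefronts in one work-group, each with its own program
# counter. Assumptions (hypothetical): wavefronts run concurrently, phase
# lengths are made-up cycle counts, and barriers sit between phases.


def phase_start_times(phases_per_wave, use_barriers):
    """Cycle at which each wavefront starts each phase.

    phases_per_wave[w][p] is how long wavefront w spends in phase p.
    """
    n_waves = len(phases_per_wave)
    n_phases = len(phases_per_wave[0])
    starts = [[0] * n_phases for _ in range(n_waves)]
    for p in range(1, n_phases):
        # Cycle at which each wavefront finishes the previous phase.
        arrive = [starts[w][p - 1] + phases_per_wave[w][p - 1] for w in range(n_waves)]
        if use_barriers:
            # Nobody proceeds until the slowest wavefront reaches the barrier.
            release = max(arrive)
            arrive = [release] * n_waves
        for w in range(n_waves):
            starts[w][p] = arrive[w]
    return starts


waves = [[5, 10, 5], [15, 30, 5]]
print(phase_start_times(waves, use_barriers=False))  # [[0, 5, 15], [0, 15, 45]]
print(phase_start_times(waves, use_barriers=True))   # [[0, 15, 45], [0, 15, 45]]
```

Without barriers the second wavefront starts its last phase 30 cycles after the first. With barriers both start every phase on the same cycle, at the price of the faster wavefront sitting idle.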

Does each SIMD have a program stack?

I.e., to push/pop a return address when calling a subroutine/function.

Larrabee, being built from Pentium P54C cores, obviously does.

But I had the impression that the ATI 5xxx and Fermi do not, and that all function calls are inlined.

Functions are inlined, that is correct.

>> Not quite.

So one wavefront executes 16 threads in one cycle... and these 16 threads are SIMD?

One wavefront executes 16 work items every second cycle. Each individual set of 16 items is directly SIMD. The entire wavefront is logically SIMD in that the same instruction will be issued 4 times to cover the entire wavefront. Hence branch divergence is at the wavefront level, not at the physical SIMD level.
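The mapping just described can be sketched as follows. The 16 physical lanes and four issues per instruction come from the reply. The in-order assignment of work-items to 16-lane groups is an assumption, and `issue_slot`/`diverges` are hypothetical helpers. The example uses a branch that is uniform within every 16-lane group but still diverges at the wavefront level.

```python
# Sketch of one logical 64-wide wavefront mapped onto a 16-lane SIMD.
# 16 lanes and 4 issues per instruction follow the reply above; the in-order
# assignment of work-items to 16-lane groups is an assumption.

SIMD_LANES = 16
ISSUES_PER_INSTRUCTION = 4
WAVEFRONT_SIZE = SIMD_LANES * ISSUES_PER_INSTRUCTION  # 64


def issue_slot(work_item: int) -> tuple[int, int]:
    """(issue index, physical lane) a work-item occupies in its wavefront."""
    local = work_item % WAVEFRONT_SIZE
    return local // SIMD_LANES, local % SIMD_LANES


def diverges(conditions) -> bool:
    """True if some lanes take the branch and others do not."""
    return any(conditions) and not all(conditions)


conds = [wi < 16 for wi in range(WAVEFRONT_SIZE)]
groups = [conds[i:i + SIMD_LANES] for i in range(0, WAVEFRONT_SIZE, SIMD_LANES)]
print(issue_slot(37))               # (2, 5)
print([diverges(g) for g in groups])  # [False, False, False, False]
print(diverges(conds))              # True: the wavefront still runs both sides
```

Each 16-lane group is internally uniform, but because the branch decision is made for the whole wavefront, all four issues execute both branches.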

Functions are inlined most of the time. AMD hardware can support function calls, but they carry a fairly large performance hit, so inlining everything is preferable.