How best to think about the cards?
sgratton Apr 9, 2008 6:47 PM
Hi there,
I'm working on developing a Cholesky matrix factorization routine. While thinking about how to do this I've come up with a few questions. If it's okay I'll post them as separate threads in case someone else has already thought about or is interested in particular ones.
To start with, I'm afraid I find it a bit unclear from the beta documentation simply how best to think about the cards. (If all of this will be in the 1.0 documentation, please just say!) Unfortunately I don't have a graphics background. My impression from the documentation and from other PDFs (with diagrams) I've seen about the R600 is that a 3870 should be thought of as 64 processors arranged in four groups of sixteen, with each of the 64 processors being composed of 5 subprocessors. Is this correct so far? And how does this tie into "simds", "simd arrays", "pipelines", "alu units", "spus", etc.?
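The picture described above can be written down as simple arithmetic. This is only a sketch of my impression, not a confirmed description of the hardware; the constant names are made up for illustration.

```python
# Assumed layout of a 3870, taken from the impression stated above.
# These figures are the poster's reading of the documentation, not confirmed.
GROUPS = 4                       # "four groups"
PROCESSORS_PER_GROUP = 16        # "of sixteen"
SUBPROCESSORS_PER_PROCESSOR = 5  # "composed of 5 subprocessors"

# Total processors and subprocessors implied by that layout.
processors = GROUPS * PROCESSORS_PER_GROUP
subprocessors = processors * SUBPROCESSORS_PER_PROCESSOR

print(f"{processors} processors, {subprocessors} subprocessors")
```

If this layout is right, the subprocessor total would line up with the 320 "stream processors" usually quoted for the card.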
Relatedly, how "independent" is each processor in a group? Do they all execute the same instruction on different data, or can they independently proceed through the same program? Presumably they can't each run a different program; in fact, in CAL, do all processors (even in different groups) have to run the same program? Funnily enough, I feel most confident that I understand how things work at the processor/subprocessor level, with all subprocessors within a processor having their operation directed in a synchronized manner by "VLIW" instructions in a single "r600isa" program.
Then, is an execution domain split into blocks of 64 "elements"? Are the 64 elements of a block shared among the 16 processors in a group, 4 each, or are all 64 elements processed by just one processor? And does a thread mean what I've called a block? I read somewhere that threads are operated on in pairs: does each group of 16 processors then process two blocks in an interleaved manner? And so does a thread group mean the pair of threads assigned to what I've called a group of 16 processors? (Or does a thread mean one of the 64 elements and a thread group mean the set of 64 elements?)
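To make the two readings of "a block of 64 elements" concrete, here is a small sketch of each. Everything in it is hypothetical: the function names are invented, and the round-robin order in the first mapping is just one guess at how elements might be dealt out. It says nothing about which reading the hardware actually follows.

```python
from collections import Counter

BLOCK_SIZE = 64            # assumed block size from the question above
PROCESSORS_PER_GROUP = 16  # assumed group size from the question above

def shared_mapping(element):
    """Reading (a): the block is shared across the group, 4 elements each.

    Returns (processor index, slot on that processor). The round-robin
    order is an assumption made purely for illustration.
    """
    return element % PROCESSORS_PER_GROUP, element // PROCESSORS_PER_GROUP

def single_processor_mapping(element, processor=0):
    """Reading (b): one processor handles all 64 elements of the block."""
    return processor, element

# Count how many elements each processor receives under each reading.
load_a = Counter(shared_mapping(e)[0] for e in range(BLOCK_SIZE))
load_b = Counter(single_processor_mapping(e)[0] for e in range(BLOCK_SIZE))

print("reading (a):", dict(load_a))
print("reading (b):", dict(load_b))
```

Under reading (a) every one of the 16 processors gets 4 elements; under reading (b) a single processor gets all 64.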
Finally, some of the lower-end cards seem to have different numbers of processors per group as well as a different number of groups; which concepts are invariant? E.g., do they still operate on blocks of 64 elements?
Sorry for the confusion and thanks a lot for any help!
Best,
Steven.