Originally posted by: jajce85

Hi, I was wondering if someone could explain how work groups and work units get executed on ATI hardware, in terms of kicking off threads (the smallest instance of execution, i.e. a single kernel instance).

Say I had n work groups, each with m work units, on an array of length K (just a simple 1-to-1 copy in the kernel, no criss-crossing between global index and array index). How many threads will get executed on, say, a Radeon 4870 (which has 10 compute units)? How will the work groups and work units get split up between the compute units and the subsequent sub-units in the GPU?

Basically, it would be good if someone wrote a chronology/timeline of execution explaining how the threads are connected to the work units and work groups. For example:

Execute instance 0: 4 compute units work on work groups 0-3 -> thread 0 in ComputeUnit0 works on WorkUnit0 in WorkGroup0, t1 in CU0 works on WU1 in WG0, and so on...
Execute instance 1: 4 compute units work on work groups 5-8 -> thread 0 in ComputeUnit0 works on WorkUnit0 in WorkGroup5 ...
...

I hope I am not asking for too much.
Please read the following document for more details on this. It is a good document for beginners.
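In the meantime, here is a rough sketch of the index bookkeeping the question describes, for the 1-to-1 copy case. It only shows how a global index splits into a work group, a position inside that group, and a hardware batch ("wavefront") inside the group. The wavefront width of 64 is an assumption for this class of GPU, and the function names are made up for illustration. The order in which work groups are handed to compute units is decided by the hardware scheduler and is deliberately not modelled here.

```python
import math

# Assumption: work units are executed in batches ("wavefronts") of 64.
WAVEFRONT_SIZE = 64


def decompose(global_id, work_group_size):
    """Split a 1-D global index into (work-group id, local id, wavefront index in the group)."""
    group_id = global_id // work_group_size   # which work group owns this element
    local_id = global_id % work_group_size    # position inside that work group
    wavefront = local_id // WAVEFRONT_SIZE    # which batch of the group it falls in
    return group_id, local_id, wavefront


def wavefronts_per_group(work_group_size):
    """Number of wavefronts needed to cover one work group (the last one may be partly idle)."""
    return math.ceil(work_group_size / WAVEFRONT_SIZE)


if __name__ == "__main__":
    m = 128  # hypothetical work-group size
    for gid in (0, 63, 64, 127, 128, 130):
        print(gid, decompose(gid, m))
    print("wavefronts per group:", wavefronts_per_group(m))
```

With m = 128, each work group is covered by two wavefronts, and element 130 of the array lands in work group 1 at local position 2.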