I've been programming NVIDIA hardware up until now, and I'm trying to understand exactly how the RV870 differs with respect to some aspects of thread/work-group execution. If anyone could answer these questions, it would be much appreciated!
*) According to the OpenCL device info, the maximum work-group size is 256 threads. Does this mean that each stream processor cluster/compute unit is capable of executing 4 wavefronts concurrently?
*) Can these wavefronts belong to different work-groups, or is there a limit on the number of work-groups that an SPC can execute at a given time? (e.g. only 1 work-group at once, irrespective of thread count)
*) According to the Evergreen docs, each SPC has a register file of only 128 128-bit registers. This is much smaller than NVIDIA's register file, so how are these distributed amongst threads? Is it in any way like the NVIDIA case, where a kernel's register use affects the occupancy (the max number of threads per work-group, or of work-groups per compute unit)?
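
To make the arithmetic behind these questions concrete, here is a rough sketch of the model I have in my head. It assumes a wavefront size of 64 threads (which is where the "4 wavefronts" for a 256-thread work-group comes from) and an NVIDIA-style static split of the register file across resident threads. Whether the RV870 actually works this way is exactly what I'm asking. The function names and the register numbers in the last call are made up purely for illustration and are not RV870 figures.

```python
WAVEFRONT_SIZE = 64  # assumed threads per wavefront (256 / 64 = 4)


def wavefronts_per_work_group(work_group_size, wavefront_size=WAVEFRONT_SIZE):
    """Wavefronts needed to cover one work-group, rounded up."""
    return (work_group_size + wavefront_size - 1) // wavefront_size


def register_limited_wavefronts(total_registers, registers_per_thread,
                                wavefront_size=WAVEFRONT_SIZE):
    """Whole wavefronts that fit if registers are split statically per thread.

    This mirrors the NVIDIA-style occupancy model; it is an assumption,
    not a statement about how the RV870 allocates registers.
    """
    resident_threads = total_registers // registers_per_thread
    return resident_threads // wavefront_size


if __name__ == "__main__":
    # Maximum work-group size reported by the OpenCL device info.
    print(wavefronts_per_work_group(256))  # -> 4

    # Hypothetical numbers only: 16384 registers, 16 registers per thread.
    print(register_limited_wavefronts(16384, 16))  # -> 16
```

If the second function is roughly the right model, then a kernel using more registers per thread would directly reduce the number of wavefronts resident on a compute unit, as it does on NVIDIA hardware.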