Hi everybody, I'm opening a new topic about OpenCL. I hope someone can answer in detail.
Now that the software setup on my laptop has been solved (http://devgurus.amd.com/message/1300783#1300783), I'd like some clarification about the hardware.
My laptop has an ATI Mobility Radeon HD 5650, and I am reading your guide (http://developer.amd.com/download/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf). The first chapter discusses the architecture at length, but honestly it confuses me. Probably I am just not understanding it properly.
I'd like to know whether the processing elements are the ALUs or something different. The guide also talks about the Evergreen/Northern Islands/Southern Islands families (desktop families, right?) but says nothing about Mobility (or Manhattan, I'm not sure), and I don't know whether those have the same or similar features from the hardware point of view.
To keep this post short, I would just like to know:
1) Are there 4 vector units in all the families (Desktop and Mobility)?
2) Are there 16 processing elements (not ALUs?) per vector unit? (page 20 of the guide above)
3) Are there 4 or 5 ALUs per processing element, depending on the family (end of page 21)? And how many for the HD 5650?
4) What do you mean by "work-item"? On page 22 it says:
"For devices in the Northern Islands and Southern Islands families, these ALUs are arranged in four (in the Evergreen family, there are five) processing elements with arrays of 16 ALUs. Each of these arrays executes a single instruction across each lane for each of a block of 16 work-items".
So does a work-item correspond to an ALU, which earlier (page 20) corresponded to a processing element? What does a work-item physically correspond to?
5) Is there a way to obtain a technical guide (a sort of manual) for the specific board I have? The Wikipedia article (Radeon HD 5000 Series - Wikipedia, the free encyclopedia) has no information about processing elements, ALUs, and so on. Where can I find that information for my board? Also, is there a connection between TMUs/ROPs and processing elements/ALUs, or something else?
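To make questions 1 to 3 concrete, here is my current reading of the hierarchy written out as arithmetic, so it is easy to point at the step that is wrong. This is only a sketch: the function name `total_alus` and every number passed to it are placeholders of mine, not confirmed values for the HD 5650.

```python
def total_alus(compute_units: int, lanes_per_unit: int, alus_per_lane: int) -> int:
    """Total ALU count under my assumed hierarchy:
    compute unit -> processing elements (lanes) -> ALUs per processing element."""
    return compute_units * lanes_per_unit * alus_per_lane


# Placeholder inputs (assumptions, not measured on my board):
#   compute_units  : the value I would expect clGetDeviceInfo to report
#                    for CL_DEVICE_MAX_COMPUTE_UNITS (2 here is arbitrary)
#   lanes_per_unit : 16, my reading of page 20 of the guide
#   alus_per_lane  : 4 or 5 depending on the family, my reading of page 21
for alus_per_lane in (4, 5):
    print(alus_per_lane, total_alus(compute_units=2, lanes_per_unit=16,
                                    alus_per_lane=alus_per_lane))
```

If this multiplication is not how the numbers in the guide relate to each other, that is exactly the part I would like corrected.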
I would like this information so that I can manage execution in OpenCL with a clear picture of work-items, work-groups, wavefronts and so on, with the aim of optimizing the design and reducing the time spent on memory and execution tasks.
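As an example of the kind of sizing decision I have in mind, the sketch below rounds a global work size up to a multiple of the work-group size and counts how many wavefronts one work-group would occupy. The wavefront size of 64 is my assumption (16 lanes, with each instruction repeated over 4 cycles, as I understand the guide), and the helper names are hypothetical.

```python
WAVEFRONT_SIZE = 64  # assumption: 16 lanes x 4 cycles; may differ per device


def padded_global_size(n_items: int, local_size: int) -> int:
    """Round n_items up to the next multiple of local_size (integer ceiling)."""
    return (n_items + local_size - 1) // local_size * local_size


def wavefronts_per_group(local_size: int, wavefront_size: int = WAVEFRONT_SIZE) -> int:
    """Number of wavefronts needed to cover one work-group, rounding up."""
    return (local_size + wavefront_size - 1) // wavefront_size


if __name__ == "__main__":
    n = 1000  # arbitrary problem size
    for local in (64, 65, 256):
        print(local, padded_global_size(n, local), wavefronts_per_group(local))
```

Under these assumptions, a work-group of 65 items would need two wavefronts with the second almost empty, which is the kind of waste I would like to understand and avoid.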