Hi rahulgarg,
a) Just clarifying terminology (otherwise we'll all start using slightly different terms and get confused... :-)). The FireStream 9170, Radeon HD 3870 and FireGL V7700 are all stream processors built on the RV670 GPU. The RV670 has 4 SIMD arrays. Each SIMD array has 16 thread processors. Each thread processor consumes 5-wide VLIW instructions and has 5 stream cores, one for each of the 5 operations packed into a VLIW instruction. Those cores are labeled x, y, z, w and t. All five can do integer and single-precision floating-point (SPFP) operations; t can also do transcendentals. Double-precision floating-point (DPFP) is performed by fusing x, y, z and w together to execute a single DPFP op.
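To make the hierarchy in (a) concrete, here is a small Python sketch that just multiplies out the counts given above. The variable names are my own, and the "DPFP units" figure simply takes the description at face value (one fused x/y/z/w group per thread processor); it says nothing about clock rates or actual throughput.

```python
# Counts taken from the description in (a); names are illustrative only.
SIMD_ARRAYS = 4
THREAD_PROCESSORS_PER_ARRAY = 16
CORE_LABELS = ("x", "y", "z", "w", "t")   # 5 stream cores per thread processor
DPFP_FUSED_CORES = ("x", "y", "z", "w")   # fused together for one DPFP op

thread_processors = SIMD_ARRAYS * THREAD_PROCESSORS_PER_ARRAY
stream_cores = thread_processors * len(CORE_LABELS)

# Assumption: one fused x/y/z/w group per thread processor, so the number of
# DPFP-capable units equals the number of thread processors.
dpfp_units = thread_processors

# Only the t core handles transcendentals, so one such core per thread processor.
transcendental_cores = thread_processors

print(f"thread processors: {thread_processors}")
print(f"stream cores:      {stream_cores}")
print(f"DPFP units:        {dpfp_units}")
```

So the RV670 as described works out to 64 thread processors and 320 stream cores in total.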
b) All the thread processors in a SIMD array must be running the same instruction on any given clock cycle. Different SIMD arrays can be running different instructions. However, this level of control is not exposed directly to the user; it is handled by the thread dispatcher inside the GPU.
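A toy Python model of the lockstep rule in (b) might help. This is only a sketch: the `step` function and the per-array "instructions" are hypothetical, and on real hardware the thread dispatcher decides what each SIMD array runs, not user code.

```python
# Toy model only: names and structure are illustrative, not an AMD API.
LANES_PER_ARRAY = 16  # thread processors per SIMD array (from the post)
NUM_ARRAYS = 4        # SIMD arrays on the RV670 (from the post)

def step(arrays, instructions):
    """Advance every SIMD array by one cycle.

    arrays:       list of lists, one list of lane values per SIMD array
    instructions: one callable per SIMD array; every lane in that array
                  applies the same callable during this cycle
    """
    if len(arrays) != len(instructions):
        raise ValueError("need exactly one instruction per SIMD array")
    return [[op(v) for v in lanes] for lanes, op in zip(arrays, instructions)]

# Each array starts with lane values 0..15.
arrays = [list(range(LANES_PER_ARRAY)) for _ in range(NUM_ARRAYS)]

# Different arrays may run different instructions in the same cycle,
# but within one array all 16 lanes share a single instruction.
ops = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 1, lambda v: v * v]

result = step(arrays, ops)
```

The point of the model is that there is no way to hand two lanes of the same array different instructions in one call to `step`; divergence is only possible across arrays.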
c) There isn't a sync instruction yet. This is a feature which may show up in the next few generations of GPUs.
d) I believe the caches are shared by the texture units. Unfortunately, I don't actually know the exact cache size on the current GPUs.
e) You should not count on synchronization between multiple stream cores at this time.
We are actually close to releasing a technical overview (we still need to proofread it with some of the engineers here and the legal department to make sure we don't leak inappropriate information. :-)).
It'll show up on the website in a few weeks.
Michael.