Up until now I still don't get it.
Does instance() work like usual without conflicting with instanceInGroup()?
If a stream is bigger than GROUP_SIZE, does each group get its own lds? And is a thread then unable to access another group's lds?
Why can a stream of type VectorType/float4 read from an array of ScalarType/float?
What is the range of instanceInGroup().x there? Is it the same as GROUP_SIZE?
How do I know whether my thread is in a different group from other threads?
What does this mean:
//Reading from last Thread
item = 4 * (GROUP_SIZE - 1 - instanceInGroup().x )+ 0;
How come it reads from last thread?
If instanceInGroup().x returns 0 to 63, then 4 * (64 - 1 - x) + 0, with x anywhere from 0 to 63, just gives the offsets in reverse order... really?
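To check that arithmetic I wrote the index out in plain Python (not Brook+). This is only a sketch under my own assumptions: GROUP_SIZE is 64, instanceInGroup().x runs from 0 to 63, and each thread owns the 4 consecutive scalar slots starting at 4 * x. The names own_slot and read_slot are made up for illustration and are not part of the sample.

```python
GROUP_SIZE = 64        # assumed group size (matches the guessed 0..63 range)
SLOTS_PER_THREAD = 4   # assumed: each thread owns 4 consecutive scalar slots


def own_slot(x, k=0):
    """Hypothetical slot that thread x writes to: 4 * x + k."""
    return SLOTS_PER_THREAD * x + k


def read_slot(x, k=0):
    """Index expression from the sample: 4 * (GROUP_SIZE - 1 - x) + k."""
    return SLOTS_PER_THREAD * (GROUP_SIZE - 1 - x) + k


# Print, for a few threads, the slot they read and which thread owns it.
for x in (0, 1, 62, 63):
    slot = read_slot(x)
    print(f"thread {x:2d} reads slot {slot:3d}, owned by thread {slot // SLOTS_PER_THREAD}")
```

If those assumptions hold, thread 0 reads slot 252, which is the first slot of thread 63, and thread 63 reads slot 0, so only thread 0 actually reads from the last thread and every other thread reads from its mirror image in the group.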
The shared VectorType lds; is only used at offset 0, right? If so, could the algorithm, with some modification, run with only 64 lds elements?
So the algorithm sums the columns of a and b in reverse order and puts the result in c... if my understanding is correct.
Can I build a 2d/3d LDS? What is its maximum size?
In the lds sample, one group is one row, right?
Can I specify a 2d group in a 3d stream?
Sorry for redundant questions.