I am running the NBody sample from the AMD APP SDK.
To give a brief introduction: the sample simulates a large number of particles. Each work-item computes the update for a single particle.
As per the algorithm, each work-item needs to read the complete buffer that stores the particle positions, so every work-item accesses the same buffer elements, in the same order, as soon as it starts. This should result in channel conflicts (right?), since all work-groups want to access the same data elements, which map to the same memory channel.
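To make the access pattern concrete, here is a rough serial C sketch of what I believe each work-item does. It is not the SDK kernel: `Particle`, `work_item`, `launch`, the `reads` counter and the placeholder accumulation are my own illustrative names and simplifications, and the outer loop only stands in for the NDRange launch.

```c
#include <stddef.h>

/* Hypothetical layout: position (x, y, z) plus mass, one entry per particle. */
typedef struct { float x, y, z, m; } Particle;

/* Models ONE work-item. gid selects the particle it owns, but the loop
 * walks the whole position buffer starting at index 0, exactly like
 * every other work-item does. reads[j] counts accesses to element j. */
static float work_item(size_t gid, const Particle *pos, size_t n,
                       unsigned *reads)
{
    float acc = 0.0f;
    for (size_t j = 0; j < n; ++j) {
        float dx = pos[j].x - pos[gid].x;
        float dy = pos[j].y - pos[gid].y;
        float dz = pos[j].z - pos[gid].z;
        float d2 = dx * dx + dy * dy + dz * dz + 1e-3f; /* softening term */
        acc += pos[j].m / d2;  /* placeholder, not the real force formula */
        reads[j] += 1;         /* every work-item touches element j */
    }
    return acc;
}

/* Serial stand-in for launching n work-items (assumption: on the GPU
 * these run concurrently, so the reads of pos[0], pos[1], ... coincide). */
static void launch(const Particle *pos, size_t n, float *out,
                   unsigned *reads)
{
    for (size_t gid = 0; gid < n; ++gid)
        out[gid] = work_item(gid, pos, n, reads);
}
```

After `launch`, every entry of `reads` equals the number of work-items, which is the "everyone reads the same element at the same time" situation I am asking about.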
But when I profile the application with (-x 10240) on Cypress/Cayman, I get a FetchUnitStalled value of zero. Does that mean the data is being broadcast to all compute units, or am I missing something?