himanshu_gautam
Grandmaster

No Memory Channel Conflicts in NBody Sample

Hi All,

I am running the NBody sample from the AMD APP SDK.

To give a brief introduction: the sample simulates a large number of particles. Each work-item is assigned the calculation for a single particle.
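To make the access pattern concrete, here is a minimal CPU-side sketch in Python of what one work-item does. It is not the SDK kernel: the function name `body_accel`, the softening parameter `eps`, and the tuple layout of `pos` are assumptions made for illustration only.

```python
def body_accel(i, pos, mass, eps=1e-3):
    """Acceleration on particle i, computed the way a single work-item would.

    pos  : list of (x, y, z) tuples, one per particle (hypothetical layout)
    mass : list of masses, one per particle
    eps  : softening term, keeps the j == i term finite (it contributes zero)
    """
    xi, yi, zi = pos[i]
    ax = ay = az = 0.0
    # Every work-item walks the complete position buffer, starting at element 0.
    for j in range(len(pos)):
        dx = pos[j][0] - xi
        dy = pos[j][1] - yi
        dz = pos[j][2] - zi
        dist_sqr = dx * dx + dy * dy + dz * dz + eps * eps
        scale = mass[j] / (dist_sqr ** 1.5)
        ax += scale * dx
        ay += scale * dy
        az += scale * dz
    return (ax, ay, az)
```

Calling `body_accel(i, pos, mass)` for every `i` corresponds to launching one work-item per particle; the point is that the inner loop over `j` is identical for all of them.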

Now, as per the algorithm, each work-item needs to read the complete buffer storing the particle positions. So every work-item accesses the same buffer elements as soon as it starts. This should result in channel conflicts (right?), since all work-groups want to access the same data elements, which map to the same memory channel.
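A rough way to see why one would expect conflicts is to model which channel each read lands on. The sketch below is only an address-mapping model under stated assumptions: 8 channels interleaved every 256 bytes and 16-byte (float4) positions. Those constants and the helper names are assumptions for illustration, not values confirmed in this thread, and the model says nothing about whether the hardware merges or broadcasts identical requests.

```python
from collections import Counter

CHANNEL_BYTES = 256   # assumed interleave granularity between channels
NUM_CHANNELS = 8      # assumed channel count
BYTES_PER_POS = 16    # one float4 position per particle (assumed)


def channel_of(byte_addr):
    """Channel selected by a byte address under the assumed interleave."""
    return (byte_addr // CHANNEL_BYTES) % NUM_CHANNELS


def channels_hit(num_items, step, same_element=True):
    """Count reads per channel for num_items work-items at loop step `step`.

    same_element=True  : every work-item reads pos[step] (the NBody pattern)
    same_element=False : work-item wi reads pos[wi + step] (a linear pattern)
    """
    hits = Counter()
    for wi in range(num_items):
        idx = step if same_element else wi + step
        hits[channel_of(idx * BYTES_PER_POS)] += 1
    return hits
```

Under these assumptions, `channels_hit(1024, 0)` puts all 1024 reads on a single channel, while `channels_hit(1024, 0, same_element=False)` spreads them evenly over all eight.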

But when I profile the application (with -x 10240) on Cypress/Cayman, I get a FetchUnitStalled value of zero. Does that mean the data is being broadcast to all compute units, or am I missing something?

Thanks

Himanshu

0 Likes
6 Replies
hazeman
Adept II

The NBody example is really badly implemented from an optimization point of view.

There is an old post about it somewhere in the forum. You can find an optimal implementation (95% of the card's peak performance) in the examples of the CAL++ library.

0 Likes

Thanks for replying. I know it has not been improved for a very long time.

But my question is: does the data get broadcast if all work-groups try to access it simultaneously, or should there be channel conflicts?

0 Likes

This doesn't look like something AMD can't disclose. One possibility is that the behavior is there for now but may not be there at some point later.

0 Likes

Does anyone have any idea about this? Maybe someone can share their experience.

0 Likes

I can confirm that broadcast works on 5xxx with LDS (local memory) and with the TU (Texture Unit = images). It doesn't work when reading from global memory using UAVs (standard memory reads).

0 Likes

Thanks, hazeman, for sharing your experience. I was also trying to write some tests, and they seem to behave the way you described.

I am still working on measuring the impact of channel conflicts on global memory access.

0 Likes