Archives Discussions

BarsMonster · ‎01-05-2009

Has anyone noticed a huge branching penalty?

For a kernel whose core is about 500 VLIW instructions long, adding a single branch (taken with probability ~1/400,000,000, so nearly every thread follows the same path) reduces speed by around 10-20%. Has anyone else noticed that?

That is quite unusual coming from CUDA, where branches are almost free as long as all threads follow the same code path.

gaurav_garg · ‎01-06-2009

That's very strange. I haven't run any experiments on this, but the documentation says the branch granularity is the same as the wavefront size (64 threads).

Could you post sample code showing this behavior?
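
For a rough sense of scale, here is a back-of-the-envelope sketch in Python (not code from this thread) that combines the two numbers quoted above: the ~1/400,000,000 per-thread chance of taking the branch and the 64-thread branch granularity. It assumes threads take the branch independently of each other, which may not hold for a real kernel; the function name `p_wavefront_diverges` is made up for illustration.

```python
import math

WAVEFRONT_SIZE = 64           # branch granularity quoted from the documentation
P_THREAD = 1 / 400_000_000    # per-thread chance quoted in the original post


def p_wavefront_diverges(p_thread: float, width: int) -> float:
    """Chance that at least one of `width` threads takes the rare branch.

    Assumes the threads are independent (an assumption, not a measured fact).
    Computes 1 - (1 - p)^width via log1p/expm1 so tiny p does not round to 0.
    """
    return -math.expm1(width * math.log1p(-p_thread))


p_wave = p_wavefront_diverges(P_THREAD, WAVEFRONT_SIZE)
print(f"chance a wavefront sees the rare path: {p_wave:.3e}")
print(f"roughly one wavefront in {1 / p_wave:,.0f}")
```

Under these assumptions only about one wavefront in six million would ever execute the rare path, so divergence by itself seems unlikely to account for a 10-20% slowdown. That would point to a cost paid simply for having the branch in the kernel, but a sample kernel would be needed to confirm it.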

udeepta · ‎01-13-2009

Can you post code showing an example of the behavior?

See the Stream Computing User Guide, section 1.3. Do the examples help explain the behavior?

Branching penalty