Archives Discussions

zoli0726 · ‎03-01-2013

Hi. I started to write an OpenCL program and it behaves strangely.

If I debug it with CodeXL and stop at some breakpoints, it works fine. But without debugging and breakpoints my output is just a mess, and I have no idea why this happens. It's my first program, so I'm quite sure I'm doing something wrong. I attach my kernel; if anybody has a suggestion, please share it with me.

LeeHowes · ‎03-01-2013

In this code:

    block0[localIndex] = input[globalIndex];
    //IP
    if(localIndex < 32) {
        L0[localIndex] = block0[(localIdy * 2) + 57 - (localIdx * 8)];
    }

You write into local memory, then read back out of it at different addresses, but you don't synchronize in between, so a work-item can read an element that another work-item has not written yet. You need a barrier where you say //IP to make it work.
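To make the race concrete, here is a minimal sketch in plain C (not OpenCL) that emulates one work-group running wavefront by wavefront. Everything in it is an assumption for illustration: the group size of 128, the mirrored read index (a stand-in for the permutation in the kernel), the names `count_mismatches`, `write_phase` and `read_phase`, and the scheduling order in which wavefront 0 runs to completion first. Real hardware may interleave wavefronts differently. In an actual kernel, the synchronization point would be a call to `barrier(CLK_LOCAL_MEM_FENCE)`.

```c
#include <assert.h>

#define GROUP_SIZE 128     /* hypothetical work-group size */
#define WAVEFRONT_SIZE 64  /* wavefront width on the AMD hardware discussed */

/* Phase 1: every lane of one wavefront stores its element into local memory. */
void write_phase(int *local_mem, const int *input, int base)
{
    for (int lane = 0; lane < WAVEFRONT_SIZE; ++lane)
        local_mem[base + lane] = input[base + lane];
}

/* Phase 2: every lane reads an element written by a different work-item.
 * The mirrored index is a simplified stand-in for the kernel's permutation. */
void read_phase(const int *local_mem, int *output, int base)
{
    for (int lane = 0; lane < WAVEFRONT_SIZE; ++lane)
        output[base + lane] = local_mem[GROUP_SIZE - 1 - (base + lane)];
}

/* Runs the emulated group and returns how many outputs are wrong. */
int count_mismatches(int use_barrier)
{
    int input[GROUP_SIZE], local_mem[GROUP_SIZE], output[GROUP_SIZE];
    int wrong = 0;

    assert(GROUP_SIZE % WAVEFRONT_SIZE == 0);

    for (int i = 0; i < GROUP_SIZE; ++i) {
        input[i] = 3 * i + 1;  /* arbitrary positive test data */
        local_mem[i] = -1;     /* stale contents of local memory */
    }

    if (use_barrier) {
        /* With a barrier, all wavefronts finish phase 1 before any starts phase 2.
         * In OpenCL C this point would be barrier(CLK_LOCAL_MEM_FENCE). */
        for (int base = 0; base < GROUP_SIZE; base += WAVEFRONT_SIZE)
            write_phase(local_mem, input, base);
        for (int base = 0; base < GROUP_SIZE; base += WAVEFRONT_SIZE)
            read_phase(local_mem, output, base);
    } else {
        /* Without a barrier, one possible schedule: each wavefront runs both
         * phases before the next wavefront has written anything. */
        for (int base = 0; base < GROUP_SIZE; base += WAVEFRONT_SIZE) {
            write_phase(local_mem, input, base);
            read_phase(local_mem, output, base);
        }
    }

    for (int i = 0; i < GROUP_SIZE; ++i)
        if (output[i] != input[GROUP_SIZE - 1 - i])
            ++wrong;
    return wrong;
}
```

Under these assumptions, `count_mismatches(1)` reports no wrong outputs, while `count_mismatches(0)` reports that the whole first wavefront read stale data, because the elements it needed belonged to the second wavefront.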

zoli0726 · ‎03-02-2013

Yes, thank you, I've already found that out. There were plenty of cases where I had to synchronize (and others where I didn't have to), and now it's working well.

I don't know how badly these synchronizations affect performance; maybe I should write an implementation where I don't have to use them.

LeeHowes · ‎03-04-2013

It can affect performance. My inclination is to always use a workgroup size of 64 when targeting AMD hardware, so that each workgroup is exactly one wavefront. Doing that means:

a) you can have more workgroups live (on recent hardware we can manage a very large number of wavefronts, but only a small number of workgroups, because workgroups consume barrier resources)

b) the barriers will be optimised away, because they are not needed to synchronise within a single wavefront.

It's a vector architecture, so in many ways you are better off writing code for it as a vector architecture rather than thinking of it as a set of fine-grained threads that synchronize.
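Point b) can be illustrated with a small plain C emulation (again, not OpenCL, and only a sketch). It assumes that all lanes of a wavefront execute each instruction in lockstep and that wavefronts run one after another with no barrier between the store and the load. The function name `stale_reads_without_barrier`, the mirrored read index and the size limit are hypothetical choices made for the example.

```c
#include <assert.h>

#define WAVEFRONT_SIZE 64   /* wavefront width on the AMD hardware discussed */
#define MAX_GROUP_SIZE 256  /* hypothetical upper bound for this emulation */

/* Returns how many work-items load a value that has not been stored yet
 * when the kernel contains no barrier between the store and the load. */
int stale_reads_without_barrier(int group_size)
{
    int local_mem[MAX_GROUP_SIZE];
    int stale = 0;

    assert(group_size > 0 && group_size <= MAX_GROUP_SIZE);
    assert(group_size % WAVEFRONT_SIZE == 0);

    for (int i = 0; i < group_size; ++i)
        local_mem[i] = -1;  /* marks "not written yet" */

    for (int base = 0; base < group_size; base += WAVEFRONT_SIZE) {
        /* Instruction 1: all lanes of this wavefront store together. */
        for (int lane = 0; lane < WAVEFRONT_SIZE; ++lane)
            local_mem[base + lane] = base + lane;

        /* Instruction 2: all lanes load from the mirrored slot. */
        for (int lane = 0; lane < WAVEFRONT_SIZE; ++lane) {
            int src = group_size - 1 - (base + lane);
            if (local_mem[src] != src)
                ++stale;
        }
    }
    return stale;
}
```

With a group size of 64 the whole group is one wavefront, every store completes before any load is issued, and the emulation finds no stale reads. With 128 work-items, the first wavefront reads slots owned by the second, so a barrier is required. This is only a model of the reasoning above; in portable code it is probably safer to keep the barrier in the source and let the compiler remove it when the group size allows.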

Strange memory