tugrul_512bit

Erroneous GPU behavior for atomic_add and atomic_sub

Discussion created by tugrul_512bit on Nov 5, 2017
Latest reply on Nov 8, 2017 by tugrul_512bit

While running some OpenCL kernels on R7-240 and RX-550, I see this behavior:

 

float w_rnd(__global unsigned int * intSeeds,int id)                
{
                uint maxint=0;
                maxint--; 

                uint rndint=w_hash(intSeeds[id]);
uint tmp0=atomic_add(&intSeeds[id],0);
atomic_sub(&intSeeds[id],tmp0);
atomic_add(&intSeeds[id],rndint);
                //intSeeds[id]=rndint;
                return ((float)rndint)/(float)maxint;
}

 

here, the commented-out line instead of 3 atomic functions works fine on a R7-240 but does not work fine on a RX550 so I tried these 3 atomics to make sure"value update" but this caused wrong values for the last 75 percent of buffer. I need the values updated because some kernels use random number generator more than once and I thought atomics could do this but they can not.

 

R7-240 works with both atomics version and simple version but Rx550 can't work with either version.

 

whash function is just a randomizer.

 

uint whash(uint seed)
{
                seed = (seed ^ 61) ^ (seed >> 16);
                seed *= 9;
                seed = seed ^ (seed >> 4);
                seed *= 0x27d4eb2d;
                seed = seed ^ (seed >> 15);
                return seed;
}

 

 

a kernel uses random number generator to pick a random value and then depending on that random value, it does unit-positioning on a map using random number generator two more times which makes 75% of units on a zigzag line instead of normally distributed on map. R7-240 uniformly distributes all units but RX-550 distributes only to 25% (upper-left quadrant) and leaves all remaining units on an inclined line. This is for atomics version.

 

Non-atomic version distributes better for RX550 but this time distributions are dependent on first usage of random number generator such that if first value is between 0 and 0.33 then it distributes right but all 0.34 and up to 0.99 (randomized) values cause map positionings wrong places and more narrow area of map instead of uniformly to whole map. So its like first usage of generator makes second and third usages generate a narrower range of values but works totally ok for R7-240. I suspect this is due to R7-240 not caching these values for read-only in kernel but RX-550 is caching and reading non-updated values, I may be wrong. Since OpenCL uses a relaxed-memory hierarchy, this is possible right?

 

What could be the hardware dependency affecting this?

 

Also If I generate random numbers 10 times more, it works for RX550 too. Is this working due to 10 times running same function forces a value update or is it simply a randomness issue? I use only thread-id values as seeds to start each random number generation independently so both R7-240 and RX-550 should do same with whash() function from same thread-id values and same device buffers.

 

Here is a kernel using this generator(commented-out for loop fixes the erroneous behavior, also I asked someone with a Vega card to try, same thing happened as RX550(and also this for-loop fix worked!)):

 

__kernel void initShips(__global float * shipX, __global float * shipY, 
                        __global uchar * shipState, __global float * shipRotation, 
                        __global uchar * shipTeam, __global unsigned int * randSeeds,
__global int * shipHp, __global int * shipShield)
{
    float mapWidth=@@mapWidth@@;
    float mapHeight=@@mapHeight@@;


    int i=get_global_id(0); // ship id


uchar team = (int)((w_rnd(randSeeds,i)*2.9999f));
/*
for(int j=0;j<10;j++)
{
team = (int)((w_rnd(randSeeds,i)*2.9999f));
}
*/
shipTeam[i]=team;
if(team==0)
{
shipX[i]=0.001f+w_rnd(randSeeds,i)*mapWidth*0.498f;
shipY[i]=mapHeight*0.05f + wa_rnd(randSeeds,i)*mapHeight*0.90f;
shipRotation[i]=w_rnd(randSeeds,i)*30.0f-15.0f;
}
else if(team==1)
{
shipX[i]=mapWidth*0.501f+ w_rnd(randSeeds,i)*mapWidth*0.498f;
shipY[i]=mapHeight*0.5f + w_rnd(randSeeds,i)*mapHeight*0.5f;
shipRotation[i]=w_rnd(randSeeds,i)*30.0f+165.0f;
}
else if(team==2)
{
shipX[i]=mapWidth*0.501f+w_rnd(randSeeds,i)*mapWidth*0.498f;
shipY[i]=w_rnd(randSeeds,i)*mapHeight*0.5f;
shipRotation[i]rnd(randSeeds,i)*30.0f+165.0f;
}


   }

Outcomes