
yurtesen
Miniboss

memtestCL-1.00-linux64 Random blocks errors on Tahiti

I am running memtestCL-1.00-linux64 on Tahiti and it is giving "Random blocks" errors. Is this because of a bug in AMD's OpenCL SDK?

1 Solution

It is not our problem if AMD doesn't care whether their products function. We tried to raise our voice at least...

The problem seems to be a bug in memtestCL.exe, in a kernel that writes blocks of random values to memory. The kernel has a main loop in which each of the 256 work items in a workgroup generates a random value and stores it in a local memory block. Each work item then reads the value it stored back from local memory and writes it to global memory. Also in the loop, all work items read the same random value from local memory location #255 to use as the seed for the next iteration.

The problem is a not-so-obvious missing local barrier, which is needed after the line

      seed = randomBlock[blockDim -1];     // blockDim=256

At first it looks like the barrier after writing randomBlock[threadIdx] should be sufficient, because there are no other local memory writes in the loop.

The problem occurs when all threads read the same value at randomBlock[255]. If thread #255 (or rather its wave) reads the value first while the others are still waiting, it has a free execution path all the way to the next write of randomBlock[255], and so it overwrites the value before slower threads/waves can read it.
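The race can be reproduced without a GPU by a small sequential simulation. What follows is only a sketch in plain C, not code from memtestCL: THREADS, ITERS, next_random (a simple LCG standing in for deviceRan0p), run_to_barrier and simulate are all hypothetical names chosen for illustration. Each simulated thread runs from one barrier to the next, and the last thread is always scheduled first, which is the unlucky ordering described above.

```c
#include <stdint.h>

#define THREADS 4              /* stand-in for the 256 work items */
#define ITERS   3              /* stand-in for the N loop iterations */
#define INITIAL_SEED 123459876u

typedef enum { WRITE_BLOCK, READ_SEED, BLIT, DONE } Phase;
typedef struct { Phase phase; int iter; uint32_t seed; } Thread;

static uint32_t random_block[THREADS];      /* models __local randomBlock */
static uint32_t seen_seed[ITERS][THREADS];  /* seed each thread read, per iteration */

/* Hypothetical stand-in for deviceRan0p: a single LCG step. */
static uint32_t next_random(uint32_t seed, uint32_t tid) {
    return (seed + tid) * 1664525u + 1013904223u;
}

/* Run one thread until it arrives at its next barrier or finishes. */
static void run_to_barrier(Thread *t, int tid, int extra_barrier) {
    for (;;) {
        switch (t->phase) {
        case WRITE_BLOCK:
            if (t->iter == ITERS) { t->phase = DONE; return; }
            random_block[tid] = next_random(t->seed, (uint32_t)tid);
            t->phase = READ_SEED;
            return;                      /* the barrier the kernel already has */
        case READ_SEED:
            t->seed = random_block[THREADS - 1];
            seen_seed[t->iter][tid] = t->seed;
            t->phase = BLIT;
            if (extra_barrier) return;   /* the proposed extra barrier */
            break;
        case BLIT:
            /* the copy of random_block[tid] to global memory would go here */
            t->iter++;
            t->phase = WRITE_BLOCK;
            break;
        case DONE:
            return;
        }
    }
}

/* Returns how many (iteration, thread) pairs read a wrong seed. */
int simulate(int extra_barrier) {
    Thread th[THREADS];
    int tid, iter, done, wrong = 0;
    uint32_t expected = INITIAL_SEED;

    for (tid = 0; tid < THREADS; tid++) {
        th[tid].phase = WRITE_BLOCK;
        th[tid].iter = 0;
        th[tid].seed = INITIAL_SEED;
    }

    /* Each pass is one barrier interval: every thread runs up to its next
       barrier, and the last thread always goes first (worst-case order). */
    do {
        done = 0;
        for (tid = THREADS - 1; tid >= 0; tid--) {
            run_to_barrier(&th[tid], tid, extra_barrier);
            if (th[tid].phase == DONE) done++;
        }
    } while (done < THREADS);

    /* The correct seed chain is the one produced by the last thread's slot. */
    for (iter = 0; iter < ITERS; iter++) {
        expected = next_random(expected, THREADS - 1);
        for (tid = 0; tid < THREADS; tid++)
            if (seen_seed[iter][tid] != expected) wrong++;
    }
    return wrong;
}
```

Under these assumptions, simulate(0) (no extra barrier) reports threads that picked up a seed from the wrong iteration, while simulate(1) (with the extra barrier) reports none. A real GPU schedules waves however it likes, so the simulation only shows that the bad ordering is allowed, not how often it happens.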

In GCN, each wave executes on a single SIMD. I'm guessing this makes for more flexible execution paths, so the bug is most obvious on Tahiti because... yes... GCN is so powerful!

__kernel void deviceWriteRandomBlocks(__global uint* base, uint N, int seed, __local uint* randomBlock) {
    if (seed == 0) seed = 123459876+blockIdx;
    uint bitSeed = deviceRan0p(seed + threadIdx,threadIdx);

    for (uint i=0; i < N; i++) {
        // Generate block of random numbers in parallel
        randomBlock[threadIdx] = deviceRan0p(seed,threadIdx) |
                (deviceIrbit2(&bitSeed) << 31);
        barrier(CLK_LOCAL_MEM_FENCE);

        // Set the seed for the next round to the last number
        // calculated in this round
        seed = randomBlock[blockDim-1];
//=============================================
        barrier(CLK_LOCAL_MEM_FENCE);                  //! ADD EXTRA LOCAL BARRIER HERE
//=============================================

        // Blit shmem block out to global memory
        *(THREAD_ADDRESS(base,N,i)) = randomBlock[threadIdx];
    }
}

After making this change the program runs fine on Tahiti. I use MinGW on Windows, which is not a supported environment for building memtestCL, but if anyone wants the binary, just let me know.

memtestCL is copyrighted by Stanford University, where a lot of the early GPU development was done. It would be interesting to hear whether they have any thoughts or comments on GCN. There is a feedback channel for memtestCL through SimTK.org, where it comes from.

drallan


37 Replies