
OpenCL

Miniboss

memtestCL-1.00-linux64 Random blocks errors on Tahiti


I am running memtestCL-1.00-linux64 on Tahiti and it reports random errors in the Random blocks test. Is this caused by a bug in AMD's OpenCL SDK?


Accepted Solutions
Challenger

Re: memtestCL-1.00-linux64 Random blocks errors on Tahiti


It is not our problem if AMD doesn't care whether their products function. We tried to raise our voices, at least...

The problem seems to be a bug in memtestCL.exe, in a kernel that writes blocks of random values to memory. The kernel has a main loop in which each of the 256 work items in a workgroup generates a random value and stores it in a local memory block. Each work item then reads its own value back from local memory and writes it to global memory. In the same loop, all work items read the random value at local memory location #255 to use as the seed for the next iteration.

The problem is a not-so-obvious missing local barrier, which is needed after the line

      seed = randomBlock[blockDim -1];     // blockDim=256

At first it looks like the barrier after writing randomBlock[threadIdx] should work because there are no other local memory writes in the loop.

The problem occurs when all threads read the same value at randomBlock[255]. If thread #255 (that is, its wave) reads the value first while the other waves are still waiting, nothing stops it: it runs through the global write and on to the next iteration's write of randomBlock[255], overwriting the value before the slower threads/waves have read it.
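To check that this interleaving alone is enough to corrupt the output, here is a minimal, deterministic C sketch (not memtestCL code). It assumes a 256-work-item workgroup split into four 64-wide waves, treats each wave as one atomic step (its lanes run in lockstep), and uses a scheduler that always lets the wave holding work item #255 run ahead whenever the barriers allow it. `next_value()` is a hypothetical stand-in for `deviceRan0p()`, the `bitSeed` term is left out, and `count_mismatches()` compares the simulated output with a race-free sequential run, with and without a second barrier after the seed read.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK  256            /* work items per workgroup (blockDim)      */
#define WAVE    64            /* work items per GCN wavefront             */
#define NWAVES (BLOCK / WAVE)
#define ITERS    4            /* plays the role of N in the kernel        */

enum op { OP_WRITE, OP_BARRIER, OP_READ_SEED, OP_STORE };

/* Hypothetical stand-in for deviceRan0p(): for a fixed t it is a
   bijection of seed, so a wrong seed always yields a wrong value. */
static uint32_t next_value(uint32_t seed, int t) {
    return seed * 1664525u + 1013904223u + (uint32_t)t;
}

/* Simulates the loop body wave by wave, always giving priority to the
   wave that owns work item BLOCK-1. Returns how many output words differ
   from the race-free sequential result. */
int count_mismatches(int extra_barrier) {
    static const enum op buggy[] = { OP_WRITE, OP_BARRIER, OP_READ_SEED, OP_STORE };
    static const enum op fixed[] = { OP_WRITE, OP_BARRIER, OP_READ_SEED,
                                     OP_BARRIER, OP_STORE };
    const enum op *prog = extra_barrier ? fixed : buggy;
    const int len = extra_barrier ? 5 : 4;
    const uint32_t seed0 = 123459876u;

    uint32_t lds[BLOCK], seed[BLOCK], out[ITERS][BLOCK], ref[ITERS][BLOCK];
    int iter[NWAVES] = {0}, pc[NWAVES] = {0};
    int arrived[NWAVES] = {0}, waiting[NWAVES] = {0};

    /* Reference: every work item sees the same seed in every round. */
    uint32_t s = seed0;
    for (int i = 0; i < ITERS; i++) {
        for (int t = 0; t < BLOCK; t++) ref[i][t] = next_value(s, t);
        s = ref[i][BLOCK - 1];
    }

    for (int t = 0; t < BLOCK; t++) seed[t] = seed0;
    memset(lds, 0, sizeof lds);
    memset(out, 0, sizeof out);

    for (;;) {
        int progressed = 0;
        /* Scan from the last wave down, so it runs ahead whenever it can. */
        for (int w = NWAVES - 1; w >= 0 && !progressed; w--) {
            if (iter[w] == ITERS) continue;              /* wave finished */
            const int lo = w * WAVE, hi = lo + WAVE;
            switch (prog[pc[w]]) {
            case OP_WRITE:
                for (int t = lo; t < hi; t++) lds[t] = next_value(seed[t], t);
                break;
            case OP_BARRIER: {
                if (!waiting[w]) { arrived[w]++; waiting[w] = 1; }
                int all_here = 1;           /* has every wave reached it? */
                for (int v = 0; v < NWAVES; v++)
                    if (arrived[v] < arrived[w]) all_here = 0;
                if (!all_here) continue;    /* stay blocked, try next wave */
                waiting[w] = 0;
                break;
            }
            case OP_READ_SEED:
                for (int t = lo; t < hi; t++) seed[t] = lds[BLOCK - 1];
                break;
            case OP_STORE:
                for (int t = lo; t < hi; t++) out[iter[w]][t] = lds[t];
                break;
            }
            if (++pc[w] == len) { pc[w] = 0; iter[w]++; }
            progressed = 1;
        }
        if (!progressed) break;                          /* all waves done */
    }

    int bad = 0;
    for (int i = 0; i < ITERS; i++)
        for (int t = 0; t < BLOCK; t++)
            bad += out[i][t] != ref[i][t];
    return bad;
}
```

With this schedule, `count_mismatches(0)` (one barrier per iteration, as in memtestCL 1.00) returns a nonzero count, because waves 0-2 pick up a seed that wave 3 has already overwritten, while `count_mismatches(1)` (extra barrier after the seed read) returns 0.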

In GCN, each wave executes on a single SIMD. I'm guessing this allows more flexible execution paths, so the bug is most obvious on Tahiti because...

... yes ... GCN is so powerful!

__kernel void deviceWriteRandomBlocks(__global uint* base, uint N, int seed, __local uint* randomBlock) {
    // threadIdx, blockIdx, blockDim, deviceRan0p, deviceIrbit2 and THREAD_ADDRESS
    // are defined elsewhere in the memtestCL kernel source.
    if (seed == 0) seed = 123459876 + blockIdx;
    uint bitSeed = deviceRan0p(seed + threadIdx, threadIdx);
    for (uint i = 0; i < N; i++) {
        // Generate block of random numbers in parallel
        randomBlock[threadIdx] = deviceRan0p(seed, threadIdx) |
                                 (deviceIrbit2(&bitSeed) << 31);
        barrier(CLK_LOCAL_MEM_FENCE);
        // Set the seed for the next round to the last number
        // calculated in this round
        seed = randomBlock[blockDim - 1];
//=============================================
        // Every work item must have read randomBlock[blockDim-1] before any
        // work item overwrites it at the top of the next iteration.
        barrier(CLK_LOCAL_MEM_FENCE);                  //! ADD EXTRA LOCAL BARRIER HERE
//=============================================
        // Blit shmem block out to global memory
        *(THREAD_ADDRESS(base, N, i)) = randomBlock[threadIdx];
    }
}

After making this change, the program runs fine on Tahiti. I use MinGW on Windows, which is not a supported environment for building memtestCL, but if anyone wants the binary, just let me know.

memtestCL is copyrighted by Stanford University, where a lot of the early GPU development was done. It would be interesting to hear whether they have any thoughts or comments on GCN. There is a feedback channel for memtestCL on SimTK.org, where it comes from.

drallan


37 Replies
Adept I

Re: memtestCL-1.00-linux64 Random blocks errors on Tahiti


The same test fails for me on Win7 x64 on both of my 6850s (on the CPU it does not detect any errors).

Exemplar

Re: memtestCL-1.00-linux64 Random blocks errors on Tahiti


It is the same for my 5850. Sometimes it reports around 950 errors in the Random blocks test.

Miniboss

Re: memtestCL-1.00-linux64 Random blocks errors on Tahiti


The number I am getting on Tahiti is crazy high: over a million errors in each iteration.

I get no errors on a 5870, a 6320 (E-450 APU GPU, tested under Windows), or NVIDIA GTX 580/680.

Adept II

Re: memtestCL-1.00-linux64 Random blocks errors on Tahiti


Quoting the previous post: "I get no errors under 5870, 6320 (E-450 APU GPU tested under Windows), and NVIDIA GTX 580/680."

No errors, or very few? My HD5450 gave 400-500 errors in 3 out of 50 iterations (and none in the other 47). Given that the errors only ever appear in this test, and everyone seems to be getting them in the very same test (never in any other test), it might also be a problem with the test itself. How extensively have you tested on the NVIDIA GTX 580/680?

Miniboss

Re: memtestCL-1.00-linux64 Random blocks errors on Tahiti


I ran the test several times with the default memory setting (256) and also tried the 1024 command-line option. In all cases I got 0 errors on Cypress XT (5870).

I also ran the tests to completion (50 iterations) at least once or twice on the 580 with the 1024 command-line option, and nvidia-smi confirms that it is allocating memory: it shows 1171 MB allocated. I should re-test the 680 to completion, though; I never waited for it to run all 50 iterations.

If I were you, I would use the aticontrol utility to reduce the core and memory clocks, then re-test to see whether the problem still occurs. There might be borderline cases where your memory actually fails in some operations. Let us know if you get the same results. I tried this with Tahiti and it still produces millions of errors, which makes me think there is a bug in AMD's OpenCL.

Miniboss

Re: memtestCL-1.00-linux64 Random blocks errors on Tahiti


Correction: I re-ran the tests with different sizes, and it appears I do sometimes get errors on the 5870. The default of 128 gave no errors (but I did not run it many times), 256 gave errors once in a while after a few runs, and 512 and 1024 both gave errors in a few iterations.

I reduced the memory clock from 1200 MHz to 600 MHz and ran 1024 again. I still got errors.

On the NVIDIA GTX 580, I ran the tests a few times with sizes up to 1024 and got no errors.

I will come back with results from the 680. NVIDIA seems to have its own problems: the 680 is stuck and the machine needs a reboot.

Also, the tests run roughly 8 to 10 times faster on the GTX 580 than on Cypress, while Tahiti is only 3-4 times slower than the 580.

Miniboss

Re: memtestCL-1.00-linux64 Random blocks errors on Tahiti


I ran a test with 10000 iterations using 2.5 GB of memory on an NVIDIA Tesla M2050. I think AMD should fix this problem...

Test summary:
-----------------------------------------
10000 iterations over 2504 MiB of memory on device Tesla M2050
      Moving inversions (ones and zeros): 0 failed iterations
                                         (0 total incorrect bits)
                 Memtest86 walking 8-bit: 0 failed iterations
                                         (0 total incorrect bits)
              True walking zeros (8-bit): 0 failed iterations
                                         (0 total incorrect bits)
               True walking ones (8-bit): 0 failed iterations
                                         (0 total incorrect bits)
              Moving inversions (random): 0 failed iterations
                                         (0 total incorrect bits)
             True walking zeros (32-bit): 0 failed iterations
                                         (0 total incorrect bits)
              True walking ones (32-bit): 0 failed iterations
                                         (0 total incorrect bits)
                           Random blocks: 0 failed iterations
                                         (0 total incorrect bits)
                     Memtest86 Modulo-20: 0 failed iterations
                                         (0 total incorrect bits)
                           Integer logic: 0 failed iterations
                                         (0 total incorrect bits)
                 Integer logic (4 loops): 0 failed iterations
                                         (0 total incorrect bits)
            Integer logic (local memory): 0 failed iterations
                                         (0 total incorrect bits)
   Integer logic (4 loops, local memory): 0 failed iterations
                                         (0 total incorrect bits)
Final error count: 0 errors

Adept II

Re: memtestCL-1.00-linux64 Random blocks errors on Tahiti


So step 1 is fetching the source code.

https://simtk.org/project/xml/downloads.xml?group_id=385#package_id906 offers it, but you need a SimTK account. Does anyone have a SimTK account, and could they post it here in a way that doesn't require an external account?

Miniboss

Re: memtestCL-1.00-linux64 Random blocks errors on Tahiti


You know it is free to get an account, right? Here it is...
