29 Replies Latest reply on Oct 7, 2013 7:10 AM by himanshu.gautam

    Different results with HD 7970 and HD 7750

    wayne_static

      Hello,

      I have written a kernel that performs a dynamic programming routine, specifically targeting the GCN architecture. Recently, I tried to optimize the kernel by replacing its if-else constructs with select. The modified kernel works fine on my HD 7970 GPUs, with some improvement in speed, but strangely the same kernel does not work correctly on the HD 7750 GPUs.

      By "not working" I mean the following: the output of the kernel is a huge table of values. After each kernel execution I verify it against a sequential implementation on the CPU. The HD 7970 results are always correct, but the results from the HD 7750 are somewhere between 60% and 90% correct. For example, 4,193,984 out of 4,194,304 values pass verification.

      Again, the ONLY thing I did was replace if-else with select in the kernel. Could anyone please shed some light on this strange behavior? I can provide the kernel code if necessary. Many thanks.

        • Re: Different results with HD 7970 and HD 7750
          nou

          It may be a bug in the driver or faulty hardware. The best thing would be for you to provide a test case.

            • Re: Different results with HD 7970 and HD 7750
              wayne_static

              Hi nou, thanks for the reply. I am not ruling out your suggestion, but I should also mention that this behavior exists on NVIDIA hardware as well, the GeForce GTX 650 and GTX 680 to be precise. I don't know what this means with respect to drivers. Could you please elaborate on what you mean by a test case in this situation? Thanks.

                • Re: Different results with HD 7970 and HD 7750
                  himanshu.gautam

                  As the code is failing on NVIDIA as well as on the 7750, I would guess it is passing on the 7970 by accident. The 7750 and 7970 are both GCN, so it is hard to imagine them giving different results. I would guess you have different drivers installed on the 7750 and 7970 machines. Are they running the same OS, and do they have the latest APP SDK? The latest Catalyst driver is recommended (13.8 beta as of today). Before sharing your kernel, I would suggest you check your verification logic and all the places where you used select instead of if-else. You might have some silly bug somewhere ;) If nothing rings a bell, feel free to share your kernels here. It is recommended to attach a test case that anyone can download and compile with little hassle. Use the advanced editor for attaching.

                    • Re: Different results with HD 7970 and HD 7750
                      wayne_static

                      Thanks for the reply. I agree with you that it is hard to imagine such behavior. At the moment, all machines are running identical drivers, i.e., Catalyst version 13.4 and AMD APP version 1124.2, which comes with the latest SDK version 2.8.1. All machines are also running the same copy of Windows 7 Enterprise 64-bit. I should also mention that the machine with the GeForce GTX 680 has the same OS version and does not use the AMD APP SDK.

                      I usually work with a single project in Visual Studio 2012 and then copy the project to whichever machine I want to run tests on. All results are integers, so there are no floating-point headaches. The input data is randomized and the output data is a table, so the verification process is simply a matter of looping through the GPU values and comparing them with the sequential CPU results. All of this happens in one execution of the code.

                      Do you suggest I update to the Catalyst driver 13.8 beta and try again before providing a test case? Thanks.

                        • Re: Different results with HD 7970 and HD 7750
                          nou

                          Yes, try the latest drivers, as there is a chance that it has already been fixed.

                            • Re: Different results with HD 7970 and HD 7750
                              wayne_static

                              I have updated the machine with the HD 7750 GPU to Catalyst version 13.8 beta2, but it still fails verification. This machine is also equipped with an A10-5800K APU, and the kernel also fails on that APU's integrated HD 7660D GPU.

                                • Re: Different results with HD 7970 and HD 7750
                                  himanshu.gautam

                                  Please provide us with the test case.

                                    • Re: Different results with HD 7970 and HD 7750
                                      himanshu.gautam

                                      Hi, I am not able to access the above link due to internal security reasons. Please give us a direct link or attach the file/project directly to this thread. Don't post any third-party URLs.

                                        • Re: Re: Different results with HD 7970 and HD 7750
                                          wayne_static

                                          Apologies, I attached the wrong project. Please find attached the original test case I attempted to link to. Many thanks.

                                            • Re: Different results with HD 7970 and HD 7750
                                              himanshu.gautam

                                              Hi Wayne,

                                              A cursory glance at your code revealed some race conditions in your kernel.

                                              A very similar scenario was reported in the NVIDIA forums some 5 years back, where everyone thought it was a hardware bug.

                                              But it turned out to be a race condition.

                                              Here is what I found (there could be others hiding, so please go through the rest of your code):

                                              1. dps1_kernel: A "barrier" in the middle of the FOR loop will cause a race condition between the UPPER and LOWER halves of the loop body.
                                                 This is a very subtle race that can escape even trained eyes: a work-item that finishes the lower half can start the upper half of the next iteration while others are still in the lower half.
                                                 You need another barrier towards the end of the FOR loop.

                                              2. dps1_kernel: A "barrier" cannot be used in the middle of a FOR loop that reads for(x=tid; x<constantN; x += localSize).
                                                 Technically, some threads may not enter the loop on the last pass, so the "barrier" will never be reached by them,
                                                 unless you know for sure that "localSize" divides "constantN" evenly.
                                                 When it does not, you need to write something like this:

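                                                 for (x = 0; x < N; x += localSize)   // same trip count for every work-item
                                                 {
                                                     if ((x + localId) < N)
                                                     { DO WORK }
                                                     barrier(CLK_LOCAL_MEM_FENCE);    // or CLK_GLOBAL_MEM_FENCE if the shared data is in global memory
                                                     if ((x + localId) < N)
                                                     { DO SOME MORE WORK }
                                                     barrier(CLK_LOCAL_MEM_FENCE);    // This is important!
                                                 }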

                                               

                                              I have not checked the other kernels. I hope you will be able to refactor your code with this input.

                                              If the bug remains, please post here.

                                              - Bruhaspati
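
The second point in the reply above can be checked on the host without a GPU. The following is a minimal sketch in plain C, not the poster's kernel: it only counts how many times each work-item would reach a barrier placed inside the loop. The function names and the sizes used in the usage note (n = 10, local_size = 4) are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of barrier calls one work-item would make in the strided loop
 *   for (x = local_id; x < n; x += local_size) { work; barrier; }
 * assuming one barrier per iteration entered. */
int barriers_hit_strided(int local_id, int n, int local_size)
{
    assert(local_size > 0);
    int count = 0;
    for (int x = local_id; x < n; x += local_size)
        count++;
    return count;
}

/* Same count for the uniform loop
 *   for (x = 0; x < n; x += local_size) { if (x + local_id < n) work; barrier; }
 * where the bounds check guards only the work, never the barrier. */
int barriers_hit_uniform(int local_id, int n, int local_size)
{
    assert(local_size > 0);
    (void)local_id;  /* the trip count does not depend on the work-item */
    int count = 0;
    for (int x = 0; x < n; x += local_size)
        count++;
    return count;
}

typedef int (*count_fn)(int local_id, int n, int local_size);

/* True if every work-item in a group of local_size items makes the same
 * number of barrier calls, which a work-group barrier requires. */
bool barrier_counts_agree(count_fn f, int n, int local_size)
{
    int first = f(0, n, local_size);
    for (int id = 1; id < local_size; id++)
        if (f(id, n, local_size) != first)
            return false;
    return true;
}
```

With the hypothetical sizes n = 10 and local_size = 4, `barrier_counts_agree(barriers_hit_strided, 10, 4)` is false, because work-items 0 and 1 enter the loop three times while work-items 2 and 3 enter it only twice. `barrier_counts_agree(barriers_hit_uniform, 10, 4)` is true. When local_size divides n evenly (for example n = 8), both loop shapes agree, which matches the "divides perfectly" caveat in the reply.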
