5 Replies Latest reply on Jul 24, 2015 4:28 AM by dipak

    Strange behavior on Hawaii




      I have a problem with the Hawaii GPU.


      I have a molecular simulation application that runs great on Tahiti (280X) and on the CPU, but I get very incorrect results on Hawaii.


      Almost all the kernels use local memory, and it affects the result. If I stop using it, the results get better but are still incorrect. It isn't a synchronization problem, though, because I don't share data between work-items; I just use local memory to store data temporarily, because the kernels have to do a lot of memory writes.


      I have complex structures in the code, but as I said, it runs correctly on every device I have tried so far; Hawaii is the only exception.


      Is there any explanation for why code could run differently on Hawaii than on other devices?


      I can share parts of the code if needed. I use the latest Catalyst 15.7 driver on Linux.



        • Re: Strange behavior on Hawaii

          I can confirm that on both Windows and Linux, reverting the driver to 14.12 seems to solve the problem on Hawaii.


          On Tahiti, however, it runs well with the latest driver, and faster too.


          Maybe some driver optimization introduced after 14.12 is causing this on Hawaii.

            • Re: Strange behavior on Hawaii

              Just curious: why are you using local memory if you don't share data between work-items? Local memory usage affects kernel occupancy on the GPU. Why don't you try running with private memory instead (just remove the __local keyword)? This should make your kernel faster.
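To illustrate the suggestion, here is a minimal sketch of the two variants, written as OpenCL C sources embedded in host-side C strings. The kernel name `fill_and_sum`, the sizes `N` and `WG`, and the helper `mentions_local` are all hypothetical, since the original kernels are not shown in this thread. In the first variant each work-item uses its own slice of a `__local` array; in the second, the same scratch buffer is a private array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Variant A (hypothetical): per-work-item scratch space carved out of a
 * __local array. No data is shared, so no barrier is needed. */
const char *kernel_local_src =
    "#define N 8\n"
    "#define WG 64\n"
    "__kernel __attribute__((reqd_work_group_size(WG, 1, 1)))\n"
    "void fill_and_sum(__global float *out) {\n"
    "    __local float scratch[WG * N];\n"
    "    __local float *mine = scratch + get_local_id(0) * N;\n"
    "    float sum = 0.0f;\n"
    "    for (int i = 0; i < N; ++i) mine[i] = (float)i;\n"
    "    for (int i = 0; i < N; ++i) sum += mine[i];\n"
    "    out[get_global_id(0)] = sum;\n"
    "}\n";

/* Variant B (hypothetical): the same scratch space as a private array.
 * The compiler keeps it in registers if it fits. */
const char *kernel_private_src =
    "#define N 8\n"
    "__kernel void fill_and_sum(__global float *out) {\n"
    "    float mine[N];\n"
    "    float sum = 0.0f;\n"
    "    for (int i = 0; i < N; ++i) mine[i] = (float)i;\n"
    "    for (int i = 0; i < N; ++i) sum += mine[i];\n"
    "    out[get_global_id(0)] = sum;\n"
    "}\n";

/* Returns 1 if the kernel source contains the local address space
 * qualifier, 0 otherwise. Only a convenience for comparing the variants. */
int mentions_local(const char *src) {
    assert(src != NULL);
    return strstr(src, "__local") != NULL;
}
```

Whether the private variant is actually faster depends on register pressure: a large private array may be spilled to scratch memory, so it is worth profiling both versions on the target device.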

              • Re: Strange behavior on Hawaii

                From your description, it seems to be a driver-related issue. Could you please share a reproducible test case?



                  • Re: Strange behavior on Hawaii

                    I used local memory because the CodeXL timeline trace showed a performance improvement (on Tahiti). I think that's because the structures I have to keep in private memory are large and consumed too many registers (maybe it's a bad idea). Anyway, as far as I can see, local memory should not affect the results in any way, but it does since Catalyst 14.12. And I still get bad results without local memory. My first thought was a memory alignment problem.
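One way to check the alignment theory on the host side is to compare struct layouts. The sketch below only assumes what the data might look like: `ParticleHost` and `ParticleDeviceLike` are hypothetical types, not taken from the real application. It relies on one rule from the OpenCL specification: a 3-component vector such as `float3` has the size and alignment of the 4-component type, which is 16 bytes for `float3`. A host struct that packs three floats tightly therefore does not match a kernel struct that uses `float3`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

static_assert(sizeof(float) == 4, "this sketch assumes a 4-byte float");

/* Hypothetical host-side record: three tightly packed floats plus a charge. */
typedef struct {
    float pos[3];
    float charge;
} ParticleHost;

/* Emulates the layout of a kernel-side struct { float3 pos; float charge; }:
 * float3 occupies 16 bytes and is 16-byte aligned. */
typedef struct {
    _Alignas(16) float pos[4];
    float charge;
} ParticleDeviceLike;

/* Returns 1 when both layouts agree in total size and in the offset of
 * the charge member, 0 otherwise. */
int layouts_match(void) {
    return sizeof(ParticleHost) == sizeof(ParticleDeviceLike) &&
           offsetof(ParticleHost, charge) == offsetof(ParticleDeviceLike, charge);
}

/* Prints both layouts so a mismatch is easy to spot. */
void print_layouts(void) {
    printf("host:        size=%zu charge at offset %zu\n",
           sizeof(ParticleHost), offsetof(ParticleHost, charge));
    printf("device-like: size=%zu charge at offset %zu\n",
           sizeof(ParticleDeviceLike), offsetof(ParticleDeviceLike, charge));
}
```

If `layouts_match()` returns 0 for a struct that is shared with the kernels, every element after the first is read from the wrong offset on the device. Using the `cl_*` types from `CL/cl.h` (for example `cl_float3`) in shared host structs is one way to avoid this kind of mismatch.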

                    Can I send you the kernels and an executable privately?