2 Replies Latest reply on Apr 27, 2015 7:59 PM by thegman

    Kaveri/hUMA performance in WPF/.NET applications

    thegman

      Hello,

      I am developing an application which runs pixel shaders on very large images. By 'very large' I mean up to 200 megapixels: they are photographic scans of large-format negatives, so the files are huge.

       

      When I run pixel shaders on these images, performance is very bad indeed. Running the shaders on my GTX 750 GPU is not really any faster than running them on the Core i5 CPU, and I would have expected pixel shaders to perform better on a GPU than on a CPU.

       

      Anyway, a 200MP image is 800MB of data, assuming 32-bit pixels (and sometimes they are 48-bit pixels!). So moving this image from main memory into graphics memory is not an insubstantial task, even for a modern, fast PCIe x16 link.
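
      The arithmetic above can be checked with a short sketch. It is written in Python rather than .NET, purely for illustration. The bus figure is an assumption: roughly 15.75 GB/s is the theoretical peak of a PCIe 3.0 x16 link, and sustained throughput in practice is lower. The names `image_bytes` and `transfer_seconds` are made up for this example.

```python
def image_bytes(megapixels: float, bits_per_pixel: int) -> float:
    """Uncompressed size of an image in bytes."""
    return megapixels * 1e6 * bits_per_pixel / 8


def transfer_seconds(num_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time for one one-way copy at a given sustained bandwidth."""
    return num_bytes / bandwidth_bytes_per_s


# Assumption: theoretical peak of PCIe 3.0 x16; real transfers are slower.
PCIE3_X16_PEAK = 15.75e9

if __name__ == "__main__":
    for bpp in (32, 48):
        size = image_bytes(200, bpp)
        secs = transfer_seconds(size, PCIE3_X16_PEAK)
        print(f"{bpp}-bit: {size / 1e6:.0f} MB, "
              f"at least {secs * 1000:.0f} ms per one-way copy")
```

      This counts a single one-way copy only. Copying the result back, and any per-call overhead, come on top of it.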

       

      So the simple question is: can I expect better performance from an AMD APU? With the hUMA architecture there is no copying of the 800MB of data over the system bus, as the GPU and CPU share the same memory.

       

      Will .NET/WPF automatically use the capabilities of hUMA, or would it treat it like a regular GPU/CPU setup?

       

      I don't own an AMD Kaveri system, but I'd certainly be interested in one if I can expect major performance differences.

       

      Cheers

       

      Garry

        • Re: Kaveri/hUMA performance in WPF/.NET applications
          jvsala

          Greetings,

           

          From the description of your application, I think you really would benefit from a Kaveri system and its unified memory. In my experience with GPU coding, even a fast PCIe connection is a major bottleneck. The golden rule when programming discrete GPUs is to know the amount of computation in proportion to the amount of data you need to move. If your operations are very simple, you can even slow down your application by using the GPU. It only pays off to move computations to the GPU when you need a LOT of operations for each piece of data. So even buying a very expensive discrete GPU wouldn't give you better performance here.
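
          That rule of thumb can be put into a rough model. The sketch below is Python, for illustration only, and every throughput figure in it is a hypothetical placeholder rather than a measurement of any real CPU, GPU or APU. It compares CPU time against GPU time (copy in, compute, copy out) for a cheap and an expensive filter, and then repeats the comparison with an infinitely fast bus to stand in for shared memory.

```python
import math


def cpu_seconds(num_bytes, ops_per_byte, cpu_ops_per_s):
    """Time to process the data on the CPU, with no copying involved."""
    return num_bytes * ops_per_byte / cpu_ops_per_s


def gpu_seconds(num_bytes, ops_per_byte, gpu_ops_per_s, bus_bytes_per_s):
    """Time on the GPU: copy the data in, compute, copy the result out."""
    transfer = 2 * num_bytes / bus_bytes_per_s
    compute = num_bytes * ops_per_byte / gpu_ops_per_s
    return transfer + compute


# Hypothetical figures, chosen only to illustrate the trade-off.
IMAGE_BYTES = 800e6   # the 200MP, 32-bit image from the question
CPU_OPS = 50e9        # assumed CPU throughput, operations per second
GPU_OPS = 1000e9      # assumed GPU throughput, operations per second
BUS = 10e9            # assumed sustained bus bandwidth, bytes per second
SHARED = math.inf     # no copy at all, standing in for unified memory

if __name__ == "__main__":
    for ops in (1, 100):
        cpu = cpu_seconds(IMAGE_BYTES, ops, CPU_OPS)
        discrete = gpu_seconds(IMAGE_BYTES, ops, GPU_OPS, BUS)
        unified = gpu_seconds(IMAGE_BYTES, ops, GPU_OPS, SHARED)
        print(f"{ops:>3} ops/byte: CPU {cpu:.3f}s, "
              f"discrete GPU {discrete:.3f}s, shared memory {unified:.3f}s")
```

          With these made-up numbers the discrete GPU loses to the CPU at 1 operation per byte and wins at 100, while the no-copy case wins in both. The real break-even point depends entirely on the actual hardware and filter.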

           

          With Kaveri, as you don't need to move data across the PCIe bus, performance is dramatically better in these cases. The problem is that HSA is a really recent technology, so I doubt .NET has adequate support for it yet to make your shaders run faster "automagically" (although maybe someone more knowledgeable than I am about these aspects could shed some light on this).

           

          If you are willing to invest a bit of time in this, you can easily get better performance with Kaveri. I suppose you are simply running a filter on your huge images. You could implement these operations in OpenCL, making sure you use zero-copy buffers, and it would fly:

           

          http://stackoverflow.com/questions/23378707/data-sharing-between-cpu-and-gpu-on-modern-x86-hardware-with-opencl-or-other-gpg

           

          As you are using .NET, you could use an OpenCL wrapper to integrate it in your app so you don't need to code in C/C++:

          https://openclnet.codeplex.com/

           

          Running a filter on an image is the archetypal example of OpenCL code. You can find plenty of examples around (even in the SDK), so you shouldn't have much trouble getting it to run. For example, this:

          GPGPU image processing basics using OpenCL.NET - CodeProject
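
          To show why a per-pixel filter maps so naturally onto this model, here is a minimal sketch in plain Python. It is not OpenCL and is not taken from the linked article. It assumes pixels packed as 32-bit integers with alpha in the top byte (0xAARRGGBB), and the names `invert_pixel` and `run_filter` are hypothetical. Each pixel is processed independently, which is exactly what lets a GPU hand one pixel to each work item.

```python
def invert_pixel(argb: int) -> int:
    """Invert the colour channels of one packed 32-bit pixel, keeping alpha."""
    alpha = argb & 0xFF000000
    colour = ~argb & 0x00FFFFFF
    return alpha | colour


def run_filter(pixels, kernel):
    """Apply `kernel` to every pixel.

    On a GPU each call would be a separate work item running in parallel;
    here the calls simply run one after another.
    """
    return [kernel(p) for p in pixels]


if __name__ == "__main__":
    image = [0xFF000000, 0xFF112233, 0x80FFFFFF]
    for before, after in zip(image, run_filter(image, invert_pixel)):
        print(f"{before:08X} -> {after:08X}")
```

          In OpenCL the body of `invert_pixel` would become the kernel, and the loop in `run_filter` would be replaced by the runtime launching one work item per pixel.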

            • Re: Kaveri/hUMA performance in WPF/.NET applications
              thegman

              Hi there,

              Thanks for this. I think I'll certainly look for my next machine to be a Kaveri, if only to give it a try. However, I'm really looking for .NET to support these operations for me, as I've come quite far down the road of using the WPF 'Effect' shaders, and I can't justify the time to change over to something Kaveri-compatible for the 0.0001% of users who might have an APU system.

               

              I do find the systems very interesting though. They remind me of SGI NUMA systems, which always seemed like a better idea at the time!

               

              Cheers

               

              Garry