11 replies; latest reply on Jun 7, 2010, 9:01 PM by cjang

    Upgraded from 4850 to 5870, no image support

    kbrafford

      I upgraded my 4850 to a 5870, but CLInfo still says that I don't get image support.  Should I just reinstall the 2.1 SDK?  Should I uninstall something first?

        • kbrafford

          Oh, and my samples that used to run in 700 ms now take 3700 ms. What did I do?!?

            • cjang

              Try this:

              export GPU_IMAGES_SUPPORT=1

              in the shell environment (if using a Bourne shell variant) where the applications are run. On the older SDK 2.0, this enables image support. The variable only needs to be set in the shell that launches the OpenCL applications; the X server can be left as-is. I can attest that it works.
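              If you launch the application from a script or program instead of an interactive shell, you can scope the same setting to a single child process. The following is a minimal Python sketch. It assumes the command to run is passed on the command line, and `run_with_images` is a hypothetical helper name. `GPU_IMAGES_SUPPORT` is the undocumented variable from the post, so whether it has any effect depends on the SDK and driver version.

```python
import os
import subprocess
import sys


def run_with_images(cmd):
    """Run cmd with GPU_IMAGES_SUPPORT=1 set only in the child's environment.

    Equivalent to `export GPU_IMAGES_SUPPORT=1` in the launching shell: the
    child inherits the variable, while this process, the X server and every
    other process keep their existing environment.
    """
    env = dict(os.environ)           # copy, so our own environment is untouched
    env["GPU_IMAGES_SUPPORT"] = "1"  # undocumented switch (assumption: honoured by SDK 2.0)
    return subprocess.run(cmd, env=env).returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python run_with_images.py ./CLInfo
    sys.exit(run_with_images(sys.argv[1:]))
```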

              There are many undocumented environment variable settings (see http://forums.amd.com/forum/messageview.cfm?catid=390&threadid=128237). Naturally, use them all at your own risk. However, I have had good experiences with images on the 5870.

                • kbrafford

                  Thanks for the tip.  I tried that and it worked, except CLInfo failed at the end (after reporting that I did have image support).

                  So I resorted to removing ALL ATI software from my system, including the video driver (just as I did with the ThinkPad in the other discussion thread). After two tries, I now have what appear to be good, solid drivers, with the Stream SDK installed in the right place.

                  CLInfo now reports correctly, simplegl.exe runs at about 733 fps, simpleimage.exe works without crashing, and my actual work code gets a 2x speedup moving from the 4850 to the 5870.

                    • cjang

                      Thank you for the performance comparison for images between the 4850 and 5870. That information is valuable to me. This morning, someone asked me about DGEMM on the 4870. I know that memory buffers are of limited use on pre-Evergreen hardware because local memory is implemented in global memory. So if images work, they are the way to support older architectures like R700, especially since the texture units have an L1 cache and the memory buffers do not.
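                      The following toy Python model illustrates the cache argument. It is hypothetical and does not model the actual R700 texture hardware, which caches lines or tiles rather than single elements. It counts off-chip reads for a 3-point stencil. Without a cache, every neighbour read goes to DRAM. With even a 3-element LRU cache, the count drops to one read per element.

```python
from collections import OrderedDict


def dram_reads_3pt_stencil(n, cache_lines=0):
    """Count DRAM reads for out[i] = f(x[i-1], x[i], x[i+1]), i = 1..n-2.

    cache_lines=0 models an uncached buffer path (every read goes to DRAM);
    cache_lines>0 models a tiny LRU cache holding that many elements.
    """
    cache = OrderedDict()
    reads = 0
    for i in range(1, n - 1):
        for j in (i - 1, i, i + 1):
            if j in cache:
                cache.move_to_end(j)           # hit: mark as most recently used
                continue
            reads += 1                         # miss: fetch from DRAM
            if cache_lines:
                cache[j] = True
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict least recently used
    return reads


print(dram_reads_3pt_stencil(1024))                 # 3066 (3 reads per output)
print(dram_reads_3pt_stencil(1024, cache_lines=3))  # 1024 (each element once)
```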

                        • kbrafford

                           

                          cjang wrote: "Thank you for the performance comparison for images between the 4850 and 5870. That information is valuable to me."


                          Whoa!!! Don't take my numbers and assume that they reflect what you will see with your application. I have a specific algorithm that I designed with my old card (the 4850) in mind, and the fact that it only gets a 2x improvement on the 5870 is (I think) because I deliberately used no local memory. And I didn't use images on the 4850, because they weren't supported. The reason I was asking about images is that now that I have a decent card, I want to be able to use them!

                          I think most code will see much more than a 2x speedup with the 5870. It should: there are twice as many stream processors running at a higher clock, AND there is real local memory, as well as the texture caches.
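                          You can check the raw-throughput side of that argument with back-of-the-envelope arithmetic. The following sketch uses the cards' reference specifications, which do not come from the thread: 800 stream processors at 625 MHz for the HD 4850, and 1600 at 850 MHz for the HD 5870. It counts one multiply-add as two FLOPs. Peak ratios are upper bounds, and a kernel limited by memory traffic will not reach them.

```python
# Reference specs (single precision); one multiply-add = 2 FLOPs per SP per clock.
CARDS = {
    "HD 4850": {"stream_processors": 800, "core_mhz": 625},
    "HD 5870": {"stream_processors": 1600, "core_mhz": 850},
}


def peak_gflops(card):
    """Theoretical single-precision peak in GFLOPS."""
    spec = CARDS[card]
    return spec["stream_processors"] * 2 * spec["core_mhz"] / 1000.0


for name in CARDS:
    print(f"{name}: {peak_gflops(name):.0f} GFLOPS peak")
print(f"peak ratio: {peak_gflops('HD 5870') / peak_gflops('HD 4850'):.2f}x")
```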

                            • cjang

                              "Whoa!!!" Thanks for the caution. I re-read your post after I replied and realized that I probably inferred a bit too much.

                              If OpenCL images work on R700 at all, that's encouraging for the problems I am working on, since they are essentially bottlenecked by PCIe bus data transfer (the arithmetic intensity is not high enough). I need to use either local memory or images. As local memory doesn't really work on R700, images are the way to go, if anything works at all.
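                              You can put a number on "arithmetic intensity not high enough". The following sketch takes square SGEMM as an assumed example. It assumes A and B are sent to the device once and C is read back once. Under those assumptions, 2n^3 FLOPs are performed per 3 * 4n^2 bytes moved over PCIe, or n/6 FLOP/byte. The kernel and bus rates are hypothetical, chosen only for illustration.

```python
def sgemm_pcie_intensity(n):
    """FLOPs per byte moved over PCIe for C = A @ B with n x n float32 matrices."""
    flops = 2 * n**3            # n**3 multiply-adds
    pcie_bytes = 3 * 4 * n**2   # upload A and B, read back C
    return flops / pcie_bytes   # simplifies to n / 6


def is_pcie_bound(n, kernel_gflops, pcie_gbytes_per_s):
    """True if moving the matrices takes longer than running the kernel."""
    transfer_s = 3 * 4 * n**2 / (pcie_gbytes_per_s * 1e9)
    compute_s = 2 * n**3 / (kernel_gflops * 1e9)
    return transfer_s > compute_s


# Hypothetical rates: a 1000 GFLOPS kernel and 5 GB/s of effective PCIe bandwidth.
# Break-even is where n / 6 == 1000e9 / 5e9, i.e. n == 1200.
for n in (256, 1024, 4096):
    print(n, round(sgemm_pcie_intensity(n), 1), is_pcie_bound(n, 1000.0, 5.0))
```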

                              Some tips from experience: local memory on Evergreen is relatively slow even when access is fully coalesced. The lack of an L1 cache hurts. Images are fast. However, if PCIe bus data transfer is also counted, memory buffers can be faster. There's some overhead with images, probably related to writing through the cache? It will depend on the effective hierarchy between host and device memory and on the pattern of data transfers. I know this is not directly related to the original question of this topic, but here's what I mean: http://golem5.org/gatlas/bench_sgemm/bench_sgemm_pcie.html
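                              A simple model shows why counting PCIe can flip the ranking. Total time is transfer time plus kernel time, so a faster kernel with costlier transfers only pays off once the problem is large enough. The following sketch assumes transfers do not overlap with the kernel. It also assumes the image path has slower transfers, mirroring the guess about overhead above. All rates are hypothetical and are not taken from the linked benchmark.

```python
def effective_gflops(n, kernel_gflops, pcie_gbytes_per_s):
    """End-to-end GFLOPS for n x n SGEMM when PCIe transfers are counted.

    Upload of A and B plus readback of C (3 * 4 * n**2 bytes) is assumed
    not to overlap with the kernel.
    """
    flops = 2 * n**3
    transfer_s = 3 * 4 * n**2 / (pcie_gbytes_per_s * 1e9)
    kernel_s = flops / (kernel_gflops * 1e9)
    return flops / (transfer_s + kernel_s) / 1e9


# Hypothetical numbers: the image kernel is faster on the device,
# but its transfers are assumed to be slower.
BUFFERS = dict(kernel_gflops=400.0, pcie_gbytes_per_s=5.0)
IMAGES = dict(kernel_gflops=600.0, pcie_gbytes_per_s=3.0)

for n in (512, 1024, 4096):
    b = effective_gflops(n, **BUFFERS)
    i = effective_gflops(n, **IMAGES)
    print(f"n={n}: buffers {b:.0f} GFLOPS, images {i:.0f} GFLOPS")
```

With these made-up rates, buffers win at n=512 and images win at n=4096. The crossover point moves with the actual transfer overhead, which is why it depends on the host/device memory hierarchy.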

                                • kbrafford

                                   

                                  cjang wrote: "Some tips from experience - local memory on the Evergreen is relatively slow even if access is fully coalesced"


                                  Then that might explain why my 1KFFT written for the 4850 (which beats the AMD 1KFFT sample code on that card) still beats the AMD 1KFFT on my 5870. Don't get me wrong, the AMD code sped up tremendously when I ran it on the new card, but my own code also sped up about 2x, so my implementation remains about twice as fast as the AMD code in wall-clock time.

                      • MicahVillmow
                        cjang,
                        On HD 5xxx-series cards, local memory has twice the bandwidth of the L1 cache and much lower latency, since it is accessed with an ALU instruction instead of a TEX instruction.