12 Replies Latest reply on Oct 12, 2009 12:07 PM by alexaverbuch

    rv770 White Paper

    alexaverbuch
      Official specifications... or at least a subset of them?

      Hi, and sorry for cross-posting this to a few different forums,

      Does anyone know where I can find the specifications for the RV770, including its memory regions, their sizes, and their access latencies/bandwidth?

      I can get a lot of information by crawling the internet and reading every review of the RV770, but I assume an official source exists for the information I want. Presumably AMD wants to empower developers to use its products?

      Any suggestions would be greatly appreciated!

      Regards,

      Alex Averbuch

        • rv770 White Paper
          Ceq

          I think you may find some interesting documentation about the RV770 on the Stream SDK main page:

          http://developer.amd.com/gpu/ATIStreamSDK/Pages/default.aspx

          In the download section there are several PDFs; have a look at "AMD R700-Family Instruction Set Architecture".

          • rv770 White Paper
            MicahVillmow
            Rahul, you can find more information about caches and how they work in the recently posted slides on how we optimized ACML-Sgemm for RV670 hardware. The RV770 has a very similar cache structure, except that instead of a 4-way L1 cache, each SIMD gets its own L1 cache.

            http://developer.amd.com/gpu_a...on%20Illustration.ppt

            More information can be found in documents here:
            http://developer.amd.com/gpu/A...ages/Publications.aspx
              • rv770 White Paper
                alexaverbuch

                Thanks everyone,

                Regarding on-chip memory, I have been trying to figure out which memory is relevant to GPGPU computation and which is purely (mostly) beneficial to traditional graphics workloads.

                Local Shared Memory (16 KB per SIMD): This seems to be "general purpose"/"scratch" memory that IS useful for GPGPU and has no coherency.

                Global Shared Memory (16 KB): This seems to be "general purpose"/"scratch" memory that IS useful for GPGPU and has no coherency.

                L1 (8 KB per SIMD?) (coherency?): This is a texture cache and is not really suited to GPGPU.

                L2 (size?) (read-only, no coherency?): From what I can see, this is connected to the memory controller, so it is (implicitly) used when accessing RAM. Is it also used when accessing Local & Global Shared Memory?

                Texture Cache (size?) (coherency?): Does this exist, or are L1 & L2 both "Texture Cache"?

                 

                If anyone could agree/disagree/discuss my comments above it would be greatly appreciated.

                Regards,

                Alex Averbuch

                • rv770 White Paper
                  rahulgarg

                  Micah: Thanks for the link.

                    • rv770 White Paper
                      alexaverbuch

                      I don't need detailed information; basic answers to my questions would be great, and I'm sure the AMD employees could answer them easily...

                      Anybody?

                        • rv770 White Paper
                          alexaverbuch

                          From another thread...

                           

                          Originally posted by: n0thing

                          ATI's current OpenCL implementation is CPU-only, and caching from main memory is automatic, so I guess everything is cached except texture buffers, since textures are not supported in the CPU implementation (they require fixed-function logic such as texture units and samplers).

                          For the GPU implementation here are my predictions:

                          1. Texture cache: There is a texture cache per SIMD unit, 8 KB I think (on RV770). Texture caches are optimized for spatial coherence in texture fetches, so you don't need to coalesce; the tiled rasterization order of textures (fetching a quad of texels) does it automatically.

                          2. Local memory on RV770 (LDS) is 16 KB per SIMD unit (R800 should be 32 KB, as it should support DX11). This memory is configured as 4 banks, each with 256 entries of 16 bytes, so you can read up to 4 aligned 32-bit words in one read access from the LDS. Writes have no bank conflicts because each thread can only write to its own private location; hence the LDS is not as generic as the shared memory described by the OpenCL specification. R800 should support OpenCL's shared memory.

                          3. Constant cache is 64 KB; no idea about coalescing.

                          4. The OpenCL specification says: Reads and writes to global memory may be cached depending on the capabilities of the device.

                          Here is what the OpenCL specification says about the constant address space:
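
                          As a quick sanity check of the LDS figures in item 2, here is a minimal Python sketch. It takes the numbers in the post as given (they are forum estimates, not official specifications), and the constant and function names are made up for illustration. It only confirms that 4 banks of 256 entries of 16 bytes add up to the quoted 16 KB per SIMD, and that one 16-byte entry holds four 32-bit words.

```python
# Figures quoted in the post above; forum estimates, not official specifications.
LDS_BANKS = 4            # banks per SIMD (as stated in the post)
ENTRIES_PER_BANK = 256   # entries per bank (as stated in the post)
BYTES_PER_ENTRY = 16     # bytes per entry (as stated in the post)
WORD_BYTES = 4           # one 32-bit word


def lds_capacity_bytes(banks=LDS_BANKS, entries=ENTRIES_PER_BANK,
                       entry_bytes=BYTES_PER_ENTRY):
    """Total LDS size per SIMD implied by the bank layout."""
    return banks * entries * entry_bytes


def words_per_entry(entry_bytes=BYTES_PER_ENTRY, word_bytes=WORD_BYTES):
    """Number of 32-bit words that fit in a single LDS entry."""
    return entry_bytes // word_bytes


if __name__ == "__main__":
    print(lds_capacity_bytes() // 1024, "KB per SIMD")
    print(words_per_entry(), "32-bit words per entry")
```

                          Changing any of the three layout constants shows how the per-SIMD capacity would scale, for example for the 32 KB predicted above for R800.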

                          The __constant or constant address space name is used to describe variables allocated in global memory and which are accessed inside a kernel(s) as read-only variables.  These read-only variables can be accessed by all (global) work-items of the kernel during its execution.  This qualifier can be used with arguments to functions (including __kernel functions) that are declared as pointers, or with local variables inside a function declared as pointers, or with global variables.  Global variables declared in the program source with the __constant qualifier are required to be initialized.

                    • rv770 White Paper
                      MicahVillmow
                      Alexaverbuch,
                      Your choice of algorithm determines which memory is important. For example, simple_matmult does not use LDS or GDS; it relies on the texture cache and outperforms many, if not all, matrix multiplication algorithms on the RV770 that attempt to use LDS. NLM_Denoise also outperforms the equivalent algorithm that uses LDS.

                      So it isn't necessarily GPGPU versus graphics in general that determines which memory you use; your problem domain and choice of algorithm should drive the decision.
                      • rv770 White Paper
                        MicahVillmow
                        The slides should give you all of that information. If not, please let me know.

                        Micah
                          • rv770 White Paper
                            alexaverbuch

                            It's not clear to me. Low-level development is not my specialty. Also, I am viewing these slides in OpenOffice and many of the images seem to be poorly formatted... but mostly, I'm still inexperienced with all of this :-)

                            If you can answer the questions, please do.

                            Alex