9 Replies Latest reply on Jul 7, 2014 5:26 AM by jammyamerica

    Large buffers


      Question: Are there any plans to support large buffers (close to 100% of available memory) on AMD graphics cards? If not, why not?


      This "feature" (as AMD likes to refer to common sense behaviour lacking in their products) is vital to many scientific computing applications, including my area of interest, computational fluid dynamics (CFD). The "feature" is "available" on NVIDIA hardware (CUDA and OpenCL), on CPUs, and indeed on every other computational device I am aware of.


      Currently OpenCL users are limited to 25% of device memory, the minimum required by the OpenCL specification.
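
      For reference, the 25% figure can be written out as arithmetic. This is a minimal sketch, assuming the OpenCL 1.x wording that the smallest value a device may report for CL_DEVICE_MAX_MEM_ALLOC_SIZE is max(1/4 of CL_DEVICE_GLOBAL_MEM_SIZE, 128 MiB). The helper name min_required_max_alloc is made up for illustration and is not part of OpenCL.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an OpenCL call): smallest per-allocation limit
 * an OpenCL 1.x device may report, given its global memory size in bytes.
 * Assumed rule: max(global_mem / 4, 128 MiB). */
uint64_t min_required_max_alloc(uint64_t global_mem_bytes)
{
    const uint64_t floor_bytes = 128ull * 1024 * 1024;
    uint64_t quarter = global_mem_bytes / 4;
    uint64_t result = quarter > floor_bytes ? quarter : floor_bytes;
    assert(result >= floor_bytes);  /* never below the 128 MiB floor */
    return result;
}
```

      Under that assumption a device with 2 GiB of global memory only has to allow 512 MiB per allocation. The value a driver actually reports can be read with clGetDeviceInfo and may be larger.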


      Although I believe CAL is not subject to this limit, it appears to be either becoming deprecated or simply broken:



      It appears that there used to be an undocumented environment variable, GPU_MAX_ALLOC_SIZE, that allowed users to increase this limit; however, it has disappeared as silently as it appeared.

        • Re: Large buffers

          Could you try setting GPU_MAX_ALLOC_PERCENT=90 ?

          The default value is 50. 

          Please note that this is for experimentation purposes and changing the default value may lead to unexpected behavior.

            • Re: Large buffers

              Thanks for the helpful reply, Siu. I am very happy that this environment variable appears to work on my system, at least up to 60% of total device memory. I have no means of testing whether it works up to 100%, since my device (HD6950) does not have "full framebuffer support", meaning it can only access 60% of its memory.



              Can you or any other AMD staff comment on whether large buffers will be officially supported in the future? Thanks again for your help.

            • Re: Large buffers

              Hi vanja_z, welcome to the forum,


                Currently OpenCL users are limited to 25% of device memory,


              I don't know where you get this from, perhaps it's a rumor, but it's certainly not correct.

              (There is a 512MB limit per allocation call, but you can make as many allocations as you like.)


              I do predominantly scientific computing and often need very large and fast memory, so I am mostly using the 7970. On the 7970, I often allocate a single contiguous buffer that uses just shy of 3GB, the device limit. It's very simple: all you do is allocate in chunks of 512MB or less and make sure the chunk sizes are rounded to about 0x4000 bytes, then they will be placed contiguously. For example, when allocating 2GB you might have a kernel with buffer arguments like


              __kernel void my_kernel(global float *A, global float *B, global float *C, global float *D){}  /* my_kernel is a placeholder name */


              Since this is C and A, B, C, D are pointers into device memory, you can index from A to reach all of the memory, as long as the chunks really were placed contiguously.
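
              The chunking rule described above (512MB or less per chunk, sizes rounded to 0x4000 bytes) can be sketched as host-side arithmetic. This is only an illustration: plan_chunks, round_up_chunk and the two constants are hypothetical names, and the code only computes sizes. Each size would then be passed to clCreateBuffer, and whether the driver places the resulting buffers back to back is an observation from this thread, not something the OpenCL specification promises.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_MAX   0x20000000ull  /* 512 MiB per-allocation limit quoted in the post */
#define CHUNK_ALIGN 0x4000ull      /* 16 KiB size rounding quoted in the post */

/* Round n up to the next multiple of CHUNK_ALIGN. */
uint64_t round_up_chunk(uint64_t n)
{
    return (n + CHUNK_ALIGN - 1) / CHUNK_ALIGN * CHUNK_ALIGN;
}

/* Split total_bytes into chunk sizes that are multiples of CHUNK_ALIGN and
 * no larger than CHUNK_MAX. Writes the sizes to sizes[] and returns how many
 * were written. Returns 0 if total_bytes is 0 or sizes[] is too small. */
size_t plan_chunks(uint64_t total_bytes, uint64_t *sizes, size_t max_chunks)
{
    assert(sizes != NULL);
    uint64_t remaining = round_up_chunk(total_bytes);
    size_t n = 0;
    while (remaining > 0) {
        if (n == max_chunks)
            return 0;  /* caller's array cannot hold the full plan */
        uint64_t this_chunk = remaining < CHUNK_MAX ? remaining : CHUNK_MAX;
        sizes[n++] = this_chunk;
        remaining -= this_chunk;
    }
    return n;
}
```

              For a 2GB request this yields four chunks of 0x20000000 bytes each, which matches the four kernel arguments A, B, C, D in the example.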

              Here is a printout from a typical program start:


              open:devices 3 gpus, 1 cpu, device(0) = Tahiti

              start(cl):ndevs=3 gpus=1 time=57.136

              <readback of actual allocation map>

              buffer 0 start 01D1E000 to 21D1E000 size=20000000  Gap = 00000

              buffer 1 start 21D1E000 to 41D1E000 size=20000000  Gap = 00000

              buffer 2 start 41D1E000 to 61D1E000 size=20000000  Gap = 00000

              buffer 3 start 61D1E000 to 81D1E000 size=20000000  Gap = 00000

              buffer 4 start 81D1E000 to A1D1E000 size=20000000  Gap = 00000

              buffer 5 start A1D1E000 to B0E1E000 size=0F100000  Gap = 00000

              buffer 6 start B0E1E000 to BF21E000 size=0E400000  Gap = 00000

              buffer 7 start BF21E000 to BFE1E000 size=00C00000  Gap = ----  (last address on GPU is BFFFFFFC)


              The last couple of buffers are a different size for an unrelated reason. Note that I have not used GPU_MAX_ALLOC-type parameters and have never seen a need to. This also works on Cayman and Barts devices, but I prefer Tahiti because the memory is so large and fast. Sorry, I don't know much about NVIDIA devices because I usually choose hardware based on specifications.
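
              The allocation map in the printout can be checked mechanically. The sketch below copies the eight start/end addresses from the printout and tests that each buffer begins where the previous one ends. The struct and function names are made up for illustration; how the addresses were read back from the device is not shown in the post and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open device address range [start, end). Hypothetical type. */
struct buf_range { uint64_t start, end; };

/* Returns 1 if every buffer starts exactly where the previous one ends. */
int is_contiguous(const struct buf_range *b, size_t n)
{
    assert(b != NULL || n == 0);
    for (size_t i = 1; i < n; ++i)
        if (b[i].start != b[i - 1].end)
            return 0;
    return 1;
}

/* Sum of the sizes of all ranges, in bytes. */
uint64_t total_bytes(const struct buf_range *b, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += b[i].end - b[i].start;
    return sum;
}

/* Start and end addresses copied from the printout. */
const struct buf_range printout_map[8] = {
    {0x01D1E000, 0x21D1E000}, {0x21D1E000, 0x41D1E000},
    {0x41D1E000, 0x61D1E000}, {0x61D1E000, 0x81D1E000},
    {0x81D1E000, 0xA1D1E000}, {0xA1D1E000, 0xB0E1E000},
    {0xB0E1E000, 0xBF21E000}, {0xBF21E000, 0xBFE1E000},
};
```

              With these values is_contiguous(printout_map, 8) holds and total_bytes(printout_map, 8) comes to 0xBE100000 bytes, i.e. 3041 MiB, which is the "just shy of 3GB" figure.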


              Hope it helps.

              • Re: Large buffers



                Thanks for sharing information.