12 Replies Latest reply on Aug 16, 2013 2:12 AM by himanshu.gautam

    Maximum memory allocation problem

    vanja_z

      There appears to be a bug limiting the maximum memory allocation to around 60% of device memory. According to this knowledge base article:

      http://developer.amd.com/support/KnowledgeBase/Lists/KnowledgeBase/DispForm.aspx?ID=123

      by default the memory made available to OpenCL is limited to 50%, and it should be possible to increase this to 100% by setting the environment variable GPU_MAX_HEAP_SIZE to a value between 0 and 100. In my tests this has not worked as expected. The amount of memory reported by CL_DEVICE_GLOBAL_MEM_SIZE does match the expected value; however, the amount of memory that can actually be allocated does not.

       

      I have tested this by allocating and initializing small buffers (20MB) until failure. Regardless of the setting, the amount of memory actually available tops out at around 60%. I have included the code used for testing and would be interested to hear whether other people's installations behave similarly. For reference, the Nvidia implementation does not suffer from this problem: using their hardware and driver I can allocate very close to the total device memory (even in a single buffer, I might add).

      Here are the results on my machine with specs:


      2 x HD6950 2GB
      Arch Linux Driver 11.12
      SDK 2.6


      DEFAULT

      Global memory size: 1073 MB
      Accessed 1080 MB

      GPU_MAX_HEAP_SIZE=55
      Global memory size: 1180 MB
      Accessed 1180 MB

      GPU_MAX_HEAP_SIZE=60
      Global memory size: 1287 MB
      Accessed 1260 MB

      GPU_MAX_HEAP_SIZE=70
      Global memory size: 1502 MB
      Accessed 1260 MB

      GPU_MAX_HEAP_SIZE=100
      Global memory size: 2147 MB
      Accessed 1260 MB

       

      Looking forward to hearing anyone else's experience or an official response.

      Vanja

       

      /*
       * vzcl_maxalloc.c
       * 
       * Copyright 2012 Vanja Zecevic
       * 
       * This program is free software: you can redistribute it and/or modify
       * it under the terms of the GNU General Public License as published by
       * the Free Software Foundation, either version 3 of the License, or
       * (at your option) any later version.
       *
       * This program is distributed in the hope that it will be useful,
       * but WITHOUT ANY WARRANTY; without even the implied warranty of
       * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
       * GNU General Public License for more details.
       *
       * You should have received a copy of the GNU General Public License
       * along with this program.  If not, see <http://www.gnu.org/licenses/>.
       *
       */
      
      #include <stdio.h>
      #include <stdlib.h>  /* malloc, free, atoi, exit */
      #include <string.h>  /* strcmp */
      #include <CL/cl.h>
      
      int main (int argc, char * argv[])
      {
      int chunk = 10;
      int maxmem = 2000;
      int iChunk;
      int nChunk;
      int iX;
      int iArg;
      int nX;
      int nAccessed;
      
      cl_platform_id platform;
      cl_device_id device;
      cl_context Context;
      cl_command_queue CmdQueue;
      cl_int err_tr = CL_SUCCESS;
      cl_ulong global_size;
      
      cl_mem * buffers_dev;
      int ** buffers_host;
      
      /*----------------------------------------------------------------------------*/
      /* Get flags.  */
      /* Each value flag consumes the argument that follows it, so check that
         the argument exists before reading it.  */
      for (iArg=1; iArg<argc; iArg++) {
          if (!strcmp(argv[iArg],"--help")) {
              printf(
        "\nUSAGE:\n"
        "vzcl_maxalloc <flags>\n"
        "prints the amount of available memory on an OpenCL device\n"
        "reported by clGetDeviceInfo and also the actual amount able to be\n"
        "accessed.\n"
        "\n"
        "EXAMPLE:\n"
        "vzcl_maxalloc --chunk 10 --maxmem 2000\n"
        "\n"
        "FLAGS:\n"
        "--help   Prints this message\n"
        "--chunk  The size of each chunk to be allocated in MB (default 10 MB)\n"
        "--maxmem The maximum memory to allocate in MB (default 2000 MB)\n"
                );
              exit(0);
          }
          else if (!strcmp(argv[iArg],"--chunk") && iArg+1<argc)
              chunk = atoi(argv[++iArg]);
          else if (!strcmp(argv[iArg],"--maxmem") && iArg+1<argc)
              maxmem = atoi(argv[++iArg]);
      }
      if (chunk <= 0 || maxmem < chunk) {
          fprintf(stderr, "--chunk must be positive and no larger than --maxmem\n");
          exit(1);
      }
      nChunk = maxmem/chunk;
      nX = (chunk*(int)1e6)/sizeof(int);
      
      /*----------------------------------------------------------------------------*/
      /* Initialize OpenCL devices.  */
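      err_tr = clGetPlatformIDs(1, &platform, NULL);
      if (err_tr == CL_SUCCESS)
          err_tr = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
      if (err_tr != CL_SUCCESS) {
          fprintf(stderr, "No OpenCL GPU device found (error %i)\n", err_tr);
          return 1;
      }
      Context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
      CmdQueue = clCreateCommandQueue(Context, device, 0, NULL);
      
      clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE, sizeof(cl_ulong),
        &global_size, NULL);
      /* cl_ulong is 64 bits wide; cast so the value matches the %lu format.  */
      printf("Created OpenCL context.\n"
             "Global memory size: %lu MB\n",
             (unsigned long)(global_size/1000000));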
      
      /*----------------------------------------------------------------------------*/
      /* First allocate buffers.  */
      buffers_dev = (cl_mem*)malloc(nChunk*sizeof(cl_mem));
      for (iChunk=0; iChunk<nChunk; iChunk++) {
          *(buffers_dev+iChunk) = clCreateBuffer(Context, CL_MEM_READ_WRITE,
            nX*sizeof(int), NULL, &err_tr);
          if (err_tr != CL_SUCCESS) {
              /*printf("error %i\n", err_tr);*/
              break;
          }
      }
      /* Each buffer holds exactly chunk MB, so the total is iChunk*chunk.  */
      printf("Allocated %i MB\n", iChunk*chunk);
      nChunk = iChunk;
      
      /*----------------------------------------------------------------------------*/
      /* Now try to access buffers.  */
      buffers_host = (int**)malloc(nChunk*sizeof(int*));
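      for (iChunk=0; iChunk<nChunk; iChunk++) {
          *(buffers_host+iChunk) = (int*)malloc(nX*sizeof(int));
          if (*(buffers_host+iChunk) == NULL) break;
          for (iX=0; iX<nX; iX++) *(*(buffers_host+iChunk)+iX) = 0;
          err_tr = clEnqueueWriteBuffer(CmdQueue, *(buffers_dev+iChunk), CL_TRUE, 0,
            nX*sizeof(int), *(buffers_host+iChunk), 0, NULL, NULL);
          if (err_tr != CL_SUCCESS) {
              /*printf("error %i\n", err_tr);*/
              /* Free the host chunk for the failed write; the cleanup loop
                 below only frees chunks that were written successfully.  */
              free(*(buffers_host+iChunk));
              break;
          }
      }
      /* Each buffer holds exactly chunk MB, so the total is iChunk*chunk.  */
      printf("Accessed %i MB\n", iChunk*chunk);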
      nAccessed = iChunk;
      
      for (iChunk=0; iChunk<nAccessed; iChunk++)
        free(*(buffers_host+iChunk));
      for (iChunk=0; iChunk<nChunk; iChunk++)
        clReleaseMemObject(*(buffers_dev+iChunk));
      free(buffers_host);
      free(buffers_dev);
      clReleaseCommandQueue(CmdQueue);
      clReleaseContext(Context);
      
      return 0;
      }
      

       

      Message was edited by: vanja z (fixed formatting for new forum)

        • Maximum memory allocation problem
          nou

          I experimented with this in the past (SDK 2.3-2.4).

          I could allocate 700MB on my 1GB card, and even then it had some quirks. The write rate into the first 350MB was around 100GB/s; into the second 350MB it was 5GB/s. It simply dropped to 5GB/s once I crossed the original memory limit. I suspect that buffers beyond the original 50% limit were allocated in main memory and accessed via the PCIe bus.

            • Maximum memory allocation problem
              drallan

              I have a 2GB Cayman running SDK 2.6 with driver 11.12 on Windows 7 x64 and see similar issues.

              Although the maximum single buffer size is 0x20000000 bytes, it seems I can allocate as much memory as I want, at least up to about 3.5GB in chunks of 0x20000000.

              Just after 1.5GB, the memory speed drops way down to about 8GB/second, so I assume it begins using main memory. I was using large buffers, so it's hard to know where this is triggered. I would assume that if the trailing end of a buffer goes over a limit, then the entire buffer gets placed in main memory?

               

               

                • Maximum memory allocation problem
                  nou

                  Yes, you can allocate as many buffers as you want, but you must use them on some device. AMD uses deferred allocation: buffers are allocated when you first use them on a device.

                  Say you allocate 7 buffers of 512MB each and then run a kernel on them. The first three fit into device memory, so you see full speed. The next one does not, because some GPU memory is occupied by the framebuffer. (Try allocating slightly smaller buffers, like 480MB, and four of them should fit.)

                  You need memory object migration, http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clEnqueueMigrateMemObjects.html in OpenCL 1.2. With this you can move buffers into and out of device memory and swap them as you need.

                    • Maximum memory allocation problem
                      vanja_z

                      Thank you for your replies,

                      It seems this is not a problem with my system.

                      nou: Using the code attached, I can only access 2 x 480 MB buffers.

                      My situation is that I have developed a CFD code for a research project using CUDA and have recently attempted to port it to OpenCL for use with AMD hardware. Unfortunately this problem basically means I wasted my time and effort and prevents me from using AMD.

                      For high performance computing, I dare say it is an absolute necessity to be able to access the majority of device memory at full speed. Using an Nvidia card with 896 MB of memory I can create an 880MB single buffer, have it allocated on the device and access it with full bandwidth.

                      Are there any plans to fix this behaviour, or is it considered to be 'desired'?

                • Re: Maximum memory allocation problem
                  vanja_z

                  I'm bumping this because it's still broken on 12.1 drivers. Can we get a response from someone at AMD about whether this is going to be fixed? Only being able to use 60% of your device memory is an absolute dealbreaker for me.

                  • Re: Maximum memory allocation problem
                    vanja_z

                    Does anyone who works at AMD post on these forums?

                      • Re: Maximum memory allocation problem
                        jeff_golds

                        Currently only HD77xx and up have full framebuffer support under Linux.  You can tell whether your card has full framebuffer support by checking whether "VM" appears in the driver version reported by clinfo, for example "Driver version:                                CAL 1.4.1714 (VM)".

                          • Re: Maximum memory allocation problem
                            vanja_z
                            1. This is appalling, are there any plans to fix this?
                            2. Is this documented anywhere? It is a reasonable expectation that you can use all of the memory present on a device. Not having this documented amounts to false advertising. I was very close to purchasing a bunch of AMD cards for my project and am considering myself very lucky to have stumbled across this problem before I spent any money, others might not be so lucky.
                            3. Are there any plans to make this match what is reported by CL_DEVICE_GLOBAL_MEM_SIZE?
                        • Re: Maximum memory allocation problem
                          roboto

                          Have you tried setting GPU_MAX_ALLOC_PERCENT to 100? In Windows, I had to reboot for the changes to take effect, but I'm not sure about Linux. Also, GPU_MAX_HEAP_SIZE does not work for me... deprecated, maybe?