
    Why on HD4870 with 512 MB onboard RAM only 128 available to OpenCL ???

    Raistmer
      Global memory size: 134217728
      Constant buffer size: 65536
      Max number of constant args: 8

      Only 1/4 of total RAM can be used in OpenCL buffers, why?!
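
      For reference, a minimal sketch (assuming a single platform with one GPU device; this is not the actual code behind the printout above) of the clGetDeviceInfo queries that yield these numbers. Note that 134217728 bytes is exactly 128 MB:

      #include <stdio.h>
      #include <CL/cl.h>

      int main(void)
      {
          cl_platform_id platform;
          cl_device_id device;
          cl_ulong global_mem = 0, const_buf = 0;
          cl_uint const_args = 0;

          if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS ||
              clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL) != CL_SUCCESS)
              return 1;

          /* The three values printed in the post above. */
          clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE, sizeof(global_mem), &global_mem, NULL);
          clGetDeviceInfo(device, CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE, sizeof(const_buf), &const_buf, NULL);
          clGetDeviceInfo(device, CL_DEVICE_MAX_CONSTANT_ARGS, sizeof(const_args), &const_args, NULL);

          printf("Global memory size: %llu\n", (unsigned long long)global_mem);
          printf("Constant buffer size: %llu\n", (unsigned long long)const_buf);
          printf("Max number of constant args: %u\n", const_args);
          return 0;
      }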
        • MicahVillmow
          Raistmer,
          The default values for OpenCL were chosen to provide a large amount of memory to the OpenCL application in a reliable and high performance fashion with our current implementation. They are there so that OpenCL plays nicely with other applications that use the graphics card. This will be improved in future releases. That being said, there are environment variables that can be used to override the default values, as described in this posting:
          http://forums.amd.com/forum/me...adid=128237&forumid=9

          If you set GPU_MAX_HEAP_SIZE, you can override the default value. If you set GPU_INITIAL_HEAP_SIZE equal to GPU_MAX_HEAP_SIZE and the initial allocation succeeds, then you get guaranteed access to that much memory in your OpenCL program. If the two values are not equal, it is possible for an allocation to fail at some point in your program as the heap gets resized. You can also hit performance issues if the heap no longer fits in device memory and spills into host memory.

          However, playing around with these environment variables is unsupported and you use them at your own risk. They are not future-proof and could disappear in a future release, so I strongly recommend using them only for testing and not basing an application on them.
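
          For illustration, a minimal sketch (assuming the SDK's standard OpenCL headers; not an official sample) that can be run with the overrides set in the environment, e.g. GPU_INITIAL_HEAP_SIZE=256 GPU_MAX_HEAP_SIZE=256 (values in MB), to check whether a buffer larger than the 128 MB default becomes usable:

          #include <stdio.h>
          #include <string.h>
          #include <CL/cl.h>

          int main(void)
          {
              cl_platform_id platform;
              cl_device_id device;
              cl_int err;

              clGetPlatformIDs(1, &platform, NULL);
              clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
              cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
              cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, &err);

              /* Ask for more than the 128 MB default heap. */
              cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                          192 * 1024 * 1024, NULL, &err);
              if (err != CL_SUCCESS) {
                  printf("clCreateBuffer failed: %d\n", err);
                  return 1;
              }

              /* The runtime may defer the real allocation until first use,
                 so touch the buffer to force it. */
              char page[4096];
              memset(page, 0, sizeof(page));
              err = clEnqueueWriteBuffer(queue, buf, CL_TRUE, 0, sizeof(page),
                                         page, 0, NULL, NULL);
              if (err == CL_SUCCESS)
                  printf("192 MB buffer is usable\n");
              else
                  printf("allocation failed on first use: %d\n", err);

              clReleaseMemObject(buf);
              clReleaseCommandQueue(queue);
              clReleaseContext(ctx);
              return 0;
          }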
            • bubu

              Originally posted by: MicahVillmow The default values for OpenCL were chosen to provide a large amount of memory to the OpenCL application in a reliable and high performance fashion with our current implementation.


              That hard-coded 128 MB limit can be very problematic...

              Example:

              Imagine Photoshop wants to use OpenCL to run a filter on a 4k x 4k image with 4 float channels: 4096 x 4096 x 4 channels x 4 bytes = 256 MB.

              Now also imagine an artist already has 3dsmax open with a GPGPU renderer like VRayRT, loaded with a medium-poly model (say a 2M-quad sculpted model = 192 MB). The artist wants to apply the filter to the texture in PS and then preview it in VRayRT without closing either application. A pretty common case, by the way.

              With the current 128 MB policy BOTH applications will fail! That means the artist won't be able to use VRayRT, NOR Photoshop.

              On the contrary, if you remove the 128 MB limitation, at least ONE could run... and the other could simply show a "Sorry, out of VRAM. Please close any other GPGPU app so some memory is released." message.

              Which do you think is better: to run neither app, or to run one? I bet on running one.

              Please, allow us to monopolise the VRAM. The users actually know what they are doing; OpenCL itself does not.

              And by the way... is the VRAM virtualised like system memory in x86 protected mode?

            • Raistmer
              I see, thanks for the explanation.
              I hope the next release will enable more GPU memory by default. For now I can live with only 128 MB; I was just surprised to get allocation errors for sizes far below 256 MB...
                • blelump

                  Perhaps it does not strictly fit this topic, but is there any way to measure how much memory can be allocated on a given GPU? Obviously the OS needs to allocate some memory, and perhaps a few other apps do too.

                  Thanks to this topic I know that the default (max) memory size can be overridden, but what is actually the maximum value that will not crash my X window system, and how can it be measured?

                    • gaurav.garg

                      Exceeding memory requirements should never crash X; clCreateBuffer should instead return an error saying it is out of memory.

                      You can query the maximum amount of memory available using the CL_DEVICE_GLOBAL_MEM_SIZE enum with clGetDeviceInfo.
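
                      For example, a sketch (assuming a single platform with one GPU device); CL_DEVICE_MAX_MEM_ALLOC_SIZE is the related per-buffer cap, and clCreateBuffer rejects anything larger with CL_INVALID_BUFFER_SIZE:

                      #include <stdio.h>
                      #include <CL/cl.h>

                      int main(void)
                      {
                          cl_platform_id platform;
                          cl_device_id device;
                          cl_ulong total = 0, max_alloc = 0;

                          clGetPlatformIDs(1, &platform, NULL);
                          clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

                          /* Total global memory the runtime exposes to OpenCL... */
                          clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE,
                                          sizeof(total), &total, NULL);
                          /* ...and the largest single buffer you may request. */
                          clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                                          sizeof(max_alloc), &max_alloc, NULL);

                          printf("global mem size:  %llu MB\n", (unsigned long long)(total >> 20));
                          printf("max single alloc: %llu MB\n", (unsigned long long)(max_alloc >> 20));
                          return 0;
                      }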

                        • nou

                          I used this OpenGL extension http://www.opengl.org/registry/specs/ATI/meminfo.txt to create a small program that monitors free memory on the GPU.

                          It returns four values, in kB.

                          In my experiments I increased GPU_MAX_HEAP_SIZE to 1024 MB. Then, as I allocated 128 MB buffers, they repeatedly moved from the memory pool to auxiliary memory and back. The maximum I could allocate on my 1 GB 5850 was 768 MB; after that it returned out of resources.
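
                          A minimal sketch of that query (assuming an ATI/AMD driver that exposes GL_ATI_meminfo; freeglut is used only to create a GL context, and the enum value comes from the spec linked above):

                          #include <stdio.h>
                          #include <GL/glut.h>

                          /* From the GL_ATI_meminfo spec, in case the headers lack it. */
                          #ifndef GL_TEXTURE_FREE_MEMORY_ATI
                          #define GL_TEXTURE_FREE_MEMORY_ATI 0x87FC
                          #endif

                          int main(int argc, char **argv)
                          {
                              GLint info[4] = {0, 0, 0, 0}; /* four values, all in kB */

                              glutInit(&argc, argv);
                              glutCreateWindow("meminfo"); /* just to get a current context */

                              glGetIntegerv(GL_TEXTURE_FREE_MEMORY_ATI, info);
                              printf("total free pool:        %d kB\n", info[0]);
                              printf("largest free block:     %d kB\n", info[1]);
                              printf("total aux free:         %d kB\n", info[2]);
                              printf("largest aux free block: %d kB\n", info[3]);
                              return 0;
                          }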

                            • blelump

                              Can anyone explain this to me?

                              On Linux there is a file called mem located in /proc/ati/0 (that is probably system specific). It shows some useful information which I cannot fully understand. It looks like this:

                              # cat /proc/ati/0/mem
                                            total counts            |    outstanding
                              type           alloc  fail      bytes | allocs      bytes
                              system             0     0 2123517952 |      0          0
                              locked        354349     0 1455151528 |   8748   39565736
                              sareas             1     0       8192 |      1       8192
                              driver         51579     0    7245854 |    118    4122674
                              magic              5     0         60 |      0          0
                              maplist         6529     0     391740 |     15        900
                              vmalist         6651     0      79812 |     16        192
                              buflist           40     0     117968 |     16     116432
                              files             11     0       7552 |      3       1536
                              contexts           3     0         56 |      1         32
                              hwcontext          5     0      40960 |      4      32768

                              # cat /proc/ati/0/mem1
                                            total counts            |    outstanding
                              type           alloc  fail      bytes | allocs      bytes
                              mappings           5     0    2224128 |      5    2224128
                              textures           0     0          0 |      0          0
                              agplist            0     0          0 |      0          0
                              agpmem             0     0          0 |      0          0
                              boundagp           0     0          0 |      0          0
                              aperture           0     0          0 |      0          0
                              dmabufs            0     0          0 |      0          0
                              memlocks           0     0          0 |      0          0
                              mutex             65     0       1040 |      4         64
                              drawables          1     0         16 |      0          0
                              mempages      815121     0 3338735616 |   8721   35721216
                              pcielist         178     0    6554444 |      9      69804
                              pcie               0     0          0 |      0          0

                              I'm asking because when I try to set more than 255 for both GPU_INITIAL_HEAP_SIZE and GPU_MAX_HEAP_SIZE, the kernel simply fails. I know these flags are currently unsupported, but why is 255 the maximum value when the card has 512 MB of memory? What is the rest of the memory used for?

                              I thought the X server or even the session manager might be the problem. I've tested many configurations, such as kdm or slim with kde, xfce, or twm. The most lightweight configuration seems to be slim+twm. Well, perhaps it is, but it doesn't help in this case. Does anyone have an idea why that is?

                              When only GPU_MAX_HEAP_SIZE is set above 255 it works, but it probably uses host memory, which dramatically decreases computation performance. It is actually only slightly faster than the CPU :-( .

                      • Raistmer
                        Hm, actually I don't wonder much why that hackish env variable doesn't work in the new SDK. I wonder why the new SDK still has a 128 MB memory limit for the 4870 GPU!
                        • Raistmer
                          Hm, this feature can be viewed as memory virtualization. One could swap memory objects out of device memory into host memory (and, ultimately, even into the swap file). That is good for memory-demanding apps whose algorithms cannot tolerate a smaller memory budget.
                          But for other apps it will surely be a performance killer if such swapping is done at the runtime's discretion, without any control by the app author. In that case some additional attribute may be needed, just like page-locking with Windows VirtualAlloc: some data should stay in GPU memory while other data could be swapped out. This becomes especially important when several GPGPU apps execute simultaneously; improper memory swapping can kill performance for all of them.
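
                          A speculative sketch of that host-side analogy (Windows VirtualAlloc plus VirtualLock pins pages in physical RAM; nothing equivalent exists for OpenCL buffers today):

                          #include <stdio.h>
                          #include <windows.h>

                          int main(void)
                          {
                              /* 64 KB: small enough to fit the default working-set quota;
                                 larger locks need SetProcessWorkingSetSize first. */
                              const SIZE_T size = 64 * 1024;

                              /* Commit ordinary, pageable memory... */
                              void *p = VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE,
                                                     PAGE_READWRITE);
                              if (p == NULL)
                                  return 1;

                              /* ...then pin it so the OS cannot page it out. */
                              if (VirtualLock(p, size))
                                  printf("64 KB locked in physical memory\n");
                              else
                                  printf("VirtualLock failed: %lu\n", GetLastError());

                              VirtualUnlock(p, size);
                              VirtualFree(p, 0, MEM_RELEASE);
                              return 0;
                          }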