8 Replies Latest reply on Oct 11, 2010 4:41 PM by himanshu.gautam

    Global memory caching

    mindsporter

      What is the global memory caching policy on the ATI Stream platform? clCreateBuffer() does not take a device argument. This would imply either that calling clCreateBuffer() causes *all* OpenCL devices to cache the corresponding memory, or that any caching of memory only happens on specific devices when the kernel using that memory is enqueued to run on the devices. Which of these represents the actual scenario?

      Thanks,

      Albert.

        • Global memory caching
          himanshu.gautam

          Hi mindsporter,

The second scenario is correct. Memory buffers are allocated on a device only when they are actually needed by the kernel executing there.

            • Global memory caching
              mindsporter

              Thanks Himanshu. That leads to my next question. Is the buffering eager or lazy, i.e, is all memory buffered at the start or is buffering done piecewise on an as-needed basis? In either case, what happens if the device memory is already full?

                • Global memory caching
                  karbous

For example, when you call clEnqueueWriteBuffer(), the whole transfer is enqueued to the command queue as one transaction, not just a part of the data. If device memory is already full, you get a CL_OUT_OF_RESOURCES error. See the OpenCL spec; it also lists which errors each API function call can return.

                    • Global memory caching
                      himanshu.gautam

                      mindsporter,

Buffers are created eagerly, as you say. If any element of a buffer is needed, the whole buffer is transferred to the GPU's global memory in one go.

                        • Global memory caching
                          mindsporter

Thanks to both of you. Karbous, you are right about the CL_OUT_OF_RESOURCES error from clEnqueueWriteBuffer(), as specified in the spec. But my question was more platform-specific: I was imagining that the ATI Stream platform might have some kind of buffer eviction policy that, on seeing insufficient device memory, would move the least recently used buffer out of device memory and into system memory (a bit like how CPU caches work). Is there any such mechanism? Or do buffers keep occupying device memory until they are released?

                            • Global memory caching
                              karbous

There is no such mechanism; it is up to the programmer to check how much memory the card has and decide how to divide the work. You can use the clGetDeviceInfo() function to retrieve information about the OpenCL device (how much memory it has, what features it supports, and so on).

                                • Global memory caching
                                  nou

But what if I run a kernel that takes, for example, a 400MB buffer, and then run another (or the same) kernel that takes a different 400MB buffer? Will the implementation swap these buffers in and out of GPU memory?

I can understand the restriction that the buffers needed to execute a kernel must fit into device memory. But will it swap?

Apparently AMD removed the restriction that all allocated buffers must fit in device memory.