2 Replies Latest reply on Nov 24, 2011 4:04 PM by notyou

    Explanation of clCreateBuffer Options

    notyou

      To begin, I understand that the flags CL_MEM_READ_WRITE, CL_MEM_WRITE_ONLY and CL_MEM_READ_ONLY create a buffer on the device itself, and that I can then (if I choose) copy host memory contents into that buffer.

      What I need some clarification/reassurance on is the use of the following 3 options.

      1. CL_MEM_USE_HOST_PTR - as I understand it, the device does not create a buffer in its own memory; instead, all memory accesses go to system memory, where a buffer must already exist. For the GPU, this would mean accessing system memory over PCIe, which I understand is slow. Is this correct?

      2. CL_MEM_ALLOC_HOST_PTR - this should create a buffer in main memory for the device to access (similar to 1?), but doesn't require a host pointer. Again, is my understanding correct? Why would we use this?

      3. CL_MEM_COPY_HOST_PTR - creates a buffer on the device and copies the data from main memory into the device's memory. Why would we use this?

      If you could shed some light on this it would be much appreciated. Also, if you could provide an example of the use of these 3 and why we would choose one over the other, that would be particularly helpful. Thanks.
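One way to see how the three options differ is in what each expects for the host_ptr argument of clCreateBuffer. The sketch below mirrors the argument rules described in the OpenCL specification as a small stand-alone checker. It is only an illustration: the SK_ names and sk_check_buffer_args are hypothetical helpers, and the flag bits are redefined locally (assumed to match CL/cl.h) so that it compiles without an OpenCL SDK. The clCreateBuffer calls in the comments use placeholder variables (ctx, size, host_data, err).

```c
#include <stddef.h>

/* Local stand-ins for the flag bits in CL/cl.h (assumption: same bit
   positions), redefined so this sketch builds without an OpenCL SDK. */
#define SK_MEM_USE_HOST_PTR   (1u << 3)
#define SK_MEM_ALLOC_HOST_PTR (1u << 4)
#define SK_MEM_COPY_HOST_PTR  (1u << 5)

/* Hypothetical result codes for this sketch only. */
enum sk_result { SK_OK = 0, SK_INVALID_FLAGS = 1, SK_INVALID_HOST_PTR = 2 };

/*
 * Checks a flags/host_ptr pair the way the specification describes:
 *
 *   USE_HOST_PTR   - the buffer uses the application's own memory,
 *                    so host_ptr must be non-NULL, e.g.
 *     clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
 *                    size, host_data, &err);
 *
 *   ALLOC_HOST_PTR - the implementation allocates host-accessible
 *                    memory itself, so host_ptr must be NULL, e.g.
 *     clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR,
 *                    size, NULL, &err);
 *
 *   COPY_HOST_PTR  - the implementation allocates the buffer and copies
 *                    size bytes from host_ptr at creation time, e.g.
 *     clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
 *                    size, host_data, &err);
 */
enum sk_result sk_check_buffer_args(unsigned flags, const void *host_ptr)
{
    int use   = (flags & SK_MEM_USE_HOST_PTR)   != 0;
    int alloc = (flags & SK_MEM_ALLOC_HOST_PTR) != 0;
    int copy  = (flags & SK_MEM_COPY_HOST_PTR)  != 0;

    /* USE_HOST_PTR cannot be combined with ALLOC_HOST_PTR or COPY_HOST_PTR. */
    if (use && (alloc || copy))
        return SK_INVALID_FLAGS;

    /* USE_HOST_PTR and COPY_HOST_PTR both need a source pointer. */
    if ((use || copy) && host_ptr == NULL)
        return SK_INVALID_HOST_PTR;

    /* Without either of them, a non-NULL host_ptr is rejected. */
    if (!(use || copy) && host_ptr != NULL)
        return SK_INVALID_HOST_PTR;

    return SK_OK;
}
```

Note that ALLOC_HOST_PTR and COPY_HOST_PTR may be combined, which asks for host-accessible memory that is initialized from host_ptr. Where the storage actually ends up, and how fast the device can reach it, depends on the implementation and the hardware.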

      -Matt



        • Explanation of clCreateBuffer Options
          nou

          Any of these flags means that the memory buffer will not be allocated in device memory. I recommend reading the AMD OpenCL Programming Guide, which explains in detail what each flag means.

            • Explanation of clCreateBuffer Options
              notyou

              I did read the programming guide, but the way it explains each of the flags makes me question their use. To expand on my question above, I would assume that using these 3 flags makes overall memory access slower than using a local (or global) buffer on the device itself, though they may be useful on the APU. They may also be useful if we access elements only rarely, such that the overhead of copying the original memory to the device would exceed the cost of the occasional memory access (e.g. copying 1 million items to the device when we only randomly read 10 or so). Is this reasoning correct?
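The trade-off in that last question can be put into a rough back-of-the-envelope model. The sketch below is only that: a minimal cost model under stated assumptions, not a measurement of any real device. All function names are hypothetical, the bandwidth and latency figures are inputs the caller has to supply, and the model ignores the cost of device-side reads after the copy as well as any caching the implementation may do.

```c
/* Time to copy the whole array to the device once:
   a fixed setup cost plus total bytes divided by transfer bandwidth. */
double bulk_copy_seconds(double n_items, double item_bytes,
                         double bytes_per_second, double setup_seconds)
{
    return setup_seconds + (n_items * item_bytes) / bytes_per_second;
}

/* Time for the device to touch a few elements left in host memory,
   assuming each access pays the same fixed latency (a simplification). */
double remote_access_seconds(double n_accesses, double seconds_per_access)
{
    return n_accesses * seconds_per_access;
}

/* Number of remote accesses at which leaving the data in host memory
   costs as much as copying everything up front. */
double break_even_accesses(double n_items, double item_bytes,
                           double bytes_per_second, double setup_seconds,
                           double seconds_per_access)
{
    return bulk_copy_seconds(n_items, item_bytes,
                             bytes_per_second, setup_seconds)
           / seconds_per_access;
}
```

With purely illustrative inputs (1 million 4-byte items, 4 GB/s transfer bandwidth, 10 microseconds of setup, 1 microsecond per remote access), the bulk copy comes to about 1.01 ms, 10 remote reads come to about 10 microseconds, and the break-even point is about 1010 accesses. Under those assumed numbers the reasoning holds for 10 random reads, but the real figures have to be measured on the target hardware.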