11 Replies Latest reply on Oct 5, 2010 1:05 PM by MicahVillmow

    allocate global memory

    Meteorhead

      Dear all,

      I'm sure someone knows the answer to this very simple question, but I have not found the solution, no matter how much I googled.

      What is the way to allocate memory on the device without using buffers? I tried doing it the same way one allocates local memory: when setting kernel arguments, I specify a size and pass a NULL pointer. Inside the kernel, the corresponding argument, a pointer qualified with __local, then refers to local memory of that size. The same approach does not work if the kernel argument is qualified with __global.

      What is the correct way of doing this?

        • allocate global memory
          karbous

          What's wrong with clCreateBuffer? Unless you call clEnqueueWriteBuffer, there won't be any memory transaction between host and device and you will have an array (or buffer) to store data in the kernel.

          From what I understand of the OpenCL spec (PDF, 1.1 rev 33), you can't allocate __global memory inside a kernel. See table 3.1 on page 27, section 3.3 (Memory Model).

            • allocate global memory
              Meteorhead

              I do not want to allocate global memory inside a kernel. It's just that the code doesn't separate well if buffer creation is mixed with simple memory allocation on the device. Usually one creates all buffers for transfers in one place, and one might think it's possible to do all the allocations at the setKernelArgs point. I don't see why there is a distinction in the way local and global memory are allocated.

              I know it works with buffers, but I wanted to know if this is the only way.

              Edit: The one thing that bugs me most is that if you create a buffer, you have to specify a pointer in host memory. It's not nice to pass a host pointer to a buffer that has nothing to do with it, because that buffer will not be used for transfers.

                • allocate global memory
                  karbous

                  I doubt there is a way of allocating global memory through setKernelArgs.

                  Sorry, I'm not the right person to answer why there is the difference in allocating between local and global memory ;-) Maybe someone cleverer than me can answer your question.

                    • allocate global memory
                      nou

                      You can pass a NULL pointer to clCreateBuffer(). Just pass the CL_MEM_WRITE_ONLY flag and you simply allocate a buffer on the device.

                        • allocate global memory
                          Illusio

                          In addition to nou's post, it might be worth making sure your code checks the return values from clCreateBuffer. From the way you describe it as "having to pass a pointer", it sounds like you have one of the flags CL_MEM_USE_HOST_PTR or CL_MEM_COPY_HOST_PTR set. Both of those require a host pointer. When neither flag is set, however, the host_ptr argument must be NULL, and according to the spec clCreateBuffer should fail if you pass anything else.
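The host_ptr rule described above can be sketched in plain C. This is only an illustration of the pairing check, not the runtime's actual validation: the helper name host_ptr_pairing_ok and the SKETCH_ constants are hypothetical, with bit values copied from the cl_mem_flags definitions in CL/cl.h so that the snippet compiles without an OpenCL SDK. It ignores the other checks clCreateBuffer performs (context, size, mutually exclusive flags, and so on).

```c
#include <stddef.h>

/* Hypothetical stand-ins for the CL_MEM_* flags; the bit positions
   mirror CL/cl.h so no OpenCL headers are needed for this sketch. */
#define SKETCH_MEM_READ_WRITE     (1u << 0)
#define SKETCH_MEM_WRITE_ONLY     (1u << 1)
#define SKETCH_MEM_READ_ONLY      (1u << 2)
#define SKETCH_MEM_USE_HOST_PTR   (1u << 3)
#define SKETCH_MEM_ALLOC_HOST_PTR (1u << 4)
#define SKETCH_MEM_COPY_HOST_PTR  (1u << 5)

/* Returns 1 when the flags/host_ptr pairing is one the spec allows,
   0 when clCreateBuffer would be expected to report CL_INVALID_HOST_PTR.
   A host pointer is required exactly when USE_HOST_PTR or COPY_HOST_PTR
   is set, and must be NULL otherwise. */
static int host_ptr_pairing_ok(unsigned long long flags, const void *host_ptr)
{
    int needs_ptr =
        (flags & (SKETCH_MEM_USE_HOST_PTR | SKETCH_MEM_COPY_HOST_PTR)) != 0;
    int has_ptr = (host_ptr != NULL);
    return needs_ptr == has_ptr;
}
```

Under this check, a call such as clCreateBuffer(context, CL_MEM_READ_WRITE, bytes, NULL, &err) is a valid pairing and yields an uninitialized device allocation, while passing a host pointer with only CL_MEM_WRITE_ONLY set would be expected to fail with CL_INVALID_HOST_PTR.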

                           

                            • allocate global memory
                              himanshu.gautam

                               

                              hi meteorhead,

                              They are different pieces of hardware. Local memory is not directly accessible by the host, so its allocation is just a reservation of size. Global memory is directly allocatable by the host, so its allocation is a reservation of size plus an initialization of data.

                    • allocate global memory
                      Meteorhead

                      Thank you for the replies. So I conclude that creating buffers is the only way to allocate global memory. If one does not wish to pass an unwanted pointer for a buffer, because no transfers will be made or the memory does not need to be initialized, then not setting the HOST_PTR flags will do the trick, while still using the appropriate READ/WRITE flags.

                      This might be a different topic, but since the flags were mentioned, let me ask it here: could someone summarize what optimizations are done by setting the proper READ_ONLY, WRITE_ONLY, and READ_WRITE flags? A long, long time ago, in a galaxy far, far away, I read that READ_ONLY tells the compiler that the data can be cached for reading. I never found a clear statement about how to tell the compiler to put something into the constant cache, for example.

                      So if someone knows the tricks to the flags and telling the compiler to use specific memory caches, I'd be most glad. (and other people too, I'm sure)

                      • allocate global memory
                        MicahVillmow
                        The only way to get cached memory in SDK 2.2 is to mark a pointer as __constant or use Images.
                        • allocate global memory
                          Meteorhead

                          So am I correct that the compiler (at the moment) does not read ahead in the code in a way that would let it cache reads from __global memory between each mem_fence(GLOBAL)? If I am not mistaken, __global writes inside kernels are cached as long as they do not overflow the write cache available to all Compute Units. My question concerns details such as whether it matters to specify something as WRITE_ONLY or READ_ONLY, or whether this is only a language feature reserved for future compiler optimizations.

                          I can imagine WRITE_ONLY buffers being written through the write cache, but if the buffer can also be read, caching might require synchronization whenever a work item wants to access data in another Compute Unit's write cache that has not yet been written to __global memory.

                          So my question remains: what exactly is the point of setting the READ_ONLY/WRITE_ONLY flags, and what optimizations are done when these buffer flags are set?

                          Micah's answer seemed to show that this is highly dependent on the SDK version (the compiler version, to be more exact). Even an answer that applies only to SDK v2.2 would be nice. If someone could give a sneak peek at what improvements can be expected in future releases, that would rock.

                            • allocate global memory
                              nou

                              My view of the READ/WRITE flags is that when you specify read-only, the implementation does not need to synchronize the buffer across multiple devices in the context. If you only read, the implementation can assume that kernels do not change the contents of the buffer, so it does not need to propagate changes to the other devices' memory.

                            • allocate global memory
                              MicahVillmow
                              The CL_MEM_READ_ONLY and CL_MEM_WRITE_ONLY flags that are specified at cl_mem object creation time have no connection to caching during compilation. We are constantly working on improving both our runtime and compiler stacks, but cannot give any specifics on the improvements for the next release at this time.