11 Replies Latest reply on Sep 10, 2010 9:58 PM by Raistmer

    How to create buffer on device memory?

    Fuxianjun

      I create several arrays on the CPU and want to operate on them on the GPU. I understand that, since the operations run on the GPU, accessing GPU memory is faster than accessing host memory. So how do I create buffers for these arrays in GPU memory? Can I use just the clCreateBuffer() function? If so, how do I choose the flags? Thanks!

        • How to create buffer on device memory?
          Raistmer
          AFAIK there are a few ways to do this:
          1) Create the buffer, map it to the CPU, fill it with data, then unmap it so the GPU can use it. (With previous SDK versions this was slower than the other ways; I didn't test it with SDK 2.2.)
          2) Create the buffer and use clEnqueueWriteBuffer to fill it with data.
          (On Windows this also takes up host memory, but the data transfer was faster than in case 1, at least with previous SDKs.)
          3) Create the buffer with the copy-host-memory flag. This way is only good if you need to fill the GPU buffer once. If you need to update the GPU buffer from the host in a loop, you will need case 2 again.
            • How to create buffer on device memory?
              Fuxianjun

               


              Thank you very much for the reply, but I still don't understand. Could you please explain again in more detail?

              For case 1, does "map" mean using the clEnqueueMapBuffer function?

              For case 2, I think it is the best fit for my problem. Does "create buffer" mean using the clCreateBuffer function? If so, which flag is appropriate? Is this buffer created in host memory or GPU memory? When using clEnqueueWriteBuffer, is the ptr parameter a pointer to host memory or GPU memory?

                • How to create buffer on device memory?
                  himanshu.gautam

                  Fuxianjun,

                  Use clEnqueueWriteBuffer instead, but you need to create the buffer before using it.

                  In most cases, clCreateBuffer creates the buffer on the host side, which is quite inefficient for the GPU to access.

                    • How to create buffer on device memory?
                      jeff_golds

                       


                      This is not correct. clCreateBuffer() always defaults to device memory. You can add flags like CL_MEM_ALLOC_HOST_PTR if you prefer that the memory reside on the host, or at least be host-accessible, rather than on the device. Host memory is generally slower for the device to access.

                      Jeff

                        • How to create buffer on device memory?
                          Fuxianjun

                           

                          God, which of you two is actually correct? Can anyone tell me the truth?

                           

                          • How to create buffer on device memory?
                            Raistmer
                            Hehe, you are going by the OpenCL spec, while Himanshu may be revealing some details of the current SDK implementation. From my own observations, host memory usage grows by more than enough to hold all of the buffers allocated "on the GPU". Hence the buffers are at least allocated in both GPU and host memory. But I can only hope that is the case; quite possibly they are allocated only on the host after all.




                              • How to create buffer on device memory?
                                jeff_golds

                                 


                                Actually, I work on the OpenCL runtime at AMD.

                                Host memory increases due to the way we currently allocate transfer buffers, but the actual backing store is on the device. Thus, for best device access performance, you should use clCreateBuffer().

                                If you want to create a buffer that can be quickly updated by the CPU, use CL_MEM_ALLOC_HOST_PTR. You can use that data directly with the device, or, if accessing it over PCIe is a bottleneck, you can use clEnqueueCopyBuffer to copy the data to a buffer on the device. This path will be better optimized soon.

                                Jeff

                      • How to create buffer on device memory?
                        Raistmer
                        LoL
                        And what about pinned memory? Is it still not implemented?
                        And when mapping a buffer, will the data be copied to a temporary GPU memory buffer when the host unmaps it? For example, my data path is: prepare some data in host memory, put it on the GPU, run various transformations in kernels (each kernel takes a buffer from the previous one and sometimes modifies the same buffer, sometimes writes into a new one), and then (only if some flag is set) transfer the data back to host memory.
                        That is, duplicating buffers in host memory is a waste of resources in my case (some of them are never used on the host side at all), especially if not only is memory allocated but data is also transferred between the two copies of the buffer.
                        What is the best way to implement such buffer usage in the current implementation?