6 Replies Latest reply on Mar 19, 2010 10:30 AM by gaurav.garg

    multiple kernel buffer issue



      I have split my large algorithm into two kernels, kernel[0] and kernel[1].

      kernel[0] updates outputbuffer (1280x720), and I read outputbuffer back into "hostoutput" (1280x720) with clEnqueueReadBuffer.

      Next I modify "hostoutput" on the application side, and then I want to send the modified "hostoutput" to kernel[1]. Here I just set the kernel[1] argument to outputbuffer.


      1. Should I create a new buffer object for kernel[1] by calling clCreateBuffer again with hostoutput? The flags are set to CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR.

      I assumed that "hostoutput" and the OpenCL device buffer outputbuffer would always stay coherent until the buffer is released.


          • multiple kernel buffer issue

            I tried using clEnqueueMapBuffer, but I couldn't get the output; in fact, I am getting NULL output.

            Could you point me to any examples of this scenario: multiple kernels, reusing the output buffer of kernel[0] as the input buffer of kernel[1], and of course some host-side modifications between the kernel[0] and kernel[1] executions?

            I hope that explains what I am seeing.

              • multiple kernel buffer issue

                Hi Nou and all,

                Please help me find an example of the above scenario.

                I tried clEnqueueMapBuffer, and I also tried creating new buffers for each kernel without clEnqueueMapBuffer. It seems I am not grasping how buffer management works.

                Thanks in advance.


                  • multiple kernel buffer issue

                    Are you doing something like this?

                    1. clCreateBuffer with CL_MEM_READ_WRITE

                    2. Call kernel[0]

                    3. clEnqueueMapBuffer

                    4. Modify data

                    5. clEnqueueUnmapMemObject

                    6. call kernel[1]


                    This should work without any problem.

                      • multiple kernel buffer issue

                        Scenario: I want the output buffer of kernel[0] to be modified in the host application, and then send the modified buffer as input to kernel[1].

                        kernel[1] modifies the input buffer and stores the results in the same buffer.

                        I am doing something like this:

                        1. outputBuffer = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, sizeof(cl_uint) * width * 3, output, &status);

                        2. output = (cl_uint *)clEnqueueMapBuffer(commandQueue, outputBuffer, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE, 0, sizeof(cl_uint) * width * 3, 0, NULL, &events[0], &status);

                        // Map the host memory "output" to the device buffer outputBuffer so that I can use outputBuffer as input to kernel[1].

                        3. clSetKernelArg(kernel[0], 4, sizeof(cl_mem), (void *)&outputBuffer);

                        4. Call kernel[0] with clEnqueueNDRangeKernel.

                        5. clEnqueueReadBuffer(commandQueue, outputBuffer, CL_TRUE, 0, width * 3 * sizeof(cl_uint), output, 0, NULL, &events[1]);

                        6. Modify "output" in the host application.

                        7. clSetKernelArg(kernel[1], 0, sizeof(cl_mem), (void *)&outputBuffer); // Here I set the same outputBuffer as input to kernel[1] without creating a new buffer object, assuming the host buffer "output" and the device buffer outputBuffer are in sync.

                        8. Call kernel[1] with clEnqueueNDRangeKernel.

                        9. clEnqueueReadBuffer(commandQueue, outputBuffer, CL_TRUE, 0, width * 3 * sizeof(cl_uint), output, 0, NULL, &events[1]);

                        Please point out anything wrong in this flow.



                          • multiple kernel buffer issue

                            In clEnqueueReadBuffer, you are copying data from the pointer output to the same pointer location, since outputBuffer was created with CL_MEM_USE_HOST_PTR on output.

                            I would suggest using map/unmap instead of clEnqueueReadBuffer. Since your buffer is already in host memory, map/unmap should be faster.

                            Also, you have not unmapped the pointer after the second step; you must unmap it with clEnqueueUnmapMemObject before launching a kernel.