pavandsp
Adept I

multiple kernel buffer issue

Hi

I have split my big algorithm into two kernels, kernel[0] and kernel[1].

kernel[0] updates outputbuffer (1280x720), and I read outputbuffer with clEnqueueReadBuffer into "hostoutput" (1280x720).

I then modify "hostoutput" on the application side and want to send the modified "hostoutput" to kernel[1]. For now I just set the kernel[1] argument to outputbuffer.

Question:

1. Should I create a new buffer object with clCreateBuffer for kernel[1] from hostoutput? The flags are set to CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR.

I thought "hostoutput" and the OpenCL device outputbuffer would always stay coherent until the buffer is released.

 

0 Likes
6 Replies
nou
Exemplar

http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=129436&enterthread=y

0 Likes

I tried using clEnqueueMapBuffer, but I could not get the output. In fact, I am getting NULL output.

Could you point me to any examples of the above scenario (multiple kernels, reusing the output buffer of kernel[0] as the input buffer of kernel[1], with some host-side modifications between the kernel[0] and kernel[1] executions)?

I hope you understand what I mean.

0 Likes

Hi Nou and all,

Please help me find an example of the above scenario.

I tried clEnqueueMapBuffer, and I also tried creating new buffers for each kernel without clEnqueueMapBuffer. It seems I am not grasping how the buffer management really works.

Thanks in advance

Pavan

0 Likes

Are you doing something like this?

1. clCreateBuffer with CL_MEM_READ_WRITE

2. Call kernel[0]

3. clEnqueueMapBuffer

4. Modify data

5. clEnqueueUnmapMemObject

6. call kernel[1]

 

This should work without any problem.

0 Likes

Scenario: I want the output buffer of kernel[0] to be modified in the host application, and then I want to send the modified buffer as input to kernel[1].

kernel[1] modifies the input buffer and stores the results in the same buffer.

I am doing something like this:

1. outputBuffer = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, sizeof(cl_uint) * width * 3, output, &status);

2. output = (cl_uint *)clEnqueueMapBuffer(commandQueue, outputBuffer, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE, 0, sizeof(cl_uint) * width * 3, 0, NULL, &events[0], &status);

// Map the host memory "output" to the device outputBuffer so that I can use the device outputBuffer as input to kernel[1].

3. clSetKernelArg(kernel[0], 4, sizeof(cl_mem), (void *)&outputBuffer);

4. Call kernel[0] with clEnqueueNDRangeKernel.

5. clEnqueueReadBuffer(commandQueue, outputBuffer, CL_TRUE, 0, width * 3 * sizeof(cl_uint), output, 0, NULL, &events[1]);

6. Modify "output" in the host application.

7. clSetKernelArg(kernel[1], 0, sizeof(cl_mem), (void *)&outputBuffer); // Here I set the same outputBuffer as input to kernel[1] without creating a new buffer object, assuming the host buffer "output" and the device buffer "outputBuffer" are in sync.

8. Call kernel[1] with clEnqueueNDRangeKernel.

9. Then clEnqueueReadBuffer(commandQueue, outputBuffer, CL_TRUE, 0, width * 3 * sizeof(cl_uint), output, 0, NULL, &events[1]);

Please point out anything wrong in this flow.

 

 

0 Likes

In clEnqueueReadBuffer, you are trying to copy data from the pointer output to the same pointer location, because outputBuffer was created with CL_MEM_USE_HOST_PTR on output.

I would suggest using map/unmap instead of clEnqueueReadBuffer. Since your buffer is already in host memory, map/unmap should be faster.

Also, you have not unmapped the pointer after the second step; you must unmap before launching the kernel.

0 Likes