Hi folks,
I am using my A8-3850 for OpenCL programming. It seems the GPU returns control to the CPU right away if I don't call clFinish(command_queue) after clEnqueueNDRangeKernel(). I am wondering: can the CPU part of the APU work concurrently with a kernel running on the 6550D of the same APU?
Thanks in advance.
Hi,
I don't think there should be any problem with that. As far as I know, current APUs have separate areas of RAM reserved for the CPU and the GPU, and a copy is made when buffers are transferred from the CPU's area of RAM to the GPU's.
It would be interesting to know what problem you are working on. I am especially interested in how you are doing load balancing.
Suppose I have a GPU kernelA (0.4 sec) and a CPU functionB (0.3 sec). I tried:
(1) clEnqueueNDRangeKernel(kernelA);
functionB();
total time: 0.3 sec // actually, kernelA didn't execute
(2) clEnqueueNDRangeKernel(kernelA);
clFinish(command_queue);
functionB();
total time: 0.7 sec
(3) clEnqueueNDRangeKernel(kernelA);
clEnqueueReadBuffer(); // as nou mentioned, I think this does an implicit clFlush()
functionB();
total time: 0.7 sec // ignoring the short read-buffer time
Whether or not functionB accesses its private memory, the results of the 3 cases are the same. I would like to know whether the APU allows its CPU and GPU to execute concurrently, which would mean a total of about 0.4 sec (the longer of the two) instead of 0.7 sec.
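For reference, the totals above follow from simple arithmetic: if the CPU waits for the kernel, the times add up, and if the two really overlap, the total cannot be shorter than the longer of the two tasks. Here is a minimal sketch of that model in C. The function names serial_time and overlapped_time are made up for illustration, and the model assumes perfect overlap with no launch or transfer overhead, which a real run will not quite reach.

```c
#include <assert.h>

/* Wall time when the CPU blocks until the kernel is done, then runs its own work
   (cases 2 and 3 above). Hypothetical helper, not part of OpenCL. */
double serial_time(double gpu_sec, double cpu_sec) {
    assert(gpu_sec >= 0.0 && cpu_sec >= 0.0);
    return gpu_sec + cpu_sec;
}

/* Best-case wall time when the kernel and the CPU function run at the same time:
   the total is bounded by the longer of the two. Hypothetical helper as well. */
double overlapped_time(double gpu_sec, double cpu_sec) {
    assert(gpu_sec >= 0.0 && cpu_sec >= 0.0);
    return gpu_sec > cpu_sec ? gpu_sec : cpu_sec;
}
```

With kernelA at 0.4 sec and functionB at 0.3 sec, serial_time gives about 0.7 sec and overlapped_time gives 0.4 sec, so 0.4 sec is the number to aim for when the two run concurrently.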
clEnqueueNDRangeKernel(kernelA);
clFlush(command_queue); // submit the queue so the kernel starts executing
functionB(); // your code on the CPU, running while the kernel executes
clFinish(command_queue); // wait here until the kernel has finished
AMD's OpenCL implementation is lazy. That means it doesn't start executing enqueued operations until you call clFlush() or another function that flushes the queue implicitly.
Are you really sure?
I never use clFlush or clFinish. In my case, I read the data back from the GPU with a blocking read once the calculation is finished, so the system waits until the kernel is finished and the read is done.
So if you use only one queue, there is no need for those two functions, as far as I can tell.
A blocking operation triggers a flush anyway.
I already saw this question when I was searching for a good library, "cassoulet.h". I think you can find it easily with Google or Yahoo, but 😕 I couldn't find it just now. Good luck, and don't forget "cassoulet.h" or "ravioli.h".
I don't remember good luck