I am trying to implement a neural network algorithm in OpenCL. In theory the algorithm needs several kernels, because each step is implemented in its own kernel and needs a different global size. However, this approach takes much more time than implementing the algorithm as one big kernel that combines all of the kernels from the first approach.
I think this is because running several small kernels spends more time transferring data between the CPU and the GPU than running one big combined kernel, but I'm not sure.
Therefore, I want to know:
1. Does clCreateBuffer() allocate host memory or GPU memory for the buffer?
2. How do I create a buffer in GPU memory? Please explain in detail, thanks very much!
If I can create the buffer in GPU memory, data transfer will take little time and I can use several kernels rather than one combined kernel.
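clCreateBuffer creates a buffer object that belongs to the context; by itself it does not transfer your data to the GPU. Where the memory is actually allocated depends on the flags you pass (for example CL_MEM_USE_HOST_PTR or CL_MEM_ALLOC_HOST_PTR) and on the implementation; without those flags the buffer normally ends up in device memory. To get your data into a buffer on the device side:
1. Create the buffer using clCreateBuffer.
2. Use clEnqueueWriteBuffer (which takes the command queue as a parameter) to copy the host data into that buffer on the device.
Once the data is in the buffer it stays there between kernel launches, so several kernels can work on the same buffer without a transfer in between, as long as you do not read it back to the host after every step.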
You can also use local memory to further reduce the fetch time, if that is applicable to your case.