I tried to implement a neural network algorithm in OpenCL. In theory, it needs several kernels, because each step is implemented in its own kernel and needs a different global size. However, this approach takes much more time than implementing the algorithm as one big kernel that combines all of the kernels from the first approach.
I think this is because running several small kernels spends more time on data transfer between the CPU and GPU than running one big combined kernel, but I'm not sure of this.
Therefore, I want to know:
1. Does clCreateBuffer() allocate host memory or GPU memory for the buffer?
2. How do I create a buffer in GPU memory? Please explain in detail, thanks very much!
If I can create the buffers in GPU memory, data transfer should take little time, and I can use several kernels rather than one combined kernel.