I am a CUDA developer moving to OpenCL, and I am having difficulty working out which of the following CUDA features are also available in OpenCL:
- Overlapping kernel execution with a host function
- Overlapping multiple kernel executions
- Overlapping kernel execution with a CPU-GPU or GPU-CPU memcpy
- Overlapping a CPU-GPU memcpy with a GPU-CPU memcpy
- Copying data from host memory to device memory, or the reverse, without involving the CPU or GPU, i.e. DMA
- Copying data directly from one GPU to another, as with GPUDirect
- Disabling a certain number of cores in the GPU
- Recursion, and if it is supported, what the maximum depth is
If these overlapping operations are available, are they merely executed concurrently, or are they truly executed in parallel (given that the GPU has multiple cores)?
I would greatly appreciate any response. Even a few brief pointers would be enough for me to start exploring the details.