Hi collective brains,
I have a couple of questions about running multiple kernels concurrently on GPUs. I have searched the internet on this issue but could not find a clear answer, nor any code example. Any help would be really appreciated.
1) How do I enable multiple kernels to run concurrently? Some people say to enable out-of-order execution on a single queue; others say to use multiple queues, one per kernel. Which is the better way? Is there an example of this?
2) I have a kernel (say kernel1) that processes different sets of data independently, and each set needs a different amount of shared (local) memory. A simple solution is to launch the kernel only once to process all the data sets and size the shared memory to the upper bound. However, this is inefficient in my case because the upper and lower bounds differ greatly, so the occupancy is not optimized.
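To put rough numbers on the occupancy concern, here is a small sketch of the arithmetic. The helper name max_groups_by_local_mem and the 48 KB / 32 KB / 2 KB figures are made up for illustration, not measured on my device, and the estimate ignores registers and every other hardware limit.

```c
#include <stddef.h>
#include <stdint.h>

/* Upper bound on the number of work-groups that can be resident on one
   compute unit when local memory is the only limiting resource.
   Registers, work-group size and other hardware caps are ignored.
   Hypothetical helper, for illustration only. */
size_t max_groups_by_local_mem(size_t local_mem_per_cu,
                               size_t local_mem_per_group)
{
    if (local_mem_per_group == 0)
        return SIZE_MAX;  /* no local memory used: this bound does not apply */
    /* Integer division: partial groups do not fit. */
    return local_mem_per_cu / local_mem_per_group;
}
```

With an assumed 48 KB of local memory per compute unit, sizing every work-group to a 32 KB upper bound allows only 1 resident group by this estimate, while a data set that needs just 2 KB could have up to 24.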
So what I am planning to do is launch multiple copies of kernel1 (i.e. kernel11, kernel12, kernel13...) with different local memory sizes to process data set1, set2, set3.
kernel11 = clCreateKernel( program_t,"kernel1",&clstatus);
kernel12 = clCreateKernel( program_t,"kernel1",&clstatus);
If I create the kernels this way, will OpenCL treat them as different kernels, or will it recognize them as the same kernel (since their code is the same)?
3) Regarding my problem in 2, can you suggest a better solution?