I want to implement a matrix calculation in which every row must wait for the values of the row above it, so I am trying to use one kernel launch to calculate all elements of one row.
Yes, there is a limit on the queue size, and it is implementation-defined. The maximum size of the device queue for a particular device can be queried using clGetDeviceInfo with the parameter CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE. A few other related limits can be queried with the parameters CL_DEVICE_MAX_ON_DEVICE_QUEUES, CL_DEVICE_MAX_ON_DEVICE_EVENTS, and CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE. If the queue becomes full during enqueue_kernel calls, enqueue_kernel returns the error CLK_DEVICE_QUEUE_FULL.
You can set the size of the device queue by passing the CL_QUEUE_SIZE property to clCreateCommandQueueWithProperties.
For example, clinfo prints the above parameters as follows:
Max on device events: 1024
Queue on device max size: 524288
Max on device queues: 1
Queue on device preferred size: 262144
I am using an event wait list to keep the kernels executing sequentially on the host side.
My question is:
How many kernels can I enqueue into a host-side or device-side command queue? Are there any limits?
I use clEnqueueNDRangeKernel(cmdqueue, ...) to enqueue 1170 kernels (the matrix is 1170*1256), and each kernel has an event. The program runs without reporting any errors. Also, clinfo prints the same result as you mentioned.
It seems there is no limit on the host-side command queue, is that right?
I will test this program with a device-side command queue later.
Actually, the runtime maintains the host-side queue and can use system resources as necessary to hold a large number of commands. The OpenCL spec does not explicitly specify a size limit for the host-side queue; it is assumed to be large enough, and the actual capacity depends on how the particular runtime manages it.