Hi all,
I have a collection of cl_mem buffers that I want to use in a single kernel execution. What I've done so far is create an array of cl_mem:
cl_mem ArrayOfBuffers = new cl_mem[NumberOfBuffers];
(each buffer is created from an array of floats)
Then I set it as a kernel argument like this:
clSetKernelArg(kernel, 0, sizeof(cl_mem*), (void *)&ArrayOfBuffers);
However, I have a problem receiving it in the kernel, which I declared as:
__kernel void kernel_job(__global float** Array)
but the compiler fails with the error "kernel arguments can't be declared with types bool/half/pointer-to-pointer". How can I receive the array in the kernel?
I really want to avoid running one kernel per buffer. The problem is that i have hundreds of buffers, which would mean hundreds of kernel calls, thus lots of overhead.
You should do something like this:
cl_mem *ArrayOfBuffers = new cl_mem[NumberOfBuffers];
and set one argument per buffer:
for(int i = 0; i < NumberOfBuffers; i++)
clSetKernelArg(kernel, i, sizeof(cl_mem), (void *)&ArrayOfBuffers[i]);
and inside your kernel:
__kernel void kernel_job(__global float* Array0, __global float* Array1, ...)
This is the only way, because pointer-to-pointer kernel arguments are not supported in OpenCL.
EDIT to the above post:
clSetKernelArg(kernel, i, sizeof(cl_mem), (void *)(ArrayOfBuffers + i));
Thanks for the fast reply!
But the problem is that I can reach the maximum number of arguments. By the way, what does CL_DEVICE_MAX_PARAMETER_SIZE mean? The documentation says "Max size in bytes of the arguments that can be passed to a kernel", but I don't really understand how to compute the maximum number of arguments from this.
It's the limit on the total size in bytes of all the arguments to a kernel. If all of your arguments are buffers (float*), each one takes the size of a device pointer, which is 4 bytes on a device with 32-bit addressing.
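Currently the limit is 1024 bytes on AMD GPUs. With 4-byte arguments that would allow 1024/4 = 256 arguments by size alone, but the OpenCL specification states that for this minimum value of 1024 bytes only a maximum of 128 arguments can be passed to a kernel. So in your case you can count on 128 buffers in 1 kernel.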
I think it's better to pack 4 buffers into 1 by using float4 buffers; that will also give you better performance for reads and writes.
Using more arguments also increases the kernel launch time, but that is a relative issue compared to how long your kernel runs.