Archives Discussions

laobrasuca · ‎09-22-2010

how to pass it as argument to a kernel

Hi all,

I have a collection of cl_mem buffers that I want to use in a single kernel execution. What I've done so far is make an array of cl_mem:

cl_mem ArrayOfBuffers = new cl_mem[NumberOfBuffers];

(and I create each buffer from an array of floats)

Then I set it as a kernel argument like this:

clSetKernelArg(kernel, 0, sizeof(cl_mem*), (void *)&ArrayOfBuffers)

However, I have a problem when receiving it in the kernel, where I have:

__kernel void kernel_job(__global float** Array)

but the compiler fails with the error: kernel arguments can't be declared with types bool/half/pointer-to-pointer. How can I "receive" it in the kernel?

I really want to avoid running one kernel per buffer. The problem is that I have hundreds of buffers, which would mean hundreds of kernel calls and thus a lot of overhead.

n0thing · ‎09-22-2010

You should do something like this -

cl_mem *ArrayOfBuffers = new cl_mem[NumberOfBuffers];

And set the arguments like -

for(int i = 0; i < NumberOfBuffers; i++)

clSetKernelArg(kernel, i, sizeof(cl_mem), (void *)&ArrayOfBuffers)

And inside your kernel -

__kernel void kernel_job(__global float* Array0, __global float* Array1, ...)

This is the only way because pointer to pointer is unsupported in OpenCL.

n0thing · ‎09-22-2010

EDIT to the above post -

clSetKernelArg(kernel, i, sizeof(cl_mem), (void *)(ArrayOfBuffers + i));
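The per-buffer loop can be sketched roughly as below. To keep it compilable without the OpenCL headers, `Handle` stands in for cl_mem and `record_arg` stands in for clSetKernelArg; both names, along with `ArgLog` and `set_all_args`, are made up for this illustration. The point it shows is that `buffers + i` is already a pointer to element i (the same address as `&buffers[i]`), so it can be passed as the argument value directly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins so the sketch compiles without OpenCL headers:
// Handle plays the role of cl_mem, record_arg the role of clSetKernelArg.
using Handle = void*;

struct ArgLog {
    std::vector<const void*> values;  // pointer received for each argument index
};

void record_arg(ArgLog& log, unsigned index, std::size_t size, const void* value) {
    (void)size;  // a real call would copy `size` bytes starting at `value`
    if (log.values.size() <= index) {
        log.values.resize(index + 1);
    }
    log.values[index] = value;
}

// Pass each handle as its own kernel argument: argument i points at element i.
void set_all_args(ArgLog& log, const Handle* buffers, unsigned count) {
    assert(buffers != nullptr || count == 0);
    for (unsigned i = 0; i < count; ++i) {
        // buffers + i has the same address as &buffers[i]; no extra & is needed.
        record_arg(log, i, sizeof(Handle), buffers + i);
    }
}
```

With real OpenCL the loop body would be the clSetKernelArg call itself, and the kernel would need one `__global float*` parameter per index.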

laobrasuca · ‎09-22-2010

Originally posted by: n0thing EDIT to the above post -

clSetKernelArg(kernel, i, sizeof(cl_mem), (void *)(ArrayOfBuffers + i));

thx for the fast reply!

But the problem is that I can reach the maximum number of arguments. By the way, what does CL_DEVICE_MAX_PARAMETER_SIZE mean? It says "Max size in bytes of the arguments that can be passed to a kernel", but I don't really understand how to compute the maximum number of arguments from this.

n0thing · ‎09-22-2010

It's the limit on the total size in bytes of all the arguments to a kernel. If all of your arguments are float*, then each takes 4 bytes (assuming 32-bit device pointers).

Currently 1024 bytes is the maximum on AMD GPUs, which gives you at most 1024/4 = 256 arguments. So in your case you can use up to 256 buffers in one kernel.
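The arithmetic behind that estimate can be written out as a small helper. This is only a sketch: `max_pointer_args` is a made-up name, the budget would be the value that clGetDeviceInfo reports for CL_DEVICE_MAX_PARAMETER_SIZE, and the per-pointer size is an assumption about the device (CL_DEVICE_ADDRESS_BITS divided by 8 is one plausible way to get it).

```cpp
#include <cassert>
#include <cstddef>

// Upper bound on the number of pointer-typed kernel arguments.
// param_budget_bytes: the value reported for CL_DEVICE_MAX_PARAMETER_SIZE.
// pointer_bytes: assumed size of one pointer argument on the device
//                (4 for a 32-bit address space, 8 for a 64-bit one).
std::size_t max_pointer_args(std::size_t param_budget_bytes,
                             std::size_t pointer_bytes) {
    assert(pointer_bytes > 0);
    return param_budget_bytes / pointer_bytes;  // integer division rounds down
}
```

For a 1024-byte budget this gives 256 arguments with 4-byte pointers and 128 with 8-byte pointers. Any non-pointer arguments passed alongside would use part of the same budget.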

I think it's better to pack 4 buffers into 1 buffer by using float4 buffers; that will give you better performance for reads/writes.
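One possible way to do that packing on the host is to interleave the four float arrays, so that element i of the packed data holds (a[i], b[i], c[i], d[i]) and lines up with one float4. The sketch below assumes all four arrays have the same length; `pack_as_float4` is a made-up helper name, and the interleaved layout is one choice among several, not something prescribed above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Interleave four equally sized float arrays so that floats
// [4*i, 4*i + 3] of the result hold (a[i], b[i], c[i], d[i]).
// The packed data could then back one buffer that a kernel reads
// through a single __global float4* argument.
std::vector<float> pack_as_float4(const std::vector<float>& a,
                                  const std::vector<float>& b,
                                  const std::vector<float>& c,
                                  const std::vector<float>& d) {
    const std::size_t n = a.size();
    assert(b.size() == n && c.size() == n && d.size() == n);

    std::vector<float> packed(4 * n);
    for (std::size_t i = 0; i < n; ++i) {
        packed[4 * i + 0] = a[i];  // becomes .x of element i
        packed[4 * i + 1] = b[i];  // becomes .y
        packed[4 * i + 2] = c[i];  // becomes .z
        packed[4 * i + 3] = d[i];  // becomes .w
    }
    return packed;
}
```

Packing this way cuts the number of kernel arguments by a factor of four, at the cost of one extra copy on the host.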

Also, using more arguments will increase the time of your kernel invocation, but that's a minor issue relative to how much time your kernel takes.

array of buffer objects