Archives Discussions

alexaverbuch · ‎09-27-2009

Hi,

I have recently added a few more parameters to my Kernel and now I get the following error:

/tmp/OCLSr0DUi.cl(17): error: kernel must return void; pointer argument must
point to addrSpace __global, __local, or __constant
__kernel void edgeDetectKernel( __global uint4 * input,

After adding the new parameters I also added code in the Host to allocate memory, create buffers, set Kernel arguments, etc. I don't know where to look next...

I have had this error once before, but that was when I was (foolishly) trying to pass a pointer-to-pointer as a parameter.

Does anyone know what could be causing this error?

Thanks in advance,

Alex

__kernel void edgeDetectKernel( __global uint4 * input,
                                __private uint * intermediate,
                                __global uint * output,
                                __global uint * clSobelOpX,
                                __global uint * clSobelOpY,
                                __private uint kernelCount,
                                const uint2 sobelDim,
                                const uint2 inputOutputDim,
                                const uint2 intermediateDim )
{
    // CODE
}

alexaverbuch · ‎09-27-2009

Here is some of the code for creating my cl_mem buffers

inputBuffer = clCreateBuffer(
    context,
    CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
    sizeof(cl_uint4) * width * height,
    input,
    &status);

intermediateBuffer = clCreateBuffer(
    context,
    CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
    sizeof(cl_uint) * intermediateDim.u32[0] * intermediateDim.u32[1],
    intermediate,
    &status);

outputBuffer = clCreateBuffer(
    context,
    CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR,
    sizeof(cl_uint) * width * height,
    output,
    &status);

alexaverbuch · ‎09-27-2009

Hmm, OK, so a kernel argument cannot be a pointer to __private memory...

Is there any way to allocate a buffer for a Kernel, and make it private?

It's not possible to allocate memory INSIDE the Kernel, so what other alternative is there for providing a Kernel with its own private work space (an array)?

jcpalmer · ‎09-27-2009

Just statically declare it in the body of the kernel like:

__kernel void edgeDetectKernel( ...){

uint intermediate[37];

}

Be careful though, as this is not an unlimited resource. Each work-item gets its own copy of the array, so the total is multiplied by the Work Group Size. I know this is a school project, so this is probably fine. If this were commercial, I think you might want to look hard at your design to see if you can avoid it, especially if the array is large.
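To put a rough number on that warning, here is a minimal host-side sketch in plain C. The helper name `private_bytes_per_group` and the work-group size of 64 in the usage note are assumptions for illustration only; where the compiler actually places a private array (registers or spilled memory) depends on the implementation.

```c
#include <stddef.h>

/* Rough estimate of the private-array footprint of one work-group:
   every work-item holds its own copy of the array.
   (Hypothetical helper, not part of the OpenCL API.) */
static size_t private_bytes_per_group(size_t elems,
                                      size_t elem_size,
                                      size_t work_group_size)
{
    return elems * elem_size * work_group_size;
}
```

For the `uint intermediate[37]` example, with 4-byte uints and an assumed work-group size of 64, this gives 37 * 4 * 64 = 9472 bytes per work-group.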

alexaverbuch · ‎09-27-2009

Hi jcpalmer,

I can't declare it like that, as this application performs edge detection on images using Sobel operators.

First I convert the image into grey-scale (black and white), and then I perform the edge detection.

I want each Kernel to perform the matrix convolutions on a "block" of the image, and the "block" size is dictated by the amount of "global threads"/KERNEL_COUNT. I decide KERNEL_COUNT at compile-time, but the "block" size obviously depends on the image size too, so it is calculated at run-time.

When performing the matrix convolutions with the Sobel operators there are data dependencies between neighbouring "blocks" of the grey-scale image. To minimize synchronization points and maximize parallelism I wanted to allocate an intermediate buffer (that has a little "extra" area) to each Kernel (privately) and have them all do a bit of redundant work.

I.e. sacrifice some memory to gain parallelism.
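The block-plus-extra-area idea can be sketched on the host side in plain C. This is only an illustration under assumptions: the image is split into horizontal strips (the post does not say how blocks are shaped), and `RowRange`, `owned_rows` and `halo_rows` are hypothetical names, not anything from the original code.

```c
#include <stddef.h>

/* A contiguous run of image rows. (Hypothetical type for this sketch.) */
typedef struct {
    size_t first_row;
    size_t row_count;
} RowRange;

/* Rows that block `id` is responsible for writing. Uses ceiling division,
   so the last block may be shorter (or empty) when the height does not
   divide evenly by kernel_count. */
static RowRange owned_rows(size_t height, size_t kernel_count, size_t id)
{
    size_t rows_per_block = (height + kernel_count - 1) / kernel_count;
    size_t first = id * rows_per_block;
    if (first > height) first = height;
    size_t end = first + rows_per_block;
    if (end > height) end = height;
    RowRange r = { first, end - first };
    return r;
}

/* Rows the block must read and recompute privately: its owned rows widened
   by `halo` rows on each side, clamped to the image. */
static RowRange halo_rows(size_t height, RowRange owned, size_t halo)
{
    if (owned.row_count == 0) return owned;  /* nothing to process */
    size_t first = owned.first_row >= halo ? owned.first_row - halo : 0;
    size_t end = owned.first_row + owned.row_count + halo;
    if (end > height) end = height;
    RowRange r = { first, end - first };
    return r;
}
```

With a 480-row image and 8 blocks, an interior block owns 60 rows but reads 62 with a 1-row halo; that overlap is the redundant work traded for not having to synchronise with neighbours.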

My back-up plan is simply to NOT use an "intermediate" buffer and perform all operations on the __global output buffer.

But this means I need a barrier between the grey-scale calculation and the edge-detection calculation. Still parallel... but not pretty in my opinion.

I think the extra memory usage is not so large... as we are talking about (several copies of) a 2D image, nothing more.

Thanks all the same.

Any further suggestions about the way OpenCL was designed to handle these cases would be greatly appreciated.

Alex

omkaranathan · ‎09-27-2009

If you are looking to do edge detection only, have a look at the simple sobel filter sample, included in Beta3 SDK.

alexaverbuch · ‎09-27-2009

Originally posted by: omkaranathan If you are looking to do edge detection only, have a look at the simple sobel filter sample, included in Beta3 SDK.

Thanks for the suggestion, I actually took a brief look at it to learn how to pass the memory buffers to my kernel.

I think (although maybe I'm mistaken) that my case is not as simple, as I have 2 stages/filters (and was considering a third), as opposed to the "single pass" of the example.

E.g. Colour_Image >--mean()--> Grey-Scale_Image >--convolutions()--> Edges_Image
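For reference, the two stages can be written as a small CPU-side sketch in plain C. It assumes the standard 3x3 Sobel operators and uses |Gx| + |Gy| as the edge strength; the actual contents of clSobelOpX and clSobelOpY are not shown in the thread, so treat those values, and the function names, as assumptions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stage 1: grey-scale value as the mean of the three colour channels. */
static unsigned grey_mean(unsigned r, unsigned g, unsigned b)
{
    return (r + g + b) / 3;
}

/* Stage 2: Sobel response at pixel (x, y) of a row-major grey image.
   The caller must keep (x, y) at least one pixel away from every border,
   which is why each block needs a 1-pixel ring of extra grey values. */
static unsigned sobel_at(const unsigned *grey, size_t width, size_t x, size_t y)
{
    /* Standard 3x3 Sobel operators (assumed, see lead-in). */
    static const int kx[3][3] = { { -1, 0, 1 }, { -2, 0, 2 }, { -1, 0, 1 } };
    static const int ky[3][3] = { { -1, -2, -1 }, { 0, 0, 0 }, { 1, 2, 1 } };
    int gx = 0;
    int gy = 0;
    for (size_t j = 0; j < 3; j++) {
        for (size_t i = 0; i < 3; i++) {
            int p = (int)grey[(y + j - 1) * width + (x + i - 1)];
            gx += kx[j][i] * p;
            gy += ky[j][i] * p;
        }
    }
    /* |Gx| + |Gy| is a common cheap stand-in for the gradient magnitude. */
    return (unsigned)(abs(gx) + abs(gy));
}
```

Because `sobel_at` reads the eight neighbours of a pixel, a block that wants to produce its own edge pixels without waiting on neighbours has to compute the grey values for that surrounding ring itself.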

My preferred solution is to have each Kernel perform these operations on its own block, with no synchronisation. But, to do that I need private storage in the Kernel. More than that, I need the private block to be a bit larger than "necessary" (1-pixel in each direction, per filter) so it can perform some redundant processing to eliminate the need for synchronizing with "neighbour" Kernels.

Sorry if I'm not making my dilemma clear... but, it feels like this kind of problem must be a common one, and OpenCL would likely have some mechanism for dealing with what I'm trying to do. Unfortunately I'm a complete noob, so I don't know of it.

omkaranathan · ‎09-29-2009

Here is a way to set local array size at runtime.

Define the local array in kernel as

int localArray[SIZE];

Prepend a #define string to the kernel source code from the host side code, so that SIZE is defined before the array declaration that uses it

#define SIZE N (N being any integer)

The above string can be generated at run time by the host code, based on the application's requirements.
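A minimal host-side sketch of that approach in plain C follows. The helper name `prepend_size_define` is hypothetical; the define is placed ahead of the kernel source so the preprocessor sees it before `localArray[SIZE]`. The resulting string would then be handed to clCreateProgramWithSource in the usual way.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string "#define SIZE <n>\n" followed by
   kernel_src, or NULL on failure. The caller frees the result.
   (Hypothetical helper for illustration.) */
static char *prepend_size_define(const char *kernel_src, unsigned n)
{
    char header[64];
    int hlen = snprintf(header, sizeof header, "#define SIZE %u\n", n);
    if (hlen < 0 || (size_t)hlen >= sizeof header) {
        return NULL;
    }

    size_t total = (size_t)hlen + strlen(kernel_src) + 1;
    char *out = malloc(total);
    if (out == NULL) {
        return NULL;
    }

    memcpy(out, header, (size_t)hlen);
    strcpy(out + hlen, kernel_src);  /* also copies the terminating NUL */
    return out;
}
```

An alternative that avoids editing the source string is to pass the macro through the build options of clBuildProgram, for example "-D SIZE=37".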


ERROR: kernel must return void; pointer argument must point to addrSpace __global, __local, or __constant
