alexaverbuch
Journeyman III

ERROR: kernel must return void; pointer argument must point to addrSpace __global, __local, or __constant

Hi,

I have recently added a few more parameters to my Kernel and now I get the following error:

/tmp/OCLSr0DUi.cl(17): error: kernel must return void; pointer argument must
          point to addrSpace __global, __local, or __constant
  __kernel void edgeDetectKernel(    __global      uint4 * input,

After adding the new parameters I also added code in the Host to allocate memory, create buffers, set Kernel arguments, etc. I don't know where to look next...

I have had this error once before, but that was when I was (foolishly) trying to pass a pointer-to-pointer as a parameter.

Does anyone know what could be causing this error?

Thanks in advance,

Alex

__kernel void edgeDetectKernel(
    __global uint4 * input,
    __private uint * intermediate,
    __global uint * output,
    __global uint * clSobelOpX,
    __global uint * clSobelOpY,
    __private uint kernelCount,
    const uint2 sobelDim,
    const uint2 inputOutputDim,
    const uint2 intermediateDim )
{
    // CODE
}

0 Likes
7 Replies
alexaverbuch
Journeyman III

Here is some of the code for creating my cl_mem buffers

inputBuffer = clCreateBuffer(
    context,
    CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
    sizeof(cl_uint4) * width * height,
    input,
    &status);

intermediateBuffer = clCreateBuffer(
    context,
    CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
    sizeof(cl_uint) * intermediateDim.u32[0] * intermediateDim.u32[1],
    intermediate,
    &status);

outputBuffer = clCreateBuffer(
    context,
    CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR,
    sizeof(cl_uint) * width * height,
    output,
    &status);

0 Likes

Hmm, OK, so a kernel argument cannot be a pointer to __private memory...

Is there any way to allocate a buffer for a Kernel, and make it private?

It's not possible to allocate memory INSIDE the Kernel, so what other alternative is there for providing a Kernel with its own private work space (an array)?

0 Likes

Just statically declare it in the body of the kernel like:

__kernel void edgeDetectKernel( ...){

uint   intermediate[37];

}

Be careful though, as private memory is not an unlimited resource. The array's footprint is also multiplied by the Work Group Size, since every work-item gets its own copy. I know this is a school project, so this is probably fine. If this were commercial, I think you might want to look hard at your design to see if you can avoid it, especially if the array is large.
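To put a rough number on the "multiplied by the Work Group Size" point, here is a minimal host-side sketch in C. The function name and the work-group size of 256 used below are assumptions for illustration only, and whether a private array ends up in registers or spills to slower memory depends on the device and compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of any OpenCL API): estimates how many bytes
 * of private storage one work-group needs when every work-item declares its
 * own array of `elems_per_item` 32-bit unsigned integers (OpenCL `uint`). */
size_t private_bytes_per_group(size_t elems_per_item, size_t work_group_size)
{
    return elems_per_item * sizeof(uint32_t) * work_group_size;
}
```

For the 37-element array above and an assumed work-group size of 256, private_bytes_per_group(37, 256) gives 37888 bytes, which is why a large per-work-item array can become a problem quickly.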

0 Likes

Hi jcpalmer,

I can't declare it like that, as this application performs edge detection on images using Sobel operators.

First I convert the image into grey-scale (black and white), and then I perform the edge detection.

I want each Kernel to perform the matrix convolutions on a "block" of the image, and the "block" size is dictated by the amount of "global threads"/KERNEL_COUNT. I decide KERNEL_COUNT at compile-time, but the "block" size obviously depends on the image size too, so it is calculated at run-time.

When performing the matrix convolutions with the Sobel operators there are data dependencies between neighbouring "blocks" of the grey-scale image. To minimize synchronization points and maximize parallelism I wanted to allocate an intermediate buffer (that has a little "extra" area) to each Kernel (privately) and have them all do a bit of redundant work.

I.e. sacrifice some memory to gain parallelism.

My back-up plan is simply to NOT use an "intermediate" buffer and perform all operations on the __global output buffer.

But this means I need a barrier between the grey-scale calculation and the edge-detection calculation. Still parallel... but not pretty in my opinion.

I think the extra memory usage is not so large... as we are talking about (several copies of) a 2D image, nothing more.

Thanks all the same.

Any further suggestions about the way that OpenCL was designed to handle these cases would be greatly appreciated.

Alex

0 Likes

If you are looking to do edge detection only, have a look at the simple sobel filter sample, included in Beta3 SDK. 

0 Likes

Originally posted by: omkaranathan
"If you are looking to do edge detection only, have a look at the simple sobel filter sample, included in Beta3 SDK."

 

Thanks for the suggestion, I actually took a brief look at it to learn how to pass the memory buffers to my kernel.

I think (although maybe I'm mistaken) that my case is not as simple, since I have 2 stages/filters (and was considering a third), as opposed to the "single pass" of the example.

E.g. Colour_Image >--mean()--> Grey-Scale_Image >--convolutions()--> Edges_Image

My preferred solution is to have each Kernel perform these operations on its own block, with no synchronisation. But to do that I need private storage in the Kernel. More than that, I need the private block to be a bit larger than "necessary" (1 pixel in each direction, per filter) so it can perform some redundant processing to eliminate the need for synchronizing with "neighbour" Kernels.

Sorry if I'm not making my dilemma clear... but it feels like this kind of problem must be a common one, and OpenCL would likely have some mechanism for dealing with what I'm trying to do. Unfortunately I'm a complete noob, so I don't know of it.
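The "slightly larger block" idea can be sketched as a small index calculation in C. The struct and function names here are hypothetical and are not taken from the code in this thread. The sketch assumes each block is an axis-aligned rectangle and that the extra border (halo) is the same on all four sides, e.g. 1 pixel for a single 3x3 Sobel pass.

```c
/* Hypothetical rectangle type: [x0, x1) by [y0, y1) in pixel coordinates. */
typedef struct {
    int x0, y0;   /* inclusive top-left corner */
    int x1, y1;   /* exclusive bottom-right corner */
} BlockRect;

/* Clamp v to the range [lo, hi]. */
static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Grow a block by `halo` pixels on every side, without leaving the image.
 * The grey-scale stage would be computed over the returned rectangle so that
 * the convolution stage can cover the original block without reading pixels
 * produced by a neighbouring block. */
BlockRect padded_block(BlockRect block, int halo, int width, int height)
{
    BlockRect r;
    r.x0 = clamp_int(block.x0 - halo, 0, width);
    r.y0 = clamp_int(block.y0 - halo, 0, height);
    r.x1 = clamp_int(block.x1 + halo, 0, width);
    r.y1 = clamp_int(block.y1 + halo, 0, height);
    return r;
}
```

With a 64x64 image and a 16x16 block at (16,16), a halo of 1 yields the rectangle (15,15) to (33,33); blocks on the image border are simply clipped. The redundant work is the overlap between the padded rectangles of adjacent blocks.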

0 Likes

Here is a way to set the array size at run time.

Define the array in the kernel as

int localArray[SIZE];

Prepend a #define string to the kernel source code from the host-side code, so that SIZE is defined before it is used

#define SIZE N (N being any integer)

The string above can be generated at run time, based on the application's requirements.
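A minimal host-side sketch of that step in C might look like the following. The function name is hypothetical. The define is placed at the front of the source because the preprocessor has to see it before the array declaration that uses SIZE.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated string consisting of
 * "#define SIZE <size>\n" followed by the kernel source text.
 * The caller owns the result and must free() it. Returns NULL on failure. */
char *prepend_size_define(const char *kernel_body, int size)
{
    /* First pass: measure the header without writing anything. */
    int header_len = snprintf(NULL, 0, "#define SIZE %d\n", size);
    if (header_len < 0) {
        return NULL;
    }

    size_t total = (size_t)header_len + strlen(kernel_body) + 1; /* +1 for NUL */
    char *src = malloc(total);
    if (src == NULL) {
        return NULL;
    }

    /* Second pass: write the header and the kernel source into the buffer. */
    snprintf(src, total, "#define SIZE %d\n%s", size, kernel_body);
    return src;
}
```

The resulting string would then be handed to clCreateProgramWithSource as usual. An alternative that avoids editing the source at all is to pass a build option such as "-D SIZE=37" in the options argument of clBuildProgram.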

0 Likes