n_treutner
Journeyman III

Problem after change from 2.2 to 2.3

Kernel produces wrong results

Hi There,

I only recently switched from 2.2 to 2.3, and now parts of my code that worked fine in 2.2 don't work anymore. I narrowed the problem (or at least one part of it) down to this kernel (attached) by checking the buffers with gDEBugger.

What the kernel does: it compares two images (inputImage1 and inputImage2) pixel by pixel. I launch as many work-items as there are pixels (x, y) and compare the images over a window (X and Y), shifting the window in the second image to the left by ww. The result of each comparison (one per value of ww) is stored in a 3D array that is actually a flat array (dsi).

It appears that the results saved via

dsi[(x + y * width) * w_range + w_dsi] = sum0.w;

are identical within each work-item (so work-item (x, y) writes, for example, 231678 to every field of the dsi that it is supposed to write to), whereas they should differ, since I change ww in the for loop.

So I tried to find any changes in the handling of images and/or loops, but I couldn't find anything relevant.

I wonder if you guys have any idea what is wrong.

__kernel void ssdkernel(__read_only image2d_t inputImage1,
                        __read_only image2d_t inputImage2,
                        __global float * dsi,
                        __constant int * params,
                        sampler_t imageSampler)
{
    int x = get_global_id(0);
    int y = get_global_id(1);
    int width = get_global_size(0);
    int height = get_global_size(1);

    int k = (params[0] - 1)/2;
    int w = params[1];
    int w_min = params[2];
    int stepping = params[3];
    int w_range = (w - w_min) / stepping;

    float4 diff0, sum0;

    //iterate over all disparities
    for (int ww = w_min, w_dsi = 0; ww < w; ww += stepping, w_dsi++)
    {
        diff0 = 0.0f;
        sum0 = 0.0f;

        //match the windows
        for(int Y = -k; Y <= k; Y=Y+1)
        {
            for(int X = -k; X <= k; X=X+1)
            {
                diff0 = read_imagef(inputImage1, imageSampler, (int2)(x + X, y + Y))
                      - read_imagef(inputImage2, imageSampler, (int2)(x + X - ww, y + Y));
                sum0 += diff0 * diff0;
            }
        }

        sum0.w = sqrt(sum0.x * sum0.x + sum0.y * sum0.y + sum0.z * sum0.z); //compute the euclidian distance
        dsi[(x + y * width) * w_range + w_dsi] = sum0.w;
    }
}
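For cross-checking the GPU output, a small CPU reference can help. The following is only a sketch in Python, not the kernel itself: it assumes single-channel images stored as lists of rows, uses 0.0 as a stand-in for the border colour that clamp addressing returns for out-of-range reads, and the names `read_clamped` and `ssd_reference` are made up for illustration.

```python
def read_clamped(image, x, y):
    """Return image[y][x], or 0.0 when (x, y) lies outside the image."""
    if 0 <= y < len(image) and 0 <= x < len(image[0]):
        return float(image[y][x])
    return 0.0


def ssd_reference(image1, image2, window, w_min, w, stepping):
    """Sum of squared differences per pixel and disparity, on the CPU.

    The result is a flat list indexed like dsi in the kernel:
    (x + y * width) * w_range + w_dsi.
    """
    height, width = len(image1), len(image1[0])
    k = (window - 1) // 2
    disparities = list(range(w_min, w, stepping))
    # Assumption: w_range is taken as the number of loop iterations. This
    # matches (w - w_min) / stepping whenever stepping divides evenly.
    w_range = len(disparities)
    dsi = [0.0] * (width * height * w_range)
    for y in range(height):
        for x in range(width):
            for w_dsi, ww in enumerate(disparities):
                total = 0.0
                # Compare the window around (x, y) with the window
                # shifted ww pixels to the left in the second image.
                for dy in range(-k, k + 1):
                    for dx in range(-k, k + 1):
                        diff = (read_clamped(image1, x + dx, y + dy)
                                - read_clamped(image2, x + dx - ww, y + dy))
                        total += diff * diff
                dsi[(x + y * width) * w_range + w_dsi] = total
    return dsi
```

If the values for one pixel change with ww here but not on the GPU, that points at the image reads rather than at the dsi indexing. With a single channel the kernel's final sqrt of the squared sums reduces to the sum itself, which is why the sketch stores `total` directly.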

n_treutner
Journeyman III

Hi.

I still haven't had any success in finding the error.

I wonder whether some loop unrolling might be the cause of the problem, although I don't set any unroll pragma. Do you have any information about changes in this area? Is there a detailed list of the changes?

regards,

niklas

Jawed
Adept II

I notice you are using the NDRange dimensions to determine the image size.

OpenCL's global domain size must be an integer multiple of the local workgroup size. So if your local workgroup is 8x8 then the global size must be a multiple of that. If these dimensions don't match the dimensions of the image or your dsi buffer (e.g. dsi is too small), then you could have problems.
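One way to satisfy the multiple-of-workgroup-size requirement is to round the global size up on the host. A minimal sketch in Python of just the arithmetic (the helper names are hypothetical; real host code would pass the result to the enqueue call):

```python
def round_up(size, multiple):
    """Smallest multiple of `multiple` that is >= size."""
    return ((size + multiple - 1) // multiple) * multiple


def padded_global_size(image_size, local_size):
    """Round each image dimension up to a multiple of the local size."""
    return tuple(round_up(s, l) for s, l in zip(image_size, local_size))
```

If the global size is padded like this, the kernel can no longer take width and height from get_global_size(); the real image size would have to be passed in, with out-of-range work-items returning early.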

Perhaps the SDK change has affected the workgroup size?

Another thing that might be worth trying is defining the sampler within the kernel, rather than passing it in as a parameter.

n_treutner
Journeyman III

Jawed, thanks for your answer.

The global size is a multiple of the local workgroup size. Also, I've tried other kernels that work on the same input, with the same images and the same workgroup size, and these kernels work.

I had the sampler defined within the kernel, but when I tested the code on an NVIDIA system, it didn't work until we passed the sampler as a parameter.

Also, as I said above, other kernels access the images correctly, and this kernel reads values from the images, too. It's just that the results written to the dsi are all the same within each work-item (though they vary from work-item to work-item).

Jawed
Adept II

Well in this situation I get out the hammer and sprinkle the code with literals instead of computed addresses and use a very small image. Or write the computed indices into the buffer, instead of the data.
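The "write the indices instead of the data" idea can be tried on the CPU first. This Python sketch (hypothetical helper `index_map`, assuming stepping divides w - w_min evenly) emulates the address computation of the kernel in this thread and stores ww in each slot, so it is easy to see which disparity lands where:

```python
def index_map(width, height, w_min, w, stepping):
    """Emulate the dsi address computation, storing ww instead of data."""
    # Assumption: stepping divides (w - w_min) evenly, so the loop runs
    # exactly w_range times and no write spills into the next pixel.
    w_range = (w - w_min) // stepping
    slots = [None] * (width * height * w_range)
    for y in range(height):
        for x in range(width):
            for w_dsi, ww in enumerate(range(w_min, w, stepping)):
                slots[(x + y * width) * w_range + w_dsi] = ww
    return slots
```

Doing the same in the kernel (writing ww, or the computed index, instead of sum0.w) would show whether the loop variable and the addressing behave as expected on the device.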

I'm about to post a thread where I'm doing something very similar to your code (3 nested loops iterating windows over two images). It won't help you, but you'll at least see that I'm not having grief with loop indices.

n_treutner
Journeyman III

Good idea. I did as you suggested, and it turns out that the computation and/or the address computation for the images is the problem. So it's this line:

diff0 = read_imagef(inputImage1, imageSampler, (int2)(x + X, y + Y)) - read_imagef(inputImage2, imageSampler, (int2)(x + X - ww, y + Y));

I found this out by assigning specific values to the dsi in the last line, which turned out to be fine.

What seems to be the problem is the variation indicated by ww. I compare several regions (the region dimensions are given by X and Y), ww pixels apart, to each other. Apparently, in each iteration, when ww is increased, the same set of pixels is compared, although the region should have moved ww pixels to the left.

himanshu_gautam
Grandmaster

Can you specify the typical values you pass in the params array, and your local and global NDRange?

You can send a test case via helpdesk http://developer.amd.com/support/Pages/default.aspx.

Jawed
Adept II

The next thing I would try is:

if (x != 20 || y != 20) return;

Just after

    int x = get_global_id(0);
    int y = get_global_id(1);

So this will run the kernel for a single pixel. See if the result for the single pixel is correct. 

I chose 20 on the basis that there will be some pixels to the left. Choose anything you think is reasonable
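Which work-items get past an early-return guard depends on whether the two comparisons are joined with "or" or "and". A small Python sketch (not OpenCL; the 64 x 48 grid and the helper name are made up) that enumerates the survivors for both forms:

```python
def surviving_work_items(width, height, should_return):
    """List the (x, y) work-items that do NOT take the early return."""
    return [(x, y)
            for y in range(height)
            for x in range(width)
            if not should_return(x, y)]


# Guard written with "or": return unless both coordinates are 20.
single = surviving_work_items(64, 48, lambda x, y: x != 20 or y != 20)

# Guard written with "and": return only when both coordinates differ.
cross = surviving_work_items(64, 48, lambda x, y: x != 20 and y != 20)
```

Here `single` contains only (20, 20), while `cross` contains the whole of column 20 and row 20, so the "or" form is the one that isolates a single pixel.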

n_treutner
Journeyman III

himanshu_gautam,

typical values for the params are

1 < k < 10

w = 160

w_min = 40

stepping = 1, 2 or 4

That usually makes w_range 20, so the dsi array is width * height * dsiSize * dsi_range bytes, i.e. 640 * 480 * sizeof(cl_float) * 20.

The global NDRange is 640, 480; the local NDRange is 16, 16.
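Plugging these typical values into the kernel's own formula, w_range = (w - w_min) / stepping, is a quick sanity check on the buffer size. A small Python sketch of the arithmetic (the function names are made up; 4 bytes per cl_float):

```python
def kernel_w_range(w, w_min, stepping):
    """Integer division, as in the kernel's w_range computation."""
    return (w - w_min) // stepping


def dsi_bytes(width, height, w_range_value, bytes_per_float=4):
    """Size in bytes of a dsi buffer with w_range_value slots per pixel."""
    return width * height * w_range_value * bytes_per_float


# w_range for the quoted w = 160, w_min = 40 and each quoted stepping.
ranges = {s: kernel_w_range(160, 40, s) for s in (1, 2, 4)}
```

For w = 160 and w_min = 40 this gives 120, 60 and 30 for stepping 1, 2 and 4, not 20, so it may be worth checking that the host allocates dsi with the same w_range the kernel computes. The quoted values are only typical ones, so the real run may differ.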

 

Jawed, I'll try this as soon as I get back home. But I don't think it will give different results, since the error is within a single work-item, not between work-items.

Also, I'm starting to think that the sampler might be the problem. It's defined as

imageSampler = clCreateSampler(context, false, CL_ADDRESS_CLAMP,
                                   CL_FILTER_NEAREST, &status);

Was anything changed in the image addressing?

Jawed
Adept II

I use

 const sampler_t plane = CLK_NORMALIZED_COORDS_FALSE |
       CLK_ADDRESS_CLAMP |
       CLK_FILTER_NEAREST;

inside my kernel, rather than creating the sampler on the host and passing it as a parameter. You're apparently setting your sampler the same way.

Very strange.
