Hi everyone,
I can't quite get my head around a problem I have when trying to implement the backward pass of a dynamic programming algorithm.
I attached the code that won't work. I added the BackLinks initialization to make sure I have valid entries to add to ww (these would be computed beforehand in the first DP pass).
The output line is just for testing purposes. It only works if I remove the "ww = BackLinks[Y * w + ww];" line. So I guess I can't really change the ww variable inside a loop. Is that correct? Why?
Do you have any workaround, or perhaps a good idea for how to tackle the whole dynamic programming issue in OpenCL? (The "classic" way is not a perfect candidate for parallelization...)
Thanks for your help
for (int Y = 1; Y < height; Y++)
{
    for (int ww = 0; ww < w; ww++)
    {
        BackLinks[Y * w + ww] = 0;
    }
}

// Dynamic Programming (scanline): backward pass ---
int ww = 0;
for (int Y = height-1; Y > 1; Y--)
{
    output[(X + Y * width)].x = 0.0f;
    ww = BackLinks[Y * w + ww];
}
What happens to ww after that loop? How is it then being used? What do you mean by "it only works...", what goes wrong if you remove the output line?
Thanks for your reply.
ww isn't used after the loop. The output line was originally output[(X + Y * width)].x = ww;
but I changed the output to "0.0f" in order to rule out any problems there.
Sorry if I was confusing you. What I was trying to explain was:
output is later converted into an image, so the output line should result in a red channel of 0.0 (and, once that works, a value of "ww") if the loop is processed. This only happens if I remove the ww = BackLinks[Y * w + ww]; line. The way it is now, nothing happens to the output image.
By the way, I don't use the OpenCL 2D image type yet. It's a float4 buffer that I convert into an IplImage later.
I've tried adding the following to a "working" loop:
---
int www = BackLinks[Y * w + ww];
ww = www;
---
After that, the code that is normally executed inside the loop just stops working. Code after the loop works normally.
(For initialization purposes, this is added before the loop:)
int ww;
int * BackLinks;
for (int Y = 1; Y < height; Y++)
{
    for (int ww = 0; ww < w; ww++)
    {
        BackLinks[Y * w + ww] = 0;
    }
}
Looks like you are not initializing the pointers. What kind of error are you facing?
Could you post the source code?
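To illustrate the point about the pointer: "int * BackLinks;" declares a pointer but gives it no storage, so reading or writing BackLinks[...] is undefined behavior, and a compiler is free to drop code that depends on it. That is one plausible explanation for the loop body silently disappearing, not a confirmed diagnosis. Below is a minimal sketch in plain C (host-side, not OpenCL C) of a backward pass over a buffer that the caller has actually allocated. The name backward_pass, the start parameter, and the path output are hypothetical. Since the first post describes the links as entries "to add to ww", the sketch adds the link to ww and clamps the result to the valid range; the posted loop assigns instead.

```c
#include <assert.h>

/* Follow the back-links of one column from the last row upward.
 * back_links: height * w entries, row-major, each -1, 0 or +1 (assumed).
 * path:       height entries; path[y] receives the index ww used in row y. */
static void backward_pass(const int *back_links, int height, int w,
                          int start, int *path)
{
    /* The buffers must point at real storage supplied by the caller. */
    assert(back_links && path && w > 0 && start >= 0 && start < w);

    int ww = start;
    for (int y = height - 1; y >= 1; y--) {
        path[y] = ww;
        ww += back_links[y * w + ww];   /* step along the stored link */
        if (ww < 0)     ww = 0;         /* clamp so the next read stays */
        if (ww > w - 1) ww = w - 1;     /* inside the row               */
    }
    if (height > 0)
        path[0] = ww;
}
```

Changing ww inside the loop is not a problem in itself; what matters is that every index derived from it points into memory that exists.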
Unfortunately I don't get any error code. The entire program still runs; only the code inside the for loop is not executed (even the parts that have nothing to do with the variables in question).
I'm not sure about the pointer initialization, though. I've added the entire code of the one kernel. Please let me know if something is not clear.
Thanks for your effort!
__kernel void dpkernel(__global float * dsi, __global float4 * outputImage1, __constant int * params)
{
    int X = get_global_id(0);
    int width = params[18];
    int height = params[19];
    int k = (params[0] - 1)/2;
    int w = params[1];
    float OccCost = 10.0f;
    float c1, c2, c3, mc;
    int * BackLinks;
    int bl;

    for (int Y = 1; Y < height; Y++)
    {
        for (int ww = 0; ww < w; ww++)
        {
            BackLinks[Y * w + ww] = 0;
        }
    }

    for (int Y = 1; Y < height; Y++)
    {
        for (int ww = 0; ww < min(w, Y); ww++)
        {
            int pos = (X + (Y - 1) * width) * w + ww;
            c1 = dsi[pos + 1] + OccCost;
            c2 = dsi[pos] + dsi[(X + (Y) * width) * w + ww];
            c3 = dsi[pos - 1] + OccCost;
            mc = c2;
            bl = 0;
            if (c1 < mc)
            {
                mc = c1;
                bl = 1;
            }
            if (c3 < mc)
            {
                mc = c3;
                bl = -1;
            }
            dsi[(X + (Y) * width) * w + ww] = mc;
            BackLinks[Y * w + ww] = bl;
        }
    }

    for (int Y = 1; Y < height; Y++)
    {
        for (int ww = 0; ww < w; ww++)
        {
            BackLinks[Y * w + ww] = 0;
        }
    }

    // Dynamic Programming (scanline): backward pass ---
    int ww = 0;
    for (int Y = height-1; Y > 1; Y--)
    {
        outputImage1[(X + Y * width)].x = BackLinks[Y * w + ww];
        int www = BackLinks[Y * w + ww];
        ww = www;
    }
}
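The inner loop of the forward pass picks, for each index ww, the cheapest of three predecessors in the previous row and records which one won. As a hedged illustration, that recurrence could be sketched in plain C (not OpenCL C) as follows. The function name dp_step and its parameters are hypothetical. The bounds guards on ww - 1 and ww + 1 are an addition that the posted kernel does not have: there, pos - 1 at ww = 0 and pos + 1 at ww = w - 1 fall outside the w entries that belong to pixel (X, Y - 1).

```c
#include <assert.h>

/* One step of the recurrence for a single index ww.
 * prev_row:   the w accumulated costs of the previous row (assumed layout).
 * match_cost: the cost stored for the current row at ww.
 * occ_cost:   the occlusion penalty (OccCost in the kernel).
 * link:       receives 0, +1 or -1, like bl in the kernel.
 * Returns the minimum accumulated cost. */
static float dp_step(const float *prev_row, int w, int ww,
                     float match_cost, float occ_cost, int *link)
{
    assert(prev_row && link && w > 0 && ww >= 0 && ww < w);

    float best = prev_row[ww] + match_cost;            /* c2, link 0  */
    *link = 0;

    if (ww + 1 < w && prev_row[ww + 1] + occ_cost < best) {
        best = prev_row[ww + 1] + occ_cost;            /* c1, link +1 */
        *link = 1;
    }
    if (ww > 0 && prev_row[ww - 1] + occ_cost < best) {
        best = prev_row[ww - 1] + occ_cost;            /* c3, link -1 */
        *link = -1;
    }
    return best;
}
```

The comparison order matches the kernel (c2 first, then c1, then c3), so ties resolve the same way.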
What is the behavior when running the program on the CPU?
omkaranathan, it's the same result for CPU and GPU.
Please post the source code (host and kernel code). A compilable test case would make it easy for us to track down the exact issue.
There's a problem with posting the host code: it is part of a bigger framework I'm trying to speed up with OpenCL, and it won't compile without lots of other libraries. I can try to put something together so you can have a look at the problem.
I have to say, though, that I am not sure it will still make much sense. I might have to put dummy "here be dragons/image" data in. Would this still help you? I hope I can get it done tomorrow, sorry for the delay.
Hey guys,
Sorry for taking so long. I was pretty busy and I only recently updated from 2.0 to 2.1.
And apparently my issue is resolved in 2.1. So, good work there, AMD!
I still don't know what caused it, but I guess it doesn't matter anymore. If you have an idea, please let me know; it's always good to know how things work on the inside.
EDIT: Hm, actually I still can't get the whole dynamic programming thing working...
EDIT2: Finally! I had to store my calculations in a global array instead of a local one.
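For readers who land here with the same problem, the fix in EDIT2 could look roughly like this, assuming "local" refers to the pointer declared inside the kernel rather than to __local memory. A kernel cannot allocate memory itself, so the back-links would live in a buffer created on the host (for example with clCreateBuffer) and passed in as an extra __global int * argument. Each work-item then needs its own slice of that buffer. The sketch below, in plain C, shows one possible indexing scheme that mirrors the layout the kernel already uses for dsi; the names backlink_index and backlink_count are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of the back-link for pixel (x, y) and index ww.
 * Same layout as dsi in the kernel: (x + y * width) * w + ww,
 * so two work-items with different x never touch the same entry. */
static size_t backlink_index(int x, int y, int ww, int width, int w)
{
    assert(x >= 0 && x < width && y >= 0 && ww >= 0 && ww < w);
    return ((size_t)x + (size_t)y * (size_t)width) * (size_t)w + (size_t)ww;
}

/* Number of int entries the host buffer needs for the whole image. */
static size_t backlink_count(int width, int height, int w)
{
    assert(width > 0 && height > 0 && w > 0);
    return (size_t)width * (size_t)height * (size_t)w;
}
```

Inside the kernel, BackLinks[Y * w + ww] would then become an access at (X + Y * width) * w + ww into the global buffer. The buffer holds width * height * w entries, which can be sizeable for large images.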