Archives Discussions

n_treutner · ‎08-06-2010

Hi everybody,

i'm having a rather strange, minor problem. i'm afraid, since i don't really know where it comes from and my project is rather big, i can't provide you with a runnable example. i'll try to explain, maybe you have an idea.

i'm using several Buffers and Kernels in my project. among these were two unused (never even passed to the gpu as an argument for a kernel) temporary Buffers i wanted to delete in order to free up memory. The peculiar thing is, that my results are now different (= wrong). i am working with images and now i have "stripes" of incorrect data along my images. so, not every work-item is affected.

my first guess was, that i am accessing my regular Buffers in an incorrect way, which only works, as long as my unused Buffers provide storage for the reading-and-writing operations, performed in otherwise undefined memory-segments. if you agree, that this might be true: is there any smart way to find these errors in my program.

Thanks for your thoughts

Curiouscat · ‎08-06-2010

Fill unused buffers with 0s (or, even better, some known data pattern), launch suspicious kernel, check if the content of the unused buffers has changed? At least that will tell you which kernel(s) to debug.

n_treutner · ‎08-06-2010

Thanks for your reply.

I thought about that, but I wasn't sure, how to analyze the data. do you know a way to "debug" my Buffers? as far as i know, i'd have to manually save them into an external file on the Host.

but maybe i'm wrong? i once tried the opencl-gDEBugger, but unfortunately, since they sent me another beta-licence, it stopped working and not even the gdebugger-staff could find a solution. any other ideas about a convenient way to check the buffers? if not, i'll have to log them into some text-files, i think. (which isn't too bad, i know.)

jeff_golds · ‎08-06-2010

If you fill the buffers with known data, you can just do clEnqueueMapBuffer() later and see if the contents are still correct.

Jeff

n_treutner · ‎08-15-2010

just a quick update on this one (if anyone is interested):

the problem is gone, since i installed the new SDK and drivers.

n_treutner · ‎08-23-2010

i might have been too quick here: apparently the problem has just moved to another region.

now, i have some kernels (and even parts in kernels), that are not always needed for the calculation. if i don't use them (so, if i don't call them with a clEnqueueNDRangeKernel), some parts of result are incorrect: there's just no result for those parts, where there should be at least a default-value. these regions are always the same.

i tried removing all code from the kernels (so the kernel call should have made no difference at all, since the Buffers were not written to), still with the same result.

my first guess was, that some of the kernels, that are processed before the suspicious kernels hadn't finished all their work-items, but i check all my kernels with clWaitForEvents and some clFinish's on crucial points.

do you have any ideas, what might be going on? it's not that critical, since i can just use my unnecessary kernels to obtain correct results, but it kind of drives me crazy...

thanks a lot, i know it's difficult, if you can't test it yourself.

rick_weber · ‎08-23-2010

I agree with Curious Cat in that filling the buffers with a sentinal value should help you debug this. It may be the case that you're getting the correct answers from the previous launch of your application (memory gets mapped the same way every time when you call malloc for example). By initializing the memory, you know you're not seeing results from a previous run. Once you have fixed your problem, you can remove the initialization, as it will just slow down your code.

n_treutner · ‎08-24-2010

thanks for your reply, rick.weber

i have actually removed the Buffers already, since the Problem was gone, since 2.2.

But the new problem is not depending from the Buffers, yet I think that it is related.

as i wrote, the strange behaviour occurs, depending on the use of (even blank) kernels. and i believe, that it could be many other things, too.

i further believe, that some workgroups fail to execute properly, since it's usually connected patches of my data, that are wrong, and i can even alternate between these patches during runtime, by enabling the kernels of disabling them. so: kernels on: patch a correct, patch b incorrect. kernels off: patch a incorrect, patch b correct. (even with no code at all in the kernel).

could it be, that there is some sort of limit to private memory, execution time or anything else, that could cause this?

cjang · ‎08-24-2010

I have had similar experiences (and continue to have them). Also, I have seen stuff like this on both ATI and NVIDIA. I know that in some cases it is SDK or driver related. But I also wonder if I am doing something wrong (programmatic race condition).

First, some constructive advice from my experience. Sentinel values from a test pattern are useful but not enough. Random data is better and can be compared against a correct but slow reference calculation on the CPU. But even this is not enough as results can vary between runs. So you need multiple trials to have confidence stuff is working. A given calculation may not be completely deterministic.

Another general thought is that when you need to do something crazy and completely unrelated to make your code work, like adding print statements, etc, that usually means something is still very wrong. It's easy to formulate a plausible theory as to why this might work (e.g. doing X causes Y to be serialized). But every time I've thought I had a workaround like this, it eventually stopped working. Really, it was never working at all. It was just a fluke.

n_treutner · ‎08-24-2010

cjang, it's good to know, that i'm not alone.

and yes, i do have to add these random statements (like if-clauses, that will never processed, though a "if false" does not work here).

at least, as i stated before, some of these strange things seemed to disappear, when 2.2 came out, though i might just have changed some other parts, that affected the problem.

and, since i work with images, i can literally see, when work-items work correctly, and when some don't. it's just a bit frustrating, when you don't know, what you're doing wrong

n_treutner · ‎08-30-2010

i'm still trying to find out, what's wrong.

some results in some specific "regions" of work-items appear to have slightly changed results, which makes even less sense to me, than zero-ed results.

did any of the AMD-staff have similar issues or does the dev-team know, that there seems to be a problem? i thought about race-condition for a while, but that does actually not make sense to me. maybe memory? i have no idea.

genaganna · ‎08-30-2010

Originally posted by: n.treutner i'm still trying to find out, what's wrong.

some results in some specific "regions" of work-items appear to have slightly changed results, which makes even less sense to me, than zero-ed results.

did any of the AMD-staff have similar issues or does the dev-team know, that there seems to be a problem? i thought about race-condition for a while, but that does actually not make sense to me. maybe memory? i have no idea.

We do face such things but always problem was synchronization or race codition. How big your code? is it possible to send your code to streamdeveloper@amd.com?

n_treutner · ‎08-30-2010

ok, good to know. i'll look into it again.

i could send you the OpenCL-code, but the C-code (for the "host) might be a problem, since it includes quite a lot of other libraries and might take a couple of hours to set up...

does the raw OpenCL-code help, so you could just look into it, without having the chance to test it? it might be a pain to read, since i've played around a lot in order to find the error myself.

thanks for your quick response!

genaganna · ‎08-30-2010

Originally posted by: n.treutner ok, good to know. i'll look into it again.

i could send you the OpenCL-code, but the C-code (for the "host) might be a problem, since it includes quite a lot of other libraries and might take a couple of hours to set up...

does the raw OpenCL-code help, so you could just look into it, without having the chance to test it? it might be a pain to read, since i've played around a lot in order to find the error myself.

thanks for your quick response!

Please send whole code so that we have both option we will use whichever better.

Archives Discussions

unused Buffers needed for correct result