Archives Discussions

aj_guillon · ‎10-24-2010

I worked with a friend to try out my OpenCL project on his Windows machine with Visual Studio, so that I can try the Stream Analyzer. The project compiles fine... runs... but when I try to do any profiling on it, my program crashes due to an error inside one of the kernels. This is my own error code that is generated, but it looks as if profiling the code has caused the results to change. It works fine under normal execution, but with profiling it dies. What could be causing this? There are no race conditions in my code or anything like that. However, if the kernel were run more than once, my program would fail... is the profiler perhaps causing multiple executions of the same kernel without "rolling back" changes to global memory?

I'm using the latest SDK and the latest Catalyst drivers on Win XP SP3.

ryta1203 · ‎10-25-2010

Could you post your code?

Plus, this should probably be moved to the OpenCL forum.

aj_guillon · ‎10-25-2010

Thanks for the response, I thought this was the appropriate forum for discussions of tools.

Unfortunately I can't post the code, because it is a fairly large commercial product. I know this doesn't help to debug things, but if there are things I can try I will do so... also, if I can understand how the profiler will interact with my own code perhaps I can find what the issue is. Is each kernel guaranteed to be executed only once?

bpurnomo · ‎10-25-2010

Thank you for using our tool, ATI Stream Profiler. This is the correct forum for discussion about GPU tools. The tool does perform multiple executions per kernel whenever necessary to collect all the specified performance counters. Prior to each kernel execution, buffers that were created with the read/write flag should be rolled back (the tool saves the original buffers). Have you specified these buffers with the correct flags? If possible, please send us (gputools.support@amd.com) a test case in case there is a problem with the tool that we can fix.
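The save-and-restore behaviour described here can be illustrated with a small host-side simulation in plain C. This is only a sketch, not the profiler's actual implementation: `counting_kernel`, `run_profiled`, `MAX_WORDS` and the `passes` parameter are hypothetical names, and the "kernel" is an ordinary loop standing in for a real dispatch.

```c
#include <assert.h>
#include <string.h>

#define MAX_WORDS 64 /* hypothetical cap on the buffer size for this sketch */

/* Stand-in for a kernel: every "work item" increments word 0 of the buffer. */
void counting_kernel(unsigned int *buf, unsigned int work_items)
{
    for (unsigned int i = 0; i < work_items; ++i)
        buf[0] += 1;
}

/* Replays the kernel `passes` times, as a multi-pass profiler might.
 * If `restore` is nonzero, the buffer is reset to its saved contents
 * before every pass. Returns the final value of word 0. */
unsigned int run_profiled(unsigned int *buf, size_t words,
                          unsigned int work_items,
                          unsigned int passes, int restore)
{
    unsigned int saved[MAX_WORDS];

    assert(words > 0 && words <= MAX_WORDS);
    memcpy(saved, buf, words * sizeof buf[0]); /* snapshot before pass 1 */

    for (unsigned int p = 0; p < passes; ++p) {
        if (restore)
            memcpy(buf, saved, words * sizeof buf[0]); /* roll back */
        counting_kernel(buf, work_items);
    }
    return buf[0];
}
```

With a zeroed one-word buffer and 256 work items, `run_profiled(buf, 1, 256, 4, 1)` leaves the counter at 256, while the same call with `restore` set to 0 leaves it at 1024. A result that is an exact multiple of the expected value is therefore a hint that a buffer was not rolled back between passes.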

aj_guillon · ‎10-25-2010

I have emailed my kernel to the supplied address. For those who come across this thread without access to the code, it basically just tries some atomic operations in a loop until they succeed.

aj_guillon · ‎10-28-2010

Any ideas? Anything I can do to help?

bpurnomo · ‎10-29-2010

Please send us the complete test case.

aj_guillon · ‎10-29-2010

What specifically do you want in a "complete test case"? I cannot send the entire application, and building a demo application that just exhibits this problem could also be very complicated, due to dependencies... I am willing to do this, but is there anything else we can attempt or try before I do that? Perhaps some way for me to send you debugging information, logs, or a trace?

Also, if I could tell the profiler "I don't care about these particular kernel launches"... I would probably be able to proceed... since the kernels that break the profiler are only constructing data structures for the model to run... i.e. they are run once, quickly, then never again.

As I mentioned, there are many atomic operations in the code... has the profiler been tested with atomic support?

Thanks.

bpurnomo · ‎10-29-2010

Yes, the profiler should work with atomic ops.

Can you please post all source lines corresponding to the clCreateBuffer/Image calls for all the buffer objects used by the kernel?

aj_guillon · ‎10-29-2010

I have a factory that builds the objects... here is what it does:

clCreateBuffer(q._context, flags, size, 0, &error_code);

The flags passed in are: CL_MEM_READ_WRITE if I want to create a read/write array, or CL_MEM_READ_ONLY for a read-only array. I changed the code to use only CL_MEM_READ_WRITE though to see if this had an effect, and it did not.

aj_guillon · ‎10-30-2010

Okay, I think I figured it out. I spent tonight doing a comment-binary-search to try to find which line in my kernel is causing it to crash... the problem is atomics after all. You can reproduce the problem very easily. Write a kernel, and pass __global unsigned int* count as an argument. Initialize the value it points to to zero before you pass it into the kernel.

Now inside the kernel just do atom_inc(count) or atom_add(count, 1). You will find that the total sum is greater than the number of global work items by a factor of 4, which suggests to me that the tool runs each kernel 4 times. None of the kernels I care to profile use atomics... but I need to get past this kernel to continue my work. I'm going to try to emulate an atom_add using atom_cmpxchg and see if this fixes it.
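The idea of emulating an atomic add with a compare-and-exchange loop can be sketched on the host in C11. This is an illustration under assumptions, not the kernel code from this thread: it uses `<stdatomic.h>` instead of the OpenCL `atom_cmpxchg` built-in, and `add_via_cmpxchg` is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>

/* Emulates an atomic add using only compare-and-exchange.
 * Returns the value the counter held before the add, which is
 * also what an atomic fetch-and-add would return. */
unsigned int add_via_cmpxchg(atomic_uint *p, unsigned int val)
{
    assert(p != 0);

    unsigned int old = atomic_load(p);

    /* On failure, `old` is refreshed with the current value,
     * so the next attempt recomputes old + val from fresh data. */
    while (!atomic_compare_exchange_weak(p, &old, old + val)) {
        /* retry until no other thread changed *p in between */
    }
    return old;
}
```

Starting from a counter of 0, `add_via_cmpxchg(&counter, 1)` returns 0 and leaves the counter at 1. In an OpenCL kernel the same loop shape would compare the return value of `atom_cmpxchg` against the expected old value, since that built-in returns the previous contents rather than a success flag.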

bpurnomo · ‎10-30-2010

I don't think the problem is with atomics (I tried a simple program with atomics and the Histogram SDK sample, and both work with the profiler): as long as the kernel arguments and buffers are set up properly (with the read/write flag), the profiler will restore the read/write buffers before issuing any additional kernel dispatch. The problem may be due to the way you set up the program, which the profiler perhaps doesn't account for yet (or maybe there is a bug in the application). To rule out the atomics issue, you can try modifying your kernel to read a location in the buffer, add 1, and write the result back to the buffer. I'd guess that you would have the same issue with this scenario in your application.

This is hard to debug without the source application. If you can isolate the problem in a simple test case (perhaps by modifying one of the SDK samples), please send it to us so we can take a look.

Profiling Changes Results