rick_weber
Adept II

Weird bug causes implosion of entire universe

When running my application, I get the following error:

../../../cpucommand.cpp:260: ShouldNotReachHere()

Here is the stack trace:
#0  0x00007ffff6b8cba5 in raise () from /lib/libc.so.6
#1  0x00007ffff6b906b0 in abort () from /lib/libc.so.6
#2  0x00007ffff46ef919 in ?? ()
   from /nfs/sw/opencl/ati/current/lib/x86_64/libamdocl64.so
#3  0x00007ffff469ab81 in ?? ()
   from /nfs/sw/opencl/ati/current/lib/x86_64/libamdocl64.so
#4  0x00007ffff469b493 in ?? ()
   from /nfs/sw/opencl/ati/current/lib/x86_64/libamdocl64.so
#5  0x00007ffff46ef08c in ?? ()
   from /nfs/sw/opencl/ati/current/lib/x86_64/libamdocl64.so
#6  0x00007ffff46ed19d in ?? ()
   from /nfs/sw/opencl/ati/current/lib/x86_64/libamdocl64.so
#7  0x00007ffff6943971 in start_thread () from /lib/libpthread.so.0
#8  0x00007ffff6c3f92d in clone () from /lib/libc.so.6
#9  0x0000000000000000 in ?? ()
A repro is kinda hard to supply because I don't know what causes it...


5 Replies
himanshu_gautam
Grandmaster

Please provide your system configuration: CPU, GPU, SDK, driver, and OS.

Have you tried debugging your code using GDB?


Originally posted by: himanshu.gautam
Please provide your system configuration: CPU, GPU, SDK, driver, and OS.
Have you tried debugging your code using GDB?


I'm using an Intel processor with SDK 2.4, which I know is an unsupported configuration, but I'm pretty sure the error will appear on an AMD processor as well. I'm using Ubuntu 10.10.

Using GDB and commenting stuff out, I found that this error is triggered by a single barrier() call after a massive loop. In fact, if I comment out any one of the barriers in my main loop, the code runs to completion (though not necessarily correctly). I'm working on a repro to isolate the cause.


rick.weber,
Without the code I can't say for sure, but this error means that your barrier is uneven: not every thread (work-item) in the work-group is hitting the barrier. The OpenCL spec requires that if any work-item in a work-group reaches a barrier, all of them do.
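To make "uneven" concrete, here is a minimal host-side sketch in plain C. It is not OpenCL and not the poster's kernel: `sim_barrier`, `trips`, `kernel_uneven`, `kernel_even`, and `barriers_are_even` are hypothetical names, the workload is made up, and the stand-in barrier only counts arrivals rather than synchronizing. It shows how a loop bound that depends on the local ID makes work-items reach different numbers of barriers, and how moving the condition inside the loop evens them out.

```c
#include <assert.h>

#define LOCAL_SIZE 64   /* assumed work-group size */
#define MAX_TRIPS  4    /* hypothetical upper bound on loop trips per work-item */

static int current_id;           /* local ID of the work-item being simulated */
static int hits[LOCAL_SIZE];     /* barriers reached by each work-item */
static int work[LOCAL_SIZE];     /* units of "real work" done by each work-item */

/* Stand-in for barrier(): it only counts arrivals, it does not synchronize. */
static void sim_barrier(void) { hits[current_id]++; }

/* Hypothetical workload: each work-item needs between 1 and MAX_TRIPS trips. */
static int trips(int lid) { return 1 + lid % MAX_TRIPS; }

/* Uneven: the loop bound depends on the local ID, so barrier counts differ. */
static void kernel_uneven(void) {
    for (int i = 0; i < trips(current_id); i++) {
        work[current_id]++;
        sim_barrier();
    }
}

/* Even: every work-item runs every iteration; only the work is conditional. */
static void kernel_even(void) {
    for (int i = 0; i < MAX_TRIPS; i++) {
        if (i < trips(current_id))
            work[current_id]++;
        sim_barrier();
    }
}

/* Runs `kernel` once per work-item; returns 1 if all barrier counts match. */
static int barriers_are_even(void (*kernel)(void)) {
    for (int id = 0; id < LOCAL_SIZE; id++)
        hits[id] = work[id] = 0;
    for (current_id = 0; current_id < LOCAL_SIZE; current_id++)
        kernel();
    for (int id = 1; id < LOCAL_SIZE; id++)
        if (hits[id] != hits[0])
            return 0;
    return 1;
}
```

With these definitions, `barriers_are_even(kernel_uneven)` returns 0 and `barriers_are_even(kernel_even)` returns 1, while both versions do the same amount of work per work-item. Equal counts are only a necessary condition; on a real device every work-item also has to reach the same barrier call.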

I'd buy that. The looping code is very complicated with lots of branching and such. I tried to ensure that each thread enters every loop and conditionally executes real code while hitting every barrier, but I think I might have missed a few. Thanks Micah!

 

In 2.5, would it be possible to detect this problem and report it less ambiguously?


So, this is definitely the issue. I replaced all my barriers with __BARRIER(type) where:

/* Log the local ID just before the barrier; use as __BARRIER(type); */
#define __BARRIER(type) \
  do { \
    printf("\t__BARRIER hit by thread %d\n", (int)get_local_id(0)); \
    barrier(type); \
  } while (0)

The expected printed output should be
__BARRIER hit by thread 0
__BARRIER hit by thread 1
__BARRIER hit by thread ...
__BARRIER hit by thread 63
for each barrier.
I mostly see this, but I also see some instances of:
__BARRIER hit by thread 0
__BARRIER hit by thread 32
__BARRIER hit by thread 0
__BARRIER hit by thread 32
which are incorrect: only threads 0 and 32 report hitting the barrier.
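Checking this kind of output by eye gets tedious, so here is a rough host-side C sketch that tallies the log instead. The helper names `tally_barrier_hits` and `hits_are_even` are made up for illustration, and it assumes a work-group size of 64 and the exact message format printed above. If several work-groups run, their local IDs are lumped together, so treat a mismatch as a hint rather than proof.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LOCAL_SIZE 64   /* assumed work-group size */

/* Counts "__BARRIER hit by thread N" lines in `log`, per local ID.
   Returns the number of lines that matched. */
static int tally_barrier_hits(const char *log, int counts[LOCAL_SIZE]) {
    int total = 0;
    memset(counts, 0, LOCAL_SIZE * sizeof counts[0]);
    for (const char *line = log; line && *line; ) {
        int id;
        /* Skip the leading tab or spaces printed by the macro. */
        while (*line == ' ' || *line == '\t')
            line++;
        if (sscanf(line, "__BARRIER hit by thread %d", &id) == 1 &&
            id >= 0 && id < LOCAL_SIZE) {
            counts[id]++;
            total++;
        }
        /* Move to the start of the next line, if there is one. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return total;
}

/* Returns 1 if every local ID logged the same number of barrier hits. */
static int hits_are_even(const int counts[LOCAL_SIZE]) {
    for (int id = 1; id < LOCAL_SIZE; id++)
        if (counts[id] != counts[0])
            return 0;
    return 1;
}
```

Feeding it the captured kernel output as one string and calling `hits_are_even` on the resulting counts flags logs like the bad one above, where threads 0 and 32 appear twice and the rest not at all.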