
OpenCL atomic_add and atomic_inc not working correctly

Discussion created by jpsollie on Aug 9, 2017
Latest reply on Aug 10, 2017 by dipak

atom_inc(system) does not atomically increment the value of local uint system[0], whereas atom_xchg(system, system[0] + 1) does.

I see the same behaviour on Clover with LLVM 5.0.

pocl 0.14 (which I use on the Opteron CPUs, running on LLVM 4.0.1) shows no difference between the two.

Does this look like an LLVM bug, or is it compiler-related?

This piece of code:

if(!output[14]) output[14] = system[0] + 1;
atom_inc(system);
if(!output[15]) output[15] = system[0];

produces the following in gdb (identical on Clover and amdgpu-pro, both running LLVM 5.0):

Breakpoint 1, worker (device_obj=0x609490) at ./engine.c:397
397                 if(answer[3] == 255) {
(gdb) print answer
$1 = {0, 0, 0, 0, 255, 276, 340, 804850955, 40962, 0, 0, 0, 0, 0, 1, 64}
(gdb) print answer[14]
$2 = 1
(gdb) print answer[15]
$3 = 64

Output on pocl:

Breakpoint 1, worker (device_obj=0x609490) at ./engine.c:397
397                 if(answer[3] == 255) {
(gdb) print answer[14]
$1 = 1
(gdb) print answer[15]
$2 = 1
(gdb)

 

System details when running amdgpu-pro (see also the attached clinfo.txt):

Linux 4.10.17
GCC 7.1.0
LLVM/Clang 5.0.0
amdgpu-pro 17.30
pocl 0.14

experiment3.tgz contains the source. I inserted a debug function that dumps the private variables of work-item (0,0,0) to the output buffer; feel free to ask if you need it:

dump_global_output(const uchar* array, const uchar* array2, const int outputoffset, global uint* output)

It takes the first 4 bytes of array and array2 and writes a mark (0xff), address(1), address(2), content(1) and content(2) to output, starting at output[outputoffset].

Because the output buffer is currently only 16 ints wide, the program itself needs output[0-3] to operate, and each dump occupies 5 ints, I think 4 and 9 are the only usable output offsets.

 

dumps.tgz contains the Clover dump files (LLVM and assembly) showing what happens when atom_xchg is changed to atom_add or atom_inc. I do not know how to generate these on amdgpu-pro.

 

To run the program, you need a Linux system with pthreads and OpenCL installed.

To view the debug output, you need gdb. Use these commands:

gdb ./a.out
(gdb) break engine.c:397
(gdb) run
(gdb) print answer

Good luck!

Message was edited by: janpieter sollie
